Closed ksh24865 closed 1 year ago
I'm experiencing the same issue, and I opened an issue about it yesterday here.
The problem is that this Python library is sending the following request at some point
params = {
    'method': 'GET',
    'url': 'https://graph.facebook.com/v17.0/123456/insights?access_token=....&limit=25&after=MjQZD',
    'files': {},
    'data': {},
    'json': None,
    'headers': {
        'User-Agent': 'fbbizsdk-python-v17.0.1',
        'Accept-Encoding': 'gzip, deflate',
        'Accept': '*/*',
        'Connection': 'keep-alive',
    },
    'params': {
        # This is where the access token gets appended again
        'access_token': ...,
        'appsecret_proof': .....,
    },
    'auth': None,
    'cookies': RequestsCookieJar(),
    'hooks': {'response': []},
}
p = PreparedRequest()
p.prepare(**params)
send_kwargs = {
    "timeout": None,
    "allow_redirects": True,
}
send_kwargs.update(
    {
        'proxies': {},
        'stream': False,
        'cert': None,
    }
)
session = Session()
resp = session.send(p, **send_kwargs)
# Each page adds one more access_token parameter to the URL. After 300
# requests there are 300 access_token parameters, and the server answers
# with a 400 or 502 error HTML page, which is not valid JSON and triggers
# the "Bad data to set object data" error.
assert resp.json()['paging']['next'].count('access_token') == 1
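The duplication can be reproduced in isolation with `requests` alone: preparing a request whose URL already carries `access_token` while also passing `access_token` in `params` yields a duplicated query parameter, because `requests` appends `params` to any existing query string. A minimal sketch (the URL and token value are placeholders):

```python
from requests import PreparedRequest

p = PreparedRequest()
p.prepare(
    method='GET',
    url='https://graph.facebook.com/v17.0/123456/insights?access_token=XXX&limit=25&after=MjQZD',
    # requests appends these to the query string that is already in the URL
    params={'access_token': 'XXX'},
)
print(p.url.count('access_token'))  # 2 - the parameter is now duplicated
```

This matches the observed behaviour: the SDK reuses the full `next` URL but keeps passing the token in `params`, so each page gains one more copy.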
So what gets sent when requesting the second page of the report is
https://graph.facebook.com/v17.0/123456/insights?access_token=XXX&limit=25&after=MjQZD
plus access_token=XXX as a parameter, which results in:
https://graph.facebook.com/v17.0/123456/insights?access_token=XXX&access_token=XXX&limit=25&after=MjQZD
(2 access_token parameters)
The request for the third page will be: https://graph.facebook.com/v17.0/123456/insights?access_token=XXX&access_token=XXX&access_token=XXX&limit=25&after=MjQZD
(3 access_token parameters)
and so on, until the URL is so long that Facebook's servers return HTTP error 400 (or 502) with a generic error HTML page, which is the invalid JSON that triggers the "Bad data to set object data" exception.
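The per-page growth described above can be simulated offline with the standard library alone. This is an illustrative sketch, not SDK code: `next_request_url` is a hypothetical helper mimicking a client that reuses the full `next` URL while still appending its own params, and the URL/token are placeholders:

```python
from urllib.parse import urlencode, urlsplit

def next_request_url(prev_url, params):
    # Mimic the buggy flow: the full `next` URL (which already contains
    # access_token) is reused, and the client-side params are appended again.
    sep = '&' if urlsplit(prev_url).query else '?'
    return prev_url + sep + urlencode(params)

url = 'https://graph.facebook.com/v17.0/123456/insights?access_token=XXX&limit=25'
for page in range(2, 5):
    url = next_request_url(url, {'access_token': 'XXX'})
    print(page, url.count('access_token'))  # page 2 -> 2, page 3 -> 3, page 4 -> 4
```

Each page adds exactly one copy, so the URL length grows linearly with the page count.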
I believe that Facebook's backend used to drop duplicate parameters, but it recently stopped doing so.
@stcheng I'm tagging you directly as this issue prevents fetching data in some cases and there is no easy workaround.
@stcheng I managed to narrow this down even more.
This library is sending duplicate access_token parameters to the API. I think this line is the cause: the duplication of the access_token parameter takes place here.
For instance, the second page sends two access_token parameters, the third sends three, and so on. You can easily reproduce this behaviour:
$ curl --location 'https://graph.facebook.com/v17.0/123456/insights?access_token=MY_TOKEN&access_token=MY_TOKEN&limit=25&after=Mzk5'
{"data":[
.....
],
"paging":{"cursors":{"before":"NDAw","after":"NDI0"},
"next":"https:\/\/graph.facebook.com\/v17.0\/123456\/insights?access_token=MY_TOKEN&access_token=MY_TOKEN&limit=25&after=NDI0",
"previous":"https:\/\/graph.facebook.com\/v17.0\/123456\/insights?access_token=MY_TOKEN&access_token=MY_TOKEN&limit=25&before=NDAw"}}
As I stated in my previous comment, I believe that the backend used to strip duplicate parameters when creating the response, but that behaviour changed a few days ago and duplicate parameters are no longer removed.
If a report is big enough, after a few hundred pages there will be a few hundred access_token parameters in the URL and requests will fail with 502 or 400.
Hope this helps.
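Until the SDK is fixed, duplicates could in principle be stripped client-side before the request is sent. A stdlib-only sketch (`dedupe_query` is a hypothetical helper, and the URL is a placeholder), which keeps only the first occurrence of each query parameter:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def dedupe_query(url):
    # Keep only the first occurrence of each query parameter: a possible
    # client-side workaround while the SDK fix lands.
    parts = urlsplit(url)
    seen = {}
    for key, value in parse_qsl(parts.query):
        seen.setdefault(key, value)  # later duplicates are ignored
    return urlunsplit(parts._replace(query=urlencode(seen)))

print(dedupe_query('https://host/p?access_token=XXX&access_token=XXX&limit=25'))
# -> https://host/p?access_token=XXX&limit=25
```

Note this would drop legitimately repeated parameters too, so it is only safe for endpoints like this one where each parameter is expected once.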
I had the same issue, and adding to what @sfontana said, I narrowed it down even more to help you out. The issue is actually in the prepare function of the requests package. There is a line that prepares the params, and it automatically concatenates the access_token.
I think we should only be calling the prepare function for the first call, and not for the iteration over the cursor.
In my case, the summary=true param is being duplicated as well. It crashes exactly after fetching 9600 records; given that each page contains 25 records, that means I can fetch exactly 384 pages, and the 385th page crashes. It seems to be an issue with the URL length, which ends up at over 160k characters (about 160 KB).
We greatly appreciate the community bringing this issue to our attention. We are aware of this incorrect behavior and are actively working on a solution. Specifically, the problematic line causing the issue has been identified. The cursor returns paging with complete URLs that include both the "access_token" and "summary" fields.
Here is a sample response with paging.
{
    "data": [
        ...
    ],
    "paging": {
        "cursors": {
            "before": "<cursor_before>",
            "after": "<cursor_after>"
        },
        "previous": "https://graph.facebook.com/v17.0/<endpoint>?access_token=<access_token>&summary=true&limit=25&before=<cursor_before>",
        "next": "https://graph.facebook.com/v17.0/<endpoint>?access_token=<access_token>&summary=true&limit=25&after=<cursor_after>"
    }
}
To address this, here is a patch that uses the provided "before" and "after" cursors as parameters:
index 85fc04b..7ab9a97 100644
--- a/facebook_business/api.py
+++ b/facebook_business/api.py
@@ -846,9 +846,12 @@ class Cursor(object):
         response = response_obj.json()
         self._headers = response_obj.headers()
-        if 'paging' in response and 'next' in response['paging']:
-            self._path = response['paging']['next']
-            self.params = {}
+        if (
+            'paging' in response and
+            'cursors' in response['paging'] and
+            'after' in response['paging']['cursors']
+        ):
+            self.params['after'] = response['paging']['cursors']['after']
         else:
             # Indicate if this was the last page
             self._finished_iteration = True
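The idea behind the patch can be sketched outside the SDK: instead of adopting the full `next` URL (which re-embeds the token), keep the request path and params fixed and only advance the `after` cursor. A minimal illustration with a stubbed response dict (`advance_cursor` and all values are placeholders, not SDK code):

```python
def advance_cursor(params, response):
    # Mirror the patched Cursor logic: pull only the `after` cursor out of
    # the paging block instead of swallowing the full `next` URL.
    cursors = response.get('paging', {}).get('cursors', {})
    if 'after' in cursors:
        params['after'] = cursors['after']
        return True   # more pages may follow
    return False      # last page: no cursor to advance

params = {'access_token': 'XXX', 'limit': 25}
response = {'data': [], 'paging': {'cursors': {'before': 'NDAw', 'after': 'NDI0'}}}
print(advance_cursor(params, response), params['after'])  # True NDI0
```

Because `params` keeps exactly one `access_token` entry and only the `after` value changes, the URL length stays bounded no matter how many pages are fetched.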
It appears that there is a newly introduced restriction on the total size of the request. This restriction may be the result of receiving an overwhelming number of oversized requests due to this long-standing hidden bug. We will investigate whether the restriction is temporary or whether it can be lifted once the bug is resolved. Additionally, we will assess whether it is safe to include the access_token in the response.
The fix works, thanks for releasing it already :)
I checked that the problem was fixed in this commit. Thank you so much to all of you who have helped me a lot !! :)
@sfontana @didopimentel @stcheng
Normally the call works until the ad_insights data under the ad account is fetched with the async option.
An error occurs when attempting loaded_insights = [dict(insight) for insight in insights] to save the data from all pages of async_job.get_result() as a Python object.
The error information is as follows.
The sample code is as follows