facebook / facebook-python-business-sdk

Python SDK for Meta Marketing APIs
https://developers.facebook.com/docs/business-sdk

FacebookBadObjectError("Bad data to set object data") is raised. #641

Closed ksh24865 closed 1 year ago

ksh24865 commented 1 year ago

Fetching the ad insights data under the ad account with the async option works normally up to a point.

An error occurs when running `loaded_insights = [dict(insight) for insight in insights]` to collect the data from all pages of `async_job.get_result()` into Python objects.

The error information is as follows.

Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<string>", line 2, in <module>
File ".../venv/lib/python3.8/site-packages/facebook_business/api.py", line 789, in next
if not self._queue and not self.load_next_page():
File ".../venv/lib/python3.8/site-packages/facebook_business/api.py", line 866, in load_next_page
self._queue = self.build_objects_from_response(response)
File ".../venv/lib/python3.8/site-packages/facebook_business/api.py", line 875, in build_objects_from_response
return self._object_parser.parse_multiple(response)
File ".../venv/lib/python3.8/site-packages/facebook_business/adobjects/objectparser.py", line 95, in parse_multiple
ret = [AbstractObject.create_object(self._api, data,
File ".../venv/lib/python3.8/site-packages/facebook_business/adobjects/abstractobject.py", line 173, in create_object
new_object._set_data(data)
File ".../venv/lib/python3.8/site-packages/facebook_business/adobjects/abstractobject.py", line 99, in _set_data
raise FacebookBadObjectError("Bad data to set object data")
facebook_business.exceptions.FacebookBadObjectError: Bad data to set object data

The sample code is as follows:

def sample_func():

    from facebook_business.api import FacebookAdsApi
    from facebook_business.adobjects.adaccount import AdAccount
    from facebook_business.adobjects.adsinsights import AdsInsights
    from facebook_business.adobjects.user import User
    from facebook_business.adobjects.adreportrun import AdReportRun
    from typing import List, Iterable, Dict, Any
    from datetime import date

    def get_insights(
        ad_account_id: str,
        fields: List[str],
        params: Dict[str, Any],
    ) -> List[AdsInsights]:
        import time
        import requests

        async_job = AdAccount(ad_account_id).get_insights(
            fields=fields,
            params=params,
            is_async=True,
        )
        async_status = "Job Not Started"
        async_percent_completion = 0
        while async_status != "Job Completed" or async_percent_completion < 100:
            async_job.api_get()
            async_status = async_job[AdReportRun.Field.async_status]
            async_percent_completion = async_job[
                AdReportRun.Field.async_percent_completion
            ]

            if async_status == "Job Failed":
                raise requests.RequestException("GET Insights Async Job Failed")
            time.sleep(3)
        async_job.api_get()
        time.sleep(1)
        ads_insights = async_job.get_result()
        return ads_insights

    def date_generator(
        start_date: "date",
        end_date: "date",
    ) -> Iterable["date"]:
        from datetime import timedelta

        for n in range(int((end_date - start_date).days) + 1):
            yield start_date + timedelta(days=n)

    def get_date_ranges(
        start_date: "date",
        end_date: "date",
    ) -> List[Dict[str, str]]:
        return [
            {
                "since": str(date_),
                "until": str(date_),
            }
            for date_ in date_generator(
                start_date=start_date,
                end_date=end_date,
            )
        ]

    def get_params(
        start_date: "date",
        end_date: "date",
        breakdowns: List[str],
    ) -> Dict[str, Any]:
        return {
            "time_ranges": get_date_ranges(
                start_date=start_date,
                end_date=end_date,
            ),
            "level": "ad",
            "filtering": [{"field": "spend", "operator": "GREATER_THAN", "value": 0}],
            "use_unified_attribution_setting": True,
            "breakdowns": breakdowns,
        }

    token = 'dummy_token'

    FacebookAdsApi.init(access_token=token, api_version='v17.0')
    context = User(fbid="me")
    print(context.api_get(fields=[User.Field.email, User.Field.name, User.Field.currency]))
    ad_account_id = 'act_000000000000'
    ad_account = AdAccount(ad_account_id)
    print(ad_account)
    default_fields = [
        AdsInsights.Field.campaign_name,
        AdsInsights.Field.campaign_id,
        AdsInsights.Field.adset_id,
        AdsInsights.Field.adset_name,
        AdsInsights.Field.ad_id,
        AdsInsights.Field.ad_name,
        AdsInsights.Field.impressions,
        AdsInsights.Field.reach,
        AdsInsights.Field.clicks,
        AdsInsights.Field.inline_link_clicks,
        AdsInsights.Field.spend,
        AdsInsights.Field.attribution_setting,
        AdsInsights.Field.actions,
        AdsInsights.Field.action_values,
    ]

    default_break_downs = [
        AdsInsights.Breakdowns.hourly_stats_aggregated_by_advertiser_time_zone
    ]

    insights = get_insights(
        ad_account_id=ad_account_id,
        fields=default_fields,
        params=get_params(
            start_date=date(2023,5,26),
            end_date=date(2023,6,1),
            breakdowns=default_break_downs,
        )
    )
    loaded_insights = [dict(insight) for insight in insights]
    print(f"loaded_insights: {loaded_insights}")

sample_func()
sfontana commented 1 year ago

I'm experiencing the same problem and I opened an issue yesterday here.

The problem is that this Python library is sending the following request at some point

params = {
        'method': 'GET',
        'url': 'https://graph.facebook.com/v17.0/123456/insights?access_token=....&limit=25&after=MjQZD',
        'files': {},
        'data': {},
        'json': None,
        'headers': {
            'User-Agent': 'fbbizsdk-python-v17.0.1',
            'Accept-Encoding': 'gzip, deflate',
            'Accept': '*/*',
            'Connection': 'keep-alive',
        },
        'params': {
            # This is where the access token gets appended again
            'access_token': ...,
            'appsecret_proof': .....,
        },
        'auth': None,
        'cookies': RequestsCookieJar(),
        'hooks': {'response': []},
    }
    p = PreparedRequest()
    p.prepare(**params)

    send_kwargs = {
        "timeout": None,
        "allow_redirects": True,
    }
    send_kwargs.update(
        {
            'proxies': {},
            'stream': False,
            'cert': None,
        }
    )
    session = Session()
    resp = session.send(p, **send_kwargs)
    # Each page adds one access_token parameter to the URL. After 300
    # requests there are 300 `access_token` parameters in the URL, and the
    # server answers with a 400 or 502 error HTML page, which triggers the
    # "Bad data to set object data" error because it is not valid JSON.
    assert resp.json()['paging']['next'].count('access_token') == 1

So when requesting the second page of the report, the library sends

https://graph.facebook.com/v17.0/123456/insights?access_token=XXX&limit=25&after=MjQZD plus `access_token=XXX` as a parameter, which results in https://graph.facebook.com/v17.0/123456/insights?access_token=XXX&access_token=XXX&limit=25&after=MjQZD (two `access_token` parameters).

The request for the third page becomes https://graph.facebook.com/v17.0/123456/insights?access_token=XXX&access_token=XXX&access_token=XXX&limit=25&after=MjQZD (three `access_token` parameters), and so on, until the URL is so long that Facebook's servers return HTTP error 400 (or 502) with a generic error HTML page, which is the invalid JSON that triggers the "Bad data to set object data" exception.
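The runaway growth can be illustrated with a small standard-library sketch. The `append_params` helper below is hypothetical; it only mimics the effect of merging `params` into a URL that already carries a query string without deduplication, which is what happens when the SDK re-prepares the full paging URL each iteration:

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

def append_params(url, params):
    # Naively merge extra params into a URL that may already carry them,
    # without deduplication: the query string keeps every occurrence.
    parts = urlsplit(url)
    query = parse_qsl(parts.query) + list(params.items())
    return parts._replace(query=urlencode(query)).geturl()

url = "https://graph.facebook.com/v17.0/123456/insights?access_token=XXX&limit=25&after=MjQZD"
for page in range(2, 5):
    url = append_params(url, {"access_token": "XXX"})
    print(page, url.count("access_token"))  # page 2 -> 2 tokens, page 3 -> 3, page 4 -> 4
```

Each "page" adds one more `access_token` occurrence, so the URL length grows linearly with the number of pages fetched.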

I believe that Facebook's backend used to drop duplicate parameters, but recently, it no longer does so.

sfontana commented 1 year ago

@stcheng I'm tagging you directly as this issue prevents fetching data in some cases and there is no easy workaround.

sfontana commented 1 year ago

@stcheng I managed to narrow this down even more.

This library is sending duplicate `access_token` parameters to the API. I think this line is the cause; the duplication of the `access_token` parameter takes place here.

For instance, the second page sends two `access_token` parameters, the third sends three, and so on. You can easily reproduce this behaviour:

$ curl --location 'https://graph.facebook.com/v17.0/123456/insights?access_token=MY_TOKEN&access_token=MY_TOKEN&limit=25&after=Mzk5'

{"data":[

.....
], 
"paging":{"cursors":{"before":"NDAw","after":"NDI0"},
"next":"https:\/\/graph.facebook.com\/v17.0\/123456\/insights?access_token=MY_TOKEN&access_token=MY_TOKEN&limit=25&after=NDI0",
"previous":"https:\/\/graph.facebook.com\/v17.0\/123456\/insights?access_token=MY_TOKEN&access_token=MY_TOKEN&limit=25&before=NDAw"}}

As I stated in my previous comment, I believe that the backend used to strip duplicate parameters when creating the response but that behaviour changed a few days ago and duplicate parameters are no longer removed. If a report is big enough, after a few hundred pages there will be a few hundreds access_token parameters in the URL and requests will fail with 502 or 400.

Hope this helps.

didopimentel commented 1 year ago

I had the same issue, and adding to what @sfontana said, I narrowed it down further to help you out. The issue is actually in the `prepare` function of the third-party `requests` package: there is a line that prepares the params, and it automatically concatenates the `access_token`. I think we should only call the `prepare` function for the first request, not for each iteration over the cursor.

In my case, the `summary: true` param is also being duplicated. It crashes exactly after fetching 9,600 records; given that each page contains 25 records, that means I can fetch exactly 384 pages, and the 385th crashes. It seems to be an issue with the URL length, which ends up at over 160k characters (roughly 160 KB).
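Those numbers are consistent with the duplicated-parameter theory. A rough back-of-the-envelope check, assuming a roughly 400-character access token (an assumption; actual token lengths vary):

```python
records, per_page = 9600, 25
pages = records // per_page                  # 384 full pages fetched before the crash
token_len = 400                              # assumed token length (hypothetical)
extra_per_page = len("&access_token=") + token_len
print(pages, pages * extra_per_page)         # 384 pages, on the order of 160k extra URL chars
```

So after a few hundred pages the URL alone exceeds typical server-side request-size limits, matching the 400/502 responses observed.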

stcheng commented 1 year ago

We greatly appreciate the community for bringing this issue to our attention. We are aware of this incorrect behavior and are actively working on a solution. Specifically, we have identified the problematic line causing the issue. The cursor returns the paging with complete URLs that include both the "access_token" and "summary" fields.

Here is a sample response with paging.

{
   "data": [
   ...
   ],
   "paging": {
      "cursors": {
         "before": "<cursor_before>",
         "after": "<cursor_after>"
      },
      "previous": "https://graph.facebook.com/v17.0/<endpoint>?access_token=<access_token>&summary=true&limit=25&before=<cursor_before>",
      "next": "https://graph.facebook.com/v17.0/<endpoint>?access_token=<access_token>&summary=true&limit=25&after=<cursor_after>"
   }
}

To address this, here is a patch utilizing the provided "before" and "after" cursors as parameters

index 85fc04b..7ab9a97 100644
--- a/facebook_business/api.py
+++ b/facebook_business/api.py
@@ -846,9 +846,12 @@ class Cursor(object):
         response = response_obj.json()
         self._headers = response_obj.headers()

-        if 'paging' in response and 'next' in response['paging']:
-            self._path = response['paging']['next']
-            self.params = {}
+        if (
+            'paging' in response and
+            'cursors' in response['paging'] and
+            'after' in response['paging']['cursors']
+        ):
+            self.params['after'] = response['paging']['cursors']['after']
         else:
             # Indicate if this was the last page
             self._finished_iteration = True

It appears that a restriction on the total size of the request was recently introduced. This restriction may be a result of receiving an overwhelming number of oversized requests due to this long-standing hidden bug. We will investigate whether the restriction is temporary or can be lifted once the bug is resolved. Additionally, we will assess whether it is safe to keep including the access_token in the response.
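The idea behind the patch can be sketched independently of the SDK: keep the original path and params and advance only the `after` cursor, so fixed parameters like the access token are sent exactly once per request. The `fetch` callable below is a hypothetical stand-in for the Graph API call, not the SDK's actual interface:

```python
def iterate_pages(fetch, path, params):
    # Advance only the `after` cursor instead of following the full
    # `next` URL, so fixed params (e.g. access_token) never accumulate.
    params = dict(params)  # copy so the caller's dict is untouched
    while True:
        response = fetch(path, params)
        yield from response.get("data", [])
        paging = response.get("paging", {})
        if "next" not in paging:
            return  # no next page: iteration is finished
        params["after"] = paging["cursors"]["after"]

# Fake two-page API for demonstration:
fake_pages = {
    None: {"data": [1, 2], "paging": {"cursors": {"after": "A"}, "next": "..."}},
    "A": {"data": [3], "paging": {"cursors": {"after": "B"}}},
}
fake_fetch = lambda path, p: fake_pages[p.get("after")]
print(list(iterate_pages(fake_fetch, "/insights", {"access_token": "XXX"})))  # [1, 2, 3]
```

This mirrors the patch above: the presence of `next` still signals whether iteration should continue, but only the bare cursor value is carried into the next request.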

sfontana commented 1 year ago

The fix works, thanks for releasing it already :)

ksh24865 commented 1 year ago

I checked that the problem was fixed in this commit. Thank you so much to all of you who have helped me a lot !! :)

@sfontana @didopimentel @stcheng