What is the API doing? - Githubissues

jorgecabrejas7 commented 1 year ago

I have been working with pytrends for a long while now and realised that the results I get in the browser and through pytrends differ quite a bit. After checking the requests each one makes, the only difference I was able to spot was the userType parameter inside the request, plus some minor changes; for example, the browser sends the timezone parameter twice.

Browser: "userConfig":{"userType":"USER_TYPE_LEGIT_USER"}
Pytrends: "userConfig": {"userType": "USER_TYPE_SCRAPER"}

Timeframes, timezones, and the rest of the parameters in the requests are the same, except for the token, which you have to get before making the actual request for the data. I don't know why this is happening, and I don't think the token has anything to do with it, since the two token requests also differ only in the user type they specify.

Here are the two full requests each one makes, minus the token:

Pytrends https://trends.google.com/trends/api/widgetdata/multiline?req={"time": "2014-12-28 2020-01-01", "resolution": "WEEK", "locale": "es", "comparisonItem": [{"geo": {"region": "ES-CM"}, "complexKeywordsRestriction": {"keyword": [{"type": "BROAD", "value": "gripe"}]}}], "requestOptions": {"property": "", "backend": "IZG", "category": 0}, "userConfig": {"userType": "USER_TYPE_SCRAPER"}}&token=TOKEN_HERE&tz=-120
Browser https://trends.google.es/trends/api/widgetdata/multiline?hl=es&tz=-120&tz=-120&req={"time":"2014-12-28 2020-01-01","resolution":"WEEK","locale":"es","comparisonItem":[{"geo":{"region":"ES-CM"},"complexKeywordsRestriction":{"keyword":[{"type":"BROAD","value":"gripe"}]}}],"requestOptions":{"property":"","backend":"IZG","category":0},"userConfig":{"userType":"USER_TYPE_LEGIT_USER"}}&token=TOKEN_HERE

Does anyone know why this is happening and how I can retrieve consistent data from both of them? @emlazzarin, mentioning you since you closed the last issue related to this; maybe you have some info about it.
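To double-check that userType really is the only field that differs, the two URLs above can be compared mechanically. The following is a minimal sketch using only the standard library; the helper names (`flatten`, `diff_req`) and the constants are made up for illustration, and the token placeholder is left as posted.

```python
import json
from urllib.parse import parse_qs, urlsplit

# The two URLs as posted above (token replaced by a placeholder).
PYTRENDS_URL = (
    'https://trends.google.com/trends/api/widgetdata/multiline?req='
    '{"time": "2014-12-28 2020-01-01", "resolution": "WEEK", "locale": "es", '
    '"comparisonItem": [{"geo": {"region": "ES-CM"}, '
    '"complexKeywordsRestriction": {"keyword": [{"type": "BROAD", "value": "gripe"}]}}], '
    '"requestOptions": {"property": "", "backend": "IZG", "category": 0}, '
    '"userConfig": {"userType": "USER_TYPE_SCRAPER"}}'
    '&token=TOKEN_HERE&tz=-120'
)
BROWSER_URL = (
    'https://trends.google.es/trends/api/widgetdata/multiline?hl=es&tz=-120&tz=-120&req='
    '{"time":"2014-12-28 2020-01-01","resolution":"WEEK","locale":"es",'
    '"comparisonItem":[{"geo":{"region":"ES-CM"},'
    '"complexKeywordsRestriction":{"keyword":[{"type":"BROAD","value":"gripe"}]}}],'
    '"requestOptions":{"property":"","backend":"IZG","category":0},'
    '"userConfig":{"userType":"USER_TYPE_LEGIT_USER"}}'
    '&token=TOKEN_HERE'
)


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into {"dotted.path": leaf_value}."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


def diff_req(url_a, url_b):
    """Return the paths whose values differ between the two `req` payloads."""
    req_a = flatten(json.loads(parse_qs(urlsplit(url_a).query)["req"][0]))
    req_b = flatten(json.loads(parse_qs(urlsplit(url_b).query)["req"][0]))
    return {
        path: (req_a.get(path), req_b.get(path))
        for path in sorted(req_a.keys() | req_b.keys())
        if req_a.get(path) != req_b.get(path)
    }


if __name__ == "__main__":
    print(diff_req(PYTRENDS_URL, BROWSER_URL))
    # The browser URL repeats the tz parameter; parse_qs keeps both values.
    print(parse_qs(urlsplit(BROWSER_URL).query)["tz"])
```

This only compares what is sent in the query string; it says nothing about cookies or headers, which may also differ between the browser and pytrends.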

lysergicHub commented 1 year ago

After a few days of reverse engineering, I finally found a workaround. The trick is to send the first request (the one used to get the token) to embed/explore/TIMESERIES instead of /api/explore.

In addition to changing the URL, you will have to modify the parsing code, because the response from embed/explore/TIMESERIES is a little bit different from the response from /api/explore.

alicanyuksel commented 1 year ago

@lysergicHub there is also USER_TYPE_SCRAPER in the response, so I'm not sure that the solution you propose will work well... Am I wrong?

Aassifh commented 1 year ago

Has any new solution been found?

cmabastar commented 1 year ago

@lysergicHub, that doesn't seem to work well, I think. I'm still getting the same result, and USER_TYPE_SCRAPER is still in place after trying out parsing embed/explore/TIMESERIES.

Here's a snippet of the extraction/parsing:

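import json
import re

from pytrends.exceptions import ResponseError

# `response` is the requests response from the embed/explore/TIMESERIES call.
# Pull out the string literal passed to JSON.parse('...') in the page
match = re.search(r"JSON\.parse\('([^']+)'", response.text)
if match is None:
    # re.search returns None when nothing matches, so check the match itself
    raise ResponseError(
        "Unable to parse embed widget JSON.parse", response=response
    )

# Decode the escape sequences, then convert the JSON to a Python dict
widgets = json.loads(
    match.group(1).encode("utf8").decode("unicode_escape")
)
print(widgets)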

Basically, the parsed widget data now contains the token, which can be extracted and passed to the multiline API; that is the same request that is made when going through /api/explore.
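For anyone who wants to try the extraction step without hitting Google, here is a rough, self-contained sketch of the same regex-and-decode idea run against a synthetic page. The sample string, the `widgets`/`id`/`token` layout, and the helper names are assumptions made up for illustration; the real embed response may be shaped differently.

```python
import json
import re

# Synthetic stand-in for the embed page body (NOT a real Google response).
# The JSON is written with \xNN escapes, as it would be inside a JS string.
SAMPLE_PAGE = r"""<script>var data = JSON.parse('\x7b\x22widgets\x22:[\x7b\x22id\x22:\x22TIMESERIES\x22,\x22token\x22:\x22FAKE_TOKEN\x22\x7d]\x7d');</script>"""


def parse_embedded_json(page_text):
    """Extract and decode the string literal passed to JSON.parse('...')."""
    match = re.search(r"JSON\.parse\('([^']+)'", page_text)
    if match is None:
        raise ValueError("no JSON.parse('...') call found in page")
    # unicode_escape turns \x7b, \x22, ... back into {, ", ...
    # Caveat: it treats raw non-ASCII bytes as Latin-1, so unescaped
    # non-ASCII characters in the page would come out garbled.
    decoded = match.group(1).encode("utf8").decode("unicode_escape")
    return json.loads(decoded)


def find_token(payload, widget_id):
    """Return the token of the widget with the given id (hypothetical layout)."""
    for widget in payload.get("widgets", []):
        if widget.get("id") == widget_id:
            return widget.get("token")
    return None


if __name__ == "__main__":
    payload = parse_embedded_json(SAMPLE_PAGE)
    print(find_token(payload, "TIMESERIES"))
```

Raising when the regex does not match (instead of calling `.group(1)` on `None`) makes it obvious when the page format has changed.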

lysergicHub commented 1 year ago

Yes, you're right. But when I posted my comment, the userType was empty and it seemed to work. Sorry for misleading you; Google has probably changed something in the meantime.

GeneralMills / pytrends

What is the API doing? #534