GeneralMills / pytrends

Pseudo API for Google Trends

What is the API doing? #534

Open jorgecabrejas7 opened 1 year ago

jorgecabrejas7 commented 1 year ago

I have been working with pytrends for a long while now and have realised that the results I get in the browser and with pytrends differ quite a bit. After comparing the requests each one makes, the only real difference I could spot was the user type parameter inside the request, plus some minor changes; for example, the browser specifies the timezone twice.

Timeframes, timezones, and the rest of the request parameters are the same, except for the token, which you have to obtain before making the actual data request. I don't know why this is happening, and I don't think the token has anything to do with it, since the two token requests, again, differ only in the user type they specify.

Here are the two full requests each one makes, minus the token:

Does anyone know why this is happening and how I can retrieve consistent data from both? @emlazzarin, I'm mentioning you because you closed the last issue related to this; maybe you have some info about it.
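One way to make this kind of comparison repeatable is to diff the two request payloads programmatically. The following is only a minimal sketch: the function name `diff_payloads`, both sample payloads, and the value `USER_TYPE_FROM_BROWSER` are hypothetical placeholders for illustration, not requests captured from Google Trends.

```python
from typing import Any, Iterator, Tuple

MISSING = "<missing>"  # marker for keys present on only one side


def diff_payloads(a: Any, b: Any, path: str = "") -> Iterator[Tuple[str, Any, Any]]:
    """Yield (path, value_in_a, value_in_b) for every leaf that differs."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}" if path else str(key)
            yield from diff_payloads(a.get(key, MISSING), b.get(key, MISSING), child)
    elif isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        for i, (x, y) in enumerate(zip(a, b)):
            yield from diff_payloads(x, y, f"{path}[{i}]")
    elif a != b:
        yield (path, a, b)


# Hypothetical payloads; only USER_TYPE_SCRAPER is a value seen in this thread.
browser_req = {
    "time": "2023-01-01 2023-03-01",
    "userConfig": {"userType": "USER_TYPE_FROM_BROWSER"},
}
pytrends_req = {
    "time": "2023-01-01 2023-03-01",
    "userConfig": {"userType": "USER_TYPE_SCRAPER"},
}

differences = list(diff_payloads(browser_req, pytrends_req))
for path, left, right in differences:
    print(f"{path}: browser={left!r} pytrends={right!r}")
```

With real captured payloads in place of the samples, any parameter that differs shows up as one line of output.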

lysergicHub commented 1 year ago

After a few days of reverse engineering, I finally found a workaround. The trick is to send the first request (the one used to get the token) to embed/explore/TIMESERIES instead of /api/explore.

You will have to modify the parsing code in addition to changing the URL, because the response from embed/explore/TIMESERIES is a little different from the response from /api/explore.
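As a rough sketch of the URL change only, the token request could be built as below. Everything beyond the two paths mentioned above is an assumption: the helper name `token_request_url` is hypothetical, and the query parameters (`hl`, `tz`, `req`) and payload shape are modelled on how pytrends builds its /api/explore request, which the embed endpoint may or may not accept in the same form.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://trends.google.com/trends"


def token_request_url(keywords, timeframe="today 12-m", geo="",
                      hl="en-US", tz=360, use_embed=True):
    """Build the URL of the first (token) request.

    Assumption: the embed endpoint takes the same hl/tz/req parameters
    as /api/explore. This is not confirmed anywhere in this thread.
    """
    req = {
        "comparisonItem": [
            {"keyword": kw, "time": timeframe, "geo": geo} for kw in keywords
        ],
        "category": 0,
        "property": "",
    }
    # Only the path differs between the workaround and the original request.
    path = "/embed/explore/TIMESERIES" if use_embed else "/api/explore"
    query = urlencode({"hl": hl, "tz": tz, "req": json.dumps(req)})
    return f"{BASE_URL}{path}?{query}"


embed_url = token_request_url(["python"])
api_url = token_request_url(["python"], use_embed=False)
print(embed_url)
```

Keeping the payload identical and switching only the path makes it easy to compare what each endpoint returns for the same query.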

alicanyuksel commented 1 year ago

@lysergicHub USER_TYPE_SCRAPER is also present in that response, so I'm not sure the solution you propose will work well. Am I wrong?

Aassifh commented 1 year ago

Has any new solution been found?

cmabastar commented 1 year ago

@lysergicHub, that doesn't seem to work well, I think. I'm still getting the same result, and USER_TYPE_SCRAPER is still in place after trying out parsing embed/explore/TIMESERIES.

Here's a snippet of the extraction/parsing:

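import json
import re

from pytrends.exceptions import ResponseError

# Find the JSON string literal passed to JSON.parse('...') in the embed page.
match = re.search(r"JSON\.parse\('([^']+)'", response.text)
if match is None:  # re.search returns None when nothing matches
    raise ResponseError(
        "Unable to parse embed widget JSON.parse", response=response
    )

# Undo the JavaScript string escaping, then convert the JSON to a Python dict.
widgets = json.loads(
    match.group(1).encode("utf8").decode("unicode_escape")
)
print(widgets)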

Basically, the widget will now contain the token, which can be extracted and passed to the multiline API, the same request that is made in the /api/explore flow.
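Since the exact layout of the embed response is not shown here, a structure-agnostic way to pull out the token (and to check whether USER_TYPE_SCRAPER is still present) is to search the parsed data recursively by key. This is only a sketch: `find_key` is a hypothetical helper, and `sample_widgets` is made-up data whose shape is an assumption, not a real response.

```python
from typing import Any, Iterator


def find_key(obj: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


# Made-up stand-in for the parsed `widgets` dict (shape is an assumption).
sample_widgets = {
    "token": "PLACEHOLDER_TOKEN",
    "request": {"userConfig": {"userType": "USER_TYPE_SCRAPER"}},
}

tokens = list(find_key(sample_widgets, "token"))
user_types = list(find_key(sample_widgets, "userType"))
print(tokens, user_types)
```

Checking `user_types` before sending the follow-up request gives a quick signal of whether the workaround changed how the session was classified.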

lysergicHub commented 1 year ago

Yes, you're right. But when I posted my comment, the userType was empty and it seemed to work. Sorry for misleading you; Google has probably changed something in the meantime.