PMassicotte / gtrendsR

R functions to perform and display Google Trends queries

R request results differ from Online request #454

Open Brummerling opened 8 months ago

Brummerling commented 8 months ago

My R request does not match the Google Trends online result for this command:

mytest <- gtrends(keyword = "KiK", geo = "DE", time = "2023-06-01 2023-07-31", onlyInterest = TRUE)

There are a lot of zeros in the result. Also, today's R request does not return the same values as yesterday's request with the same command.

What is going on here?

kik_R_jun_jul.xlsx

DataSocialist commented 8 months ago

If the timeframe you are pulling data for is older than 30 days, Google does not provide real-time data, but a sample. See here: https://support.google.com/trends/answer/4365533?hl=en

If you search for low-frequency terms, the sample variance can be pretty significant. You should therefore pull multiple samples for the same keywords and timeframe until the sample variance is reduced to an acceptable level.
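A minimal sketch of that multi-sample approach in base R follows. The helper names `hits_to_numeric` and `average_samples` are hypothetical (they are not part of gtrendsR), and the sketch assumes every sample comes back with the same dates in the same order and a `hits` column that may contain the string "<1". Whether repeated requests actually return independent samples is not confirmed in this thread.

```r
# Convert a `hits` column to numeric. Assumption: very small values may be
# reported as the string "<1"; they are mapped to 0.5 here.
hits_to_numeric <- function(hits) {
  as.numeric(sub("^<1$", "0.5", as.character(hits)))
}

# Call `fetch()` n times and summarise the samples per date.
# `fetch` is any function returning a data frame with `date` and `hits` columns.
average_samples <- function(fetch, n = 5, pause = 0) {
  samples <- lapply(seq_len(n), function(i) {
    if (i > 1 && pause > 0) Sys.sleep(pause)  # optional wait between requests
    fetch()
  })
  # One column per sample, one row per date
  hits <- do.call(cbind, lapply(samples, function(s) hits_to_numeric(s$hits)))
  data.frame(
    date      = samples[[1]]$date,
    mean_hits = rowMeans(hits),
    sd_hits   = apply(hits, 1, sd)  # spread across samples for each date
  )
}

# Possible usage with the query from the original report (needs network access):
# fetch <- function() {
#   gtrendsR::gtrends(keyword = "KiK", geo = "DE",
#                     time = "2023-06-01 2023-07-31",
#                     onlyInterest = TRUE)$interest_over_time
# }
# avg <- average_samples(fetch, n = 5, pause = 10)
```

Passing the fetch step in as a function keeps the averaging logic testable without contacting Google. `sd_hits` gives a rough per-date indication of whether more samples are needed.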

ehulland commented 8 months ago

Similar to what @Brummerling is seeing, I am getting a large number of 0 results for my gtrends queries as well. I used the search term "Google" as a test example, and found a period of all 0s from the start of 2020 to early 2021.

This was my gtrends call:

gt <- gtrends('Google', time = '2020-01-01 2023-08-31', geo = 'US', category = 0)
gt_trend <- gt$interest_over_time

and this was the plot I created from my gtrends interest over time (red) versus downloaded google search trends from the website for the exact same term, geo, and time period (blue).

image

I understand that the gtrends results are a sample of data and so the data may vary, but from my understanding, it shouldn't be pure 0s for more than a year, especially not for a popular search term like "Google" itself.

I tried running this many times over and got the same result between yesterday and today.
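To put numbers on a stretch of zeros like the one described above, a small base R sketch can list every run of consecutive zeros with its start and end date. `zero_runs` is a hypothetical helper name, and it assumes `hits` is already numeric.

```r
# List every run of consecutive zeros in `hits`, with start date, end date
# and length (number of observations). Uses run-length encoding from base R.
zero_runs <- function(dates, hits) {
  r      <- rle(hits == 0)
  ends   <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  keep   <- which(r$values)  # keep only the runs where hits == 0
  data.frame(
    start  = dates[starts[keep]],
    end    = dates[ends[keep]],
    length = r$lengths[keep]
  )
}

# Possible usage with the object from the call above:
# zero_runs(gt_trend$date, gt_trend$hits)
```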

PMassicotte commented 8 months ago

I will try to have a look this week.

ehulland commented 8 months ago

Building on my earlier discrepancy, I am finding that some US subnational locations are being returned with no data. I have been using the keyword "flu" and was searching for Texas alone using the following code (as a working example) which returned no data.

gt <- gtrends('flu', time = '2020-01-01 2023-08-31', geo = 'US-TX', category = 0)
gt_trend <- gt$interest_over_time

On the Google Trends website, we do see data for Texas for that same time period with the search term "Flu".

Screenshot 2023-10-19 at 12 15 36 PM

To confirm that I wasn't using US geos wrong, I ran "Google" for Texas and it did return data, though again there was a similar discrepancy to the national-level search for "Google" (gtrends results in red, download from Google Search Trends in blue):

image

Surprisingly, the national-level query for "flu" worked as expected and matched Google Trends, suggesting that these issues are intermittent and inconsistent.

image
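For comparisons like the ones plotted above, here is a rough base R sketch that lines up the gtrends series with the series downloaded from the website and reports the difference per date. `compare_series` is a hypothetical helper. It assumes both inputs have already been reduced to data frames with a `date` column and a numeric `hits` column. Reading and renaming the columns of the website CSV is left out because its exact layout is not shown in this thread.

```r
# Join the R result and the website download on date and compute the
# per-date difference (R minus website).
compare_series <- function(r_df, web_df) {
  r_df$date   <- as.Date(r_df$date)    # gtrends dates may be date-times
  web_df$date <- as.Date(web_df$date)  # website dates may be plain strings
  merged <- merge(
    r_df[, c("date", "hits")],
    web_df[, c("date", "hits")],
    by = "date",
    suffixes = c("_r", "_web")
  )
  merged$diff <- merged$hits_r - merged$hits_web
  merged
}

# Possible usage, with `web_df` prepared from the downloaded CSV:
# cmp <- compare_series(gt_trend, web_df)
# cmp[cmp$hits_r == 0 & cmp$hits_web > 0, ]  # dates that are zero only in R
```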

eddelbuettel commented 8 months ago

> suggesting that these issues are intermittent and inconsistent.

The worst part, really, is that we have no public API to access here and hence really no real leg to stand on to complain to Google. The result data is ... what they give us and that is that. (Modulo possible errors in the REST request string but that is mostly ironed out by now.)

PMassicotte commented 8 months ago

How did you get the red curve in your last plot? (I.e., what parameters did you use within gtrendsR?)

ehulland commented 8 months ago

I used gtrends("flu", geo='US', time= ('2020-01-01 2023-08-31'))

PMassicotte commented 8 months ago

I'm not sure I understand.

This: gtrends("flu", geo='US', time= ('2020-01-01 2023-08-31'))

matched the data you searched for Texas directly on the webpage?

ehulland commented 8 months ago

Nope - the last plot was of "Flu" in the US specifically.

My search query for Texas (which yielded no results) was gt<-gtrends('flu',time='2020-01-01 2023-08-31', geo='US-TX', category=0)

Brummerling commented 8 months ago

We'll get in contact with Google. Hopefully, they have some information I can share here.

skart98 commented 8 months ago

Same here, lots of zeros that are not in the .csv from the Google Trends website. Also pytrends (which is the Python equivalent) gives me similar data. So I think this is a problem on the Google side.