PMassicotte / gtrendsR

R functions to perform and display Google Trends queries

Logic of input of keywords #453

Closed lukschue closed 9 months ago

lukschue commented 9 months ago

Hello everyone,

I have a general question about keyword input. What logic determines the hits that gtrendsR pulls from Google when I make the following request:

gtrends <- gtrends(
  keyword = c("NFL Superbowl"),
  geo = "US",
  time = "now 1-d",
  gprop = "web",
  category = 0,
  hl = "en",
  compared_breakdown = FALSE,
  low_search_volume = FALSE,
  tz = 0,
  onlyInterest = TRUE
)

Do the hits represent Google searches for NFL and Superbowl regardless of word order, or only the strict sequence, as if I had googled ‘NFL Superbowl’? And if the logic is the strict sequence ‘NFL Superbowl’, is it possible to instead pull all hits that include the keywords, regardless of their order?

Kind regards

eddelbuettel commented 9 months ago

It is similar to how many other 'search' interfaces operate. A list of tokens separated by whitespace, such as "NFL Superbowl", will look for either, both, and in any order. If you wrap apostrophes around it, as in "'NFL Superbowl'", you denote that you want the sub-string as written. This behaves the same here as it does at the Google Trends website.

lukschue commented 9 months ago

Thank you very much :)

lukschue commented 9 months ago

Hi again.

I tested the logic of the hits that gtrendsR pulls from Google, as described above, using:

trends1 <- gtrends(
  keyword = c("NFL Superbowl"),
  geo = "US",
  time = "2023-01-01 2023-04-01",
  gprop = "web",
  category = 0,
  hl = "en",
  compared_breakdown = FALSE,
  low_search_volume = FALSE,
  tz = 0,
  onlyInterest = TRUE
)

and

trends2 <- gtrends(
  keyword = c("'NFL Superbowl'"),
  geo = "US",
  time = "2023-01-01 2023-04-01",
  gprop = "web",
  category = 0,
  hl = "en",
  compared_breakdown = FALSE,
  low_search_volume = FALSE,
  tz = 0,
  onlyInterest = TRUE
)

I also compared this data to data I manually retrieved from the website for the same time period and location (in all four cases, it is non-real-time data) to see if the data collected with gtrendsR matched the data you would find manually by simply using the keyword or putting the keyword in quotes.

[Image: MicrosoftTeams-image]

As you can see, the data from gtrendsR is identical whether the keyword is specified with or without quotes. The data I downloaded by hand from the website, however, differs between the two cases. Also, the gtrendsR data appears to correspond to a Google Trends search for the keyword without quotes. Did I do something wrong in the code, or does anyone know how to use gtrendsR to accurately capture terms with quotes?

Kind regards.
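
The "identical" observation above can also be checked in code rather than by eye. The following is a minimal sketch, not part of gtrendsR: `same_hits` is a hypothetical helper, and it assumes each result is a list holding an `interest_over_time` data frame with a `hits` column. The `fake*` objects are made-up stand-ins so the snippet runs without contacting Google.

```r
# Hypothetical helper (not part of gtrendsR): compare the hits of two results.
# Assumes each result is a list with an `interest_over_time` data frame that
# has a `hits` column.
same_hits <- function(a, b) {
  # hits may be numeric or character, so compare both as character
  identical(
    as.character(a$interest_over_time$hits),
    as.character(b$interest_over_time$hits)
  )
}

# Made-up stand-ins for real results, so the sketch runs offline
fake1 <- list(interest_over_time = data.frame(hits = c(12, 40, 100)))
fake2 <- list(interest_over_time = data.frame(hits = c(12, 40, 100)))
fake3 <- list(interest_over_time = data.frame(hits = c(3, 9, 100)))

same_hits(fake1, fake2)  # TRUE
same_hits(fake1, fake3)  # FALSE
```

With real results, the call would be `same_hits(trends1, trends2)`.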

PMassicotte commented 9 months ago

library(gtrendsR)

trends1 <- gtrends(
  keyword = c("NFL Superbowl"),
  geo = "US",
  time = "2023-01-01 2023-04-01",
  gprop = "web",
  category = 0,
  hl = "en",
  compared_breakdown = FALSE,
  low_search_volume = FALSE,
  tz = 0,
  onlyInterest = TRUE
)

trends2 <- gtrends(
  keyword = c('"NFL Superbowl"'),
  geo = "US",
  time = "2023-01-01 2023-04-01",
  gprop = "web",
  category = 0,
  hl = "en",
  compared_breakdown = FALSE,
  low_search_volume = FALSE,
  tz = 0,
  onlyInterest = TRUE
)

plot(trends1)

plot(trends2)

Created on 2023-10-04 with reprex v2.0.2

They are not the same.
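
The only difference between this reprex and the earlier test is the quoting inside the keyword string: the earlier `trends2` wrapped the phrase in apostrophes, while this one wraps it in literal double quotes. Judging from this thread alone, the double-quoted form appears to be what triggers the exact-phrase search. The sketch below only inspects the strings and makes no request. `is_double_quoted` and `quote_phrase` are hypothetical helpers, not gtrendsR functions.

```r
# Keyword strings as R sees them; no network call is made here.
kw_plain  <- "NFL Superbowl"    # no quoting
kw_single <- "'NFL Superbowl'"  # apostrophes, as in the earlier test
kw_double <- '"NFL Superbowl"'  # literal double quotes, as in the reprex

# Hypothetical helper: TRUE if the keyword is wrapped in double quotes.
is_double_quoted <- function(kw) grepl('^".*"$', kw)

# Hypothetical helper: wrap a phrase in literal double quotes.
quote_phrase <- function(kw) paste0('"', kw, '"')

is_double_quoted(c(kw_plain, kw_single, kw_double))  # FALSE FALSE TRUE
identical(quote_phrase(kw_plain), kw_double)         # TRUE
```

Under that assumption, `gtrends(keyword = quote_phrase("NFL Superbowl"), ...)` would send the same keyword as `trends2` in the reprex.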