Open michaelsframe opened 8 years ago
@michaelsframe Have you observed this problem? When I search Google for "stuff", my URL is: https://www.google.com/search?q=stuff&oq=stuff&aqs=chrome..69i57j69i65j0l4.958j0j4&sourceid=chrome&es_sm=119&ie=UTF-8
and the Forensic page has had no problem pulling out q=stuff.
Yes, the article is from 2013 but Google has locked down even more since then.
I observed it directly last night. It works for Bing and Yahoo, but not Google. Feel free to try it out yourself. If you look at the URLs that get captured when you hit Google pages, you'll see that the URL is truncated (it does not include the query terms).
Hmm, I'm seeing search terms coming back for Google, Bing, and Yahoo searches. Here's an example of one of the Google URLs in the Visited Links section of Forensic:
https://www.google.com/?gws_rd=ssl#q=monkeys
It's pulling the term immediately following the q=, so I think the regex is working. The only improvement would be to strip the + signs out of the search term before we persist the values, for example:
https://www.google.com/search?q=yahoo+search&ie=utf-8&oe=utf-8
The search term comes back as 'yahoo+search'.
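For what it's worth, here is a minimal sketch of how the extraction could decode the term instead of returning it raw. It is not the code the extension actually uses; the function name `extract_search_term` and the `param` argument are made up for illustration. It relies on Python's standard `urllib.parse`, which treats `+` in a query string as an encoded space, and it also looks in the URL fragment, since the Google example above carries the term after `#q=`.

```python
from urllib.parse import urlsplit, parse_qs


def extract_search_term(url, param="q"):
    """Return the decoded value of `param`, or None if it is absent.

    Hypothetical helper: checks the query string first, then the
    fragment (for URLs shaped like https://www.google.com/#q=term).
    """
    parts = urlsplit(url)
    for component in (parts.query, parts.fragment):
        # parse_qs decodes percent-escapes and turns '+' into a space.
        values = parse_qs(component).get(param)
        if values:
            return values[0]
    return None


if __name__ == "__main__":
    print(extract_search_term(
        "https://www.google.com/search?q=yahoo+search&ie=utf-8&oe=utf-8"))
    print(extract_search_term("https://www.google.com/?gws_rd=ssl#q=monkeys"))
```

Checking the query string before the fragment is an arbitrary choice in this sketch; if a URL carries both, the order may need to be reversed depending on which one reflects the latest search.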
Does this work with multiple searches within Google? That is, if you run different searches after the first one, does each one keep logging the Google URL, including the search terms?
@michaelsframe This "works" if you refresh the Google page after doing a search. Simply doing the search in Google doesn't trigger a real page refresh, so the panel and the extraction don't run again unless you refresh. When you do, the search term is persisted.
Since Google began encrypting searches, you can only get the Google URL, not the query parameters (http://blog.hubspot.com/marketing/google-encrypting-all-searches-nj).
This means that if we want to capture search terms in domain discovery we will have to get them in a different way.
Perhaps we should include a "Search" box in the toolbar where users would type their search words, and we would then return the search results in a custom page. This could be where we finally start controlling the returned search results, including non-Google results, so the user can begin a trail.