Sotera / DatawakeDepot

Loopback web application for administration of Datawake networks
Apache License 2.0

Google Search Terms cannot be obtained #103

Open michaelsframe opened 8 years ago

michaelsframe commented 8 years ago

Since Google began encrypting searches, you can only get the Google URL, not the query parameters (http://blog.hubspot.com/marketing/google-encrypting-all-searches-nj).

This means that if we want to capture search terms in domain discovery we will have to get them in a different way.

Perhaps we should include a "Search" box in the toolbar where users type their search terms, and we return the search results in a custom page. This could be where we finally start controlling the returned search results, so that non-Google results can be shown to the user to begin a trail.

bwhiteman commented 8 years ago

@michaelsframe Have you observed this problem? When I search Google for "stuff", my URL is: https://www.google.com/search?q=stuff&oq=stuff&aqs=chrome..69i57j69i65j0l4.958j0j4&sourceid=chrome&es_sm=119&ie=UTF-8

and the forensic page has had no problem pulling out q=stuff.

michaelsframe commented 8 years ago

Yes, the article is from 2013 but Google has locked down even more since then.

I observed it directly last night. It works for Bing and Yahoo, but not Google. Feel free to try it out yourself. If you look at the URLs that get captured when you hit Google pages, you'll see that the URL is truncated (it does not include the query terms).

bmcdougald commented 8 years ago

Hmm, I'm seeing search terms coming back for Google, Bing, and Yahoo searches. Here's an example of one of the Google URLs in the Visited Links section of Forensic:

https://www.google.com/?gws_rd=ssl#q=monkeys

It's pulling the term immediately following the q=, so I think the regex is working. The only improvement would be to strip the + signs out of the search term before we persist the values, for example:

https://www.google.com/search?q=yahoo+search&ie=utf-8&oe=utf-8

The search term comes back as 'yahoo+search'
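As a rough illustration of the cleanup described above, here is a minimal Python sketch. It is not the regex Datawake actually uses; `extract_search_term` is a hypothetical helper, and checking the URL fragment before the query string is an assumption based on the `#q=monkeys` example. It relies on the standard library's `parse_qs`, which decodes `+` as a space, so no separate stripping step is needed.

```python
from urllib.parse import parse_qs, urlsplit


def extract_search_term(url, param="q"):
    """Return the decoded search term from a URL, or None if absent.

    Hypothetical helper for illustration; not part of Datawake.
    """
    parts = urlsplit(url)
    # Assumption: look at the fragment (#q=...) first, then fall back
    # to the regular query string (?q=...).
    for component in (parts.fragment, parts.query):
        values = parse_qs(component).get(param)
        if values:
            # parse_qs has already turned '+' into ' ' and decoded
            # any percent-escapes.
            return values[0]
    return None


print(extract_search_term(
    "https://www.google.com/search?q=yahoo+search&ie=utf-8&oe=utf-8"))
print(extract_search_term("https://www.google.com/?gws_rd=ssl#q=monkeys"))
```

With this approach the second URL from this thread yields 'yahoo search' rather than 'yahoo+search'.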

michaelsframe commented 8 years ago

Does this work with multiple searches within Google? That is, if you run different searches after the first one, does each one still log the Google URL including the search terms?

bmcdougald commented 8 years ago

@michaelsframe This "works" if you refresh the Google page after doing a search. Simply doing the search in Google doesn't trigger a real page refresh, so the panel and extraction don't run again unless you refresh. When you do refresh, the search term used is persisted.
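One possible direction, sketched in Python purely for illustration: run the extraction on every URL change rather than only on a full page load. How the panel would be notified of in-page URL changes is an assumption here (in a browser it might be a hash-change or tab-update listener); `SearchTermTracker` and `on_url_change` are hypothetical names, not Datawake code.

```python
from urllib.parse import parse_qs, urlsplit


class SearchTermTracker:
    """Records each new search term seen across URL changes (hypothetical)."""

    def __init__(self):
        self.terms = []     # terms in the order they were first observed
        self._last = None   # most recently recorded term

    def on_url_change(self, url):
        # Assumption: something calls this whenever the tab's URL changes,
        # including fragment-only changes that do not reload the page.
        parts = urlsplit(url)
        term = None
        for component in (parts.fragment, parts.query):
            values = parse_qs(component).get("q")
            if values:
                term = values[0]
                break
        # Skip URLs without a term and consecutive repeats of the same term.
        if term is not None and term != self._last:
            self.terms.append(term)
            self._last = term
        return term


tracker = SearchTermTracker()
tracker.on_url_change("https://www.google.com/?gws_rd=ssl#q=monkeys")
tracker.on_url_change("https://www.google.com/?gws_rd=ssl#q=apes")
print(tracker.terms)
```

Skipping consecutive repeats keeps a manual refresh of the same results page from persisting the same term twice.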