Rhizome-Conifer / conifer

Collect and revisit web pages.
https://conifer.rhizome.org
Apache License 2.0

Ability to replay google searches #553

Open michaelconnor opened 6 years ago

michaelconnor commented 6 years ago

When you make a Google search, Google appends random gibberish to the end of the URL. So when you webrecord a Google search, you can't replay it: hitting the search button generates a new string of gibberish, and the result shows as page not found. Is there a way that Webrecorder could ignore the random parts of the URL and give you the recorded results page when you re-perform the search in replay mode? This has come up for me more than once.

Recent example:

Google search page: https://webrecorder.io/michaeleatsfood/google-searches/20180612145914/https://www.google.com/?gws_rd=ssl

Trying to reproduce my search ("hello") yields: The url https://www.google.com/search?source=hp&ei=wt8fW6DlLsyTzwKtt4y4BQ&q=hello&oq=hello&gs_l=psy-ab.3..0l4j0i131k1j0j0i131k1j0l3.886.1371.0.1677.5.5.0.0.0.0.67.166.4.4.0....0...1.1.64.psy-ab..1.4.166....0.VznbaPErFFU was not found in the archive.

Archived results page: https://webrecorder.io/michaeleatsfood/google-searches/20180612145957/https://www.google.com/search?source=hp&ei=wt8fW6DlLsyTzwKtt4y4BQ&q=hello&oq=hello&gs_l=psy-ab.12...0.0.3.3064.0.0.0.0.0.0.0.0..0.0....0...1..64.psy-ab..0.0.0....0.AqahdaLaFYc

Basically everything after gs_l seems like gibberish.
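To check that claim against the two URLs above, here is a minimal sketch that compares their query strings parameter by parameter. The helper name `differing_params` is made up for illustration; it only uses Python's standard `urllib.parse` and is not part of Webrecorder.

```python
from urllib.parse import urlsplit, parse_qsl


def differing_params(url_a, url_b):
    """Return the sorted names of query parameters whose values differ."""
    a = dict(parse_qsl(urlsplit(url_a).query, keep_blank_values=True))
    b = dict(parse_qsl(urlsplit(url_b).query, keep_blank_values=True))
    # A parameter counts as different if it is missing on one side
    # or present on both sides with different values.
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# URL generated when repeating the search in replay mode (not in the archive).
replayed = (
    "https://www.google.com/search?source=hp&ei=wt8fW6DlLsyTzwKtt4y4BQ"
    "&q=hello&oq=hello"
    "&gs_l=psy-ab.3..0l4j0i131k1j0j0i131k1j0l3.886.1371.0.1677.5.5.0.0.0.0"
    ".67.166.4.4.0....0...1.1.64.psy-ab..1.4.166....0.VznbaPErFFU"
)
# URL of the results page that was actually recorded.
archived = (
    "https://www.google.com/search?source=hp&ei=wt8fW6DlLsyTzwKtt4y4BQ"
    "&q=hello&oq=hello"
    "&gs_l=psy-ab.12...0.0.3.3064.0.0.0.0.0.0.0.0..0.0....0...1..64"
    ".psy-ab..0.0.0....0.AqahdaLaFYc"
)

print(differing_params(replayed, archived))  # ['gs_l']
```

For this pair, only `gs_l` differs; the other parameters (`source`, `ei`, `q`, `oq`) are identical.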

Here is an old example; I seem to recall that this once worked:

Google Search page: https://webrecorder.io/michaeleatsfood/juliaweistreach/list/bookmarks/b9/20150709131257/https://www.google.com/webhp?hl=en#hl=en

Trying to recreate my recorded search ("parbunkells") lands me on https://webrecorder.io/michaeleatsfood/juliaweistreach/list/bookmarks/b9/20150709131257/https://www.google.com/webhp?hl=en#hl=en&q=parbunkells

https://webrecorder.io/michaeleatsfood/juliaweistreach/20150617150801/https://www.google.com/search?q=parbunkells

michaelconnor commented 6 years ago

This would be specifically helpful for archiving Therevolvinginternet.com - the proposal is to make a webrecorder recording of various google searches, and then embed the webrecorder recording within the rotating iframe :-/

michaelconnor commented 6 years ago

Relevant parameters: ?q=boop (I searched boop)

Optional params, if people use search tools:
&tbs=qdr:h (past hour)
&tbs=qdr:d (past day)
&tbs=qdr:w (past week)
&tbs=qdr:m (past month)
&tbs=qdr:y (past year)
&tbs=cdr%3A1%2Ccd_min%3AYYYY%2Ccd_max%3AYYYY (from YYYY to YYYY)
&tbas=0 is the default, but this param seems unnecessary.

&tbm=isch IMAGES
&tbm=vid VIDEO
&tbm=shop Guess

I couldn't figure out a parameter for language - it seems to have to do with cookies.

michaelconnor commented 6 years ago

Other params:

PARAMS THAT SHOW WHEN LOGGED OUT
&oq=boop (ORIGINAL QUERY?)
&aqs=chrome.0.69i59j69i61l2j69i60.1228j0j1
&sourceid=chrome
&ie=UTF-8

PARAMS THAT SHOW FOR A LOGGED IN USER ONLY
&tbas= - probably a token thing - ignore
&source= surveillance gibberish
&sa - gibberish
&ved - seems like gibberish
&biw - only when logged in, for me it's always 1098
&bih=639
&dpr=2.5

michaelconnor commented 6 years ago

So in summary: definitely don't ignore &tbm, and &tbas is relevant if not 0.
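A rough sketch of what that matching could look like: reduce each search URL to a canonical form that keeps only the relevant parameters, then compare canonical forms at replay time. This is not Webrecorder's actual matching logic. The function name `canonical_search_url` and the `KEEP` list are assumptions for illustration, and including `tbs` is my reading of the search-tools params listed earlier.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters assumed to affect the results page (hypothetical allow-list).
# Everything else (gs_l, ei, oq, aqs, ved, sa, biw, bih, dpr, ...) is dropped.
KEEP = ("q", "tbm", "tbs", "tbas")


def canonical_search_url(url):
    """Strip a Google search URL down to the parameters in KEEP."""
    parts = urlsplit(url)
    kept = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name not in KEEP:
            continue
        if name == "tbas" and value == "0":
            continue  # tbas=0 is the default, so treat it as absent
        kept.append((name, value))
    kept.sort()  # make the result independent of parameter order
    # Drop the fragment as well; it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


recorded = "https://www.google.com/search?source=hp&q=hello&oq=hello&gs_l=psy-ab.12"
replayed = "https://www.google.com/search?source=hp&q=hello&oq=hello&gs_l=psy-ab.3"

print(canonical_search_url(recorded))  # https://www.google.com/search?q=hello
print(canonical_search_url(recorded) == canonical_search_url(replayed))  # True
```

The `gs_l` values here are shortened stand-ins for the long strings in the first comment. With this approach, an image search (`&tbm=isch`) and a plain web search for the same query stay distinct, while two searches that differ only in the tracking parameters collapse to the same key.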

michaelconnor commented 6 years ago

@ikreymer I'm sorry to bother you - I need to know if this is likely or possible to happen soon (like in the next few days?); otherwise I'll plan to proceed without it!