Smile4ever / Neat-URL

Neat URL cleans URLs, removing parameters such as Google Analytics' utm parameters.

Yet another Google parameter... #25

Closed RepoMoveBot closed 6 years ago

RepoMoveBot commented 6 years ago

Issue by nicolaasjan Sunday Aug 20, 2017 at 09:03 GMT


Today I noticed that Google uses a new(?) parameter: gs_l

Example:

https://www.google.nl/search?source=hp&q=Mozilla+Archive+Format&oq=Mozilla+Archive+Format&gs_l=psy-ab.3..0l2j0i22i30k1l2.1148.1148.0.4681.1.1.0.0.0.0.83.83.1.1.0.foo%2Cnso-ehuqi%3D1%2Cnso-ehuui%3D1%2Cewh%3D0%2Cnso-mplt%3D2%2Cnso-enksa%3D0%2Cnso-enfk%3D1%2Cnso-usnt%3D1%2Cnso-qnt-npqp%3D0-1701%2Cnso-qnt-npdq%3D0-54%2Cnso-qnt-npt%3D0-1%2Cnso-qnt-ndc%3D300%2Ccspa-dspm-nm-mnp%3D0-05%2Ccspa-dspm-nm-mxp%3D0-125%2Cnso-unt-npqp%3D0-17%2Cnso-unt-npdq%3D0-54%2Cnso-unt-npt%3D0-0602%2Cnso-unt-ndc%3D300%2Ccspa-uipm-nm-mnp%3D0-007525%2Ccspa-uipm-nm-mxp%3D0-052675...0...1..64.psy-ab..0.1.83.jV4WLrHrkAI

Perhaps you can add it to your default filter?

Thanks for your add-on! Pure URL does not work anymore...

RepoMoveBot commented 6 years ago

Comment by Smile4ever Sunday Aug 20, 2017 at 15:07 GMT


Hi,

The gs_l parameter is not very new (2016), but I never saw it before :) https://productforums.google.com/forum/#!topic/webmasters/UJMhujdXgbE

I've added gs_l to Neat URL 1.2.0. You will automatically get this new parameter if you've not already added it manually.
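For readers following along, here is a minimal Python sketch of what removing a parameter such as gs_l from a query string amounts to. It is not Neat URL's actual code; the `strip_params` helper and the `BLOCKED_PARAMS` set are hypothetical names used only for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical block list for illustration; not Neat URL's real default list.
BLOCKED_PARAMS = {"gs_l"}


def strip_params(url, blocked=BLOCKED_PARAMS):
    """Return url with every query parameter named in `blocked` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in blocked
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


cleaned = strip_params(
    "https://www.google.nl/search?source=hp&q=Mozilla+Archive+Format&gs_l=psy-ab.3"
)
print(cleaned)
```

Note that the query string is decoded and re-encoded here, so unusual escaping in the kept parameters may be normalized.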

The commit that implements Neat URL 1.2.0 can be found here: https://github.com/Smile4ever/firefoxaddons/commit/9a665747e2fc25d35a9c666af58918d424890449

Neat URL 1.2.0 will soon be available on addons.mozilla.org.

> Thanks for your add-on!

You're welcome.

RepoMoveBot commented 6 years ago

Comment by EC-O-DE Sunday Aug 20, 2017 at 18:43 GMT


Yeah, I just noticed. I don't know if it's this or something else, but Google Images doesn't give direct links to images anymore... :(

RepoMoveBot commented 6 years ago

Comment by Smile4ever Sunday Aug 20, 2017 at 19:09 GMT


Try to disable Neat URL and try again. If it works after disabling Neat URL I will retract gs_l from the parameters list. If it doesn't work, gs_l can stay.

By the way, which version of Neat URL are you using, @ZenFi? Does it even have the gs_l parameter?

RepoMoveBot commented 6 years ago

Comment by nicolaasjan Monday Aug 21, 2017 at 07:06 GMT


Google Images works OK here. Clicking on an image result gives you the black frame where you can click on "view image", and that works well here (I manually added the gs_l parameter).

RepoMoveBot commented 6 years ago

Comment by nicolaasjan Monday Aug 21, 2017 at 07:36 GMT


By the way, when using Google Images and searching for, let's say, "mozilla", I get:

https://www.google.nl/search?as_st=y&tbm=isch&hl=nl&as_q=mozilla&as_epq=&as_oq=&as_eq=&imgsz=&imgar=&imgc=&imgcolor=&imgtype=&cr=&as_sitesearch=&safe=images&as_filetype=&as_rights=

But when I then set the size to "large" within the results, I get this URL:

https://www.google.nl/search?q=mozilla&as_st=y&hl=nl&tbm=isch&source=lnt&tbs=isz:l&sa=X&ved=0ahUKEwiAhrj54-fVAhVFCcAKHWZOAFYQpwUIHQ&biw=1444&bih=905&dpr=1

Notice the ved parameter! See also: https://moz.com/blog/inside-googles-ved-parameter (more Google tracking...)

Adding ved to the filter list does not give me any nasty side effects. So maybe add this to the default list as well?

RepoMoveBot commented 6 years ago

Comment by eXqusic Wednesday Aug 23, 2017 at 10:12 GMT


What about all of Amazon's parameters?

https://www.amazon.com/Spigen-RA200-Earhooks-Earphones-Headphones/dp/B01NAM69IJ/ref=pd_sim_107_3?_encoding=UTF8&pd_rd_r=T9EY6TZZ0KF4V86SSGGD&pd_rd_w=3FbbK&pd_rd_wg=qDgGV&psc=1

Everything past "/ref=" isn't needed.

So, just to list them: pd_sim, pd_rd, psc

That's just from that one link; there are more, haha.

RepoMoveBot commented 6 years ago

Comment by Smile4ever Wednesday Aug 23, 2017 at 11:18 GMT


I intend to implement parameters that are scoped to a domain pattern rather than a full domain, like "ved@google.*". This will allow specific parameters to be removed on multiple domains (but not on all of them).

I will implement these parameters:

ref=pd_sim* is harder to implement, but I might find a way to do it.
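To make the "ved@google.*" idea concrete, here is a rough Python sketch of how a domain-scoped rule could be evaluated. This is an illustration only, not the add-on's implementation: the helper names (`parse_rule`, `host_matches`, `clean`) are hypothetical, and it assumes a pattern should match the host itself or any of its subdomains.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def parse_rule(rule):
    """Split 'param@pattern' into (param, pattern); no '@' means all domains."""
    param, _, pattern = rule.partition("@")
    return param, pattern or "*"


def host_matches(host, pattern):
    # Assumption: a pattern covers the bare host and any subdomain of it.
    return fnmatchcase(host, pattern) or fnmatchcase(host, "*." + pattern)


def clean(url, rules):
    """Remove query parameters whose rule applies to the URL's host."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    blocked = {
        param
        for param, pattern in map(parse_rule, rules)
        if host_matches(host, pattern)
    }
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in blocked
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(clean("https://www.google.nl/search?q=mozilla&ved=abc", ["ved@google.*"]))
print(clean("https://example.com/?ved=abc", ["ved@google.*"]))
```

A plain glob like this is loose: "google.*" would also match a host such as google.example.org, so a real implementation would probably want to restrict the wildcard to known top-level domains.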

RepoMoveBot commented 6 years ago

Comment by eXqusic Wednesday Aug 23, 2017 at 20:44 GMT


pd_rd* would be better; there are more than just those few you mentioned.

RepoMoveBot commented 6 years ago

Comment by Geobert Friday Aug 25, 2017 at 12:20 GMT


Not a Google parameter, but a tracking parameter nonetheless: http://www.futura-sciences.com/planete/actualites/paleontologie-vie-dodo-retrouvee-os-68360/#xtor=RSS-8

xtor=RSS-8 should be removed (I tried adding xtor, #xtor and even /#xtor in the options, with no luck).
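A likely reason the plain entries have no effect is that #xtor=RSS-8 sits in the URL fragment (the part after "#"), not in the query string, so a filter that only inspects query parameters never sees it. A small Python sketch of handling that case separately, with `strip_fragment_param` as a hypothetical helper name:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_fragment_param(url, name):
    """Drop the fragment when it has the form 'name=value' (or just 'name')."""
    parts = urlsplit(url)
    if parts.fragment.split("=", 1)[0] == name:
        parts = parts._replace(fragment="")
    return urlunsplit(parts)


url = (
    "http://www.futura-sciences.com/planete/actualites/"
    "paleontologie-vie-dodo-retrouvee-os-68360/#xtor=RSS-8"
)
print(urlsplit(url).query)     # empty: the tracker is not a query parameter
print(urlsplit(url).fragment)  # xtor=RSS-8
print(strip_fragment_param(url, "xtor"))
```

Fragments that do not start with the given name (for example in-page anchors) are left untouched.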

Thanks for this extension!

RepoMoveBot commented 6 years ago

Comment by nicolaasjan Sunday Aug 27, 2017 at 17:14 GMT


Two other Google parameters: ei and sei

Found as follows: in Google advanced search, search for, let's say, "Remove garbage from URLs". I get:

https://www.google.com/search?lr=&hl=nl&as_qdr=all&q=%22Remove+garbage+from+URLs%22&oq=%22Remove+garbage+from+URLs%22

Then click on the coloured "Go to the Google homepage" link in the upper left corner. There I get:

https://www.google.nl/webhp?hl=nl&sa=X&gws_rd=cr&ei=nvSiWfaROYiNUbyTlpAG

The part ei=nvSiWfaROYiNUbyTlpAG contains a Unix timestamp and is often used in digital forensics... See: https://cheeky4n6monkey.blogspot.nl/2014/10/google-eid.html

While it doesn't seem to occur for every search, when it does, that "ei" parameter contains an encoded Unix UTC timestamp (and other things Google only knows). Interpreting this artifact can thus allow forensic analysts to date a particular search session.

When running his Python script in my Linux terminal I get:

python google-ei-time.py -u "https://www.google.nl/webhp?hl=nl&sa=X&gws_rd=cr&ei=nvSiWfaROYiNUbyTlpAG"
Running google-ei-time.py v2014-10-10

URL's ei term = nvSiWfaROYiNUbyTlpAG
Padded base64 string = nvSiWfaROYiNUbyTlpAG
Extracted timestamp = 1503851678
Human readable timestamp (UTC) = 2017-08-27T16:34:38
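The core of that decoding can be sketched in a few lines of Python. This is not the linked script itself; it assumes, consistent with the output above, that the ei value is URL-safe base64 and that its first four bytes are a little-endian Unix timestamp. The function name `ei_timestamp` is made up for this example.

```python
import base64
import struct
from datetime import datetime, timezone


def ei_timestamp(ei):
    """Decode the leading timestamp of a Google 'ei' value (assumed layout)."""
    padded = ei + "=" * (-len(ei) % 4)        # restore stripped base64 padding
    raw = base64.urlsafe_b64decode(padded)
    (seconds,) = struct.unpack("<I", raw[:4])  # first 4 bytes, little-endian
    return seconds


ts = ei_timestamp("nvSiWfaROYiNUbyTlpAG")
stamp = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
print(ts, stamp)
```

For the ei value from the URL above, this gives the same timestamp as the script output: 1503851678, i.e. 2017-08-27T16:34:38 UTC. The meaning of the remaining bytes is not covered here.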

See also: http://kb.digital-detective.net/display/NetAnalysisV2/URL+Analysis#URLAnalysis-GoogleEI/SEIParameterDecoding

I can't remember where I saw the sei parameter, but it appears to be something similar.

At first I only added ei and sei to the add-on settings, but for some reason unknown to me, YouTube broke (videos did not play). :anguished: As I only encountered the issue at google.nl, I had to add:

RepoMoveBot commented 6 years ago

Comment by Smile4ever Sunday Aug 27, 2017 at 20:41 GMT


I worked on this. This is a status update to keep you all informed.

Done:

(amazon.* is a wildcard for amazon.de / amazon.com / amazon.fr ...)

Still TODO:

Please note that the above parameters won't work in Neat URL 1.2.0. An update will be provided shortly with support for wildcard domains. These new parameters will be added by default when upgrading users to the updated version.

RepoMoveBot commented 6 years ago

Comment by Smile4ever Wednesday Aug 30, 2017 at 18:33 GMT


I have implemented everything from above, except wildcard support for parameters. I added that to the TODO list.

It will be available in Neat URL 2.0.0: https://github.com/Smile4ever/firefoxaddons/commit/ff5fd890ed790a0e8d05081f2b1cbeef8f8358ce

(please ignore 1.5.0 in the CHANGELOG, it became 2.0.0)

Neat URL 2.0.0 has been submitted to addons.mozilla.org for approval. It will soon be available to end users. I will inform you when that happens.

RepoMoveBot commented 6 years ago

Comment by Geobert Wednesday Aug 30, 2017 at 20:51 GMT


Lean URL? or Neat URL?

Thanks for your work!

RepoMoveBot commented 6 years ago

Comment by Smile4ever Thursday Aug 31, 2017 at 04:50 GMT


I was sleepy. Neat URL of course.

RepoMoveBot commented 6 years ago

Comment by GitCurious Thursday Aug 31, 2017 at 08:42 GMT


Hello - thanks for the addon, I'm testing it now.

**Everything after /ref on amazon.\*** ($/ref@amazon.\*)

This actually breaks certain links on Amazon, for example:

"Track Package" and "Cancelled Items"

There may be more, but I noticed those two immediately.

An example link segment: amazon.co.uk/gp/your-account/ship-track/ref=xxx?ie=UTF8&itemId=xxx&orderId=xxx&shipmentId=xxx

The 'ref' rule strips away the customer-specific item information that comes after it.
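One way to avoid this, sketched in Python below, would be to remove only the trailing /ref=... path segment and leave the query string alone. This is purely an illustration of the idea, not how the add-on handles it; `strip_ref_segment` is a hypothetical name, and whether Amazon's account pages still work without the ref segment is not something this sketch can confirm.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def strip_ref_segment(url):
    """Remove a trailing '/ref=...' path segment but keep the query string."""
    parts = urlsplit(url)
    path = re.sub(r"/ref=[^/]*$", "", parts.path)
    return urlunsplit(parts._replace(path=path))


tracking = (
    "https://www.amazon.co.uk/gp/your-account/ship-track/ref=xxx"
    "?ie=UTF8&itemId=xxx&orderId=xxx&shipmentId=xxx"
)
print(strip_ref_segment(tracking))
```

Here itemId, orderId and shipmentId survive because only the path is rewritten; unwanted query parameters would then be handled by separate per-parameter rules.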

RepoMoveBot commented 6 years ago

Comment by Smile4ever Thursday Aug 31, 2017 at 17:35 GMT


@GitCurious This bug is fixed in Neat URL 2.0.1. It will soon be available on addons.mozilla.org.

Everyone: By the way, go grab Neat URL 2.0.0 (or up) on addons.mozilla.org! 😃