gleitz / howdoi

instant coding answers via the command line
http://blog.gleitzman.com/post/43330157197/howdoi-instant-coding-answers-via-the-command-line
MIT License

Is the tracking parameter t=hj in the duckduckgo URL intentional? #465

Closed Alexhans closed 2 years ago

Alexhans commented 2 years ago

Hi,

I was looking at the different query parameters in the search URLs when I found that DuckDuckGo's t is a tracking code (I don't know what the value hj means): https://help.duckduckgo.com/privacy/t/

Through partnerships with developers and companies, DuckDuckGo has been integrated into many applications. In these partnerships, a portion of DuckDuckGo's advertising revenue is sometimes shared back. To assign advertising revenue and collect anonymous aggregate usage information, developers add a unique "&t=" parameter to searches made through their applications.

https://github.com/gleitz/howdoi/blob/f202eea521be7c280390a950fc881cce59336ceb/howdoi/howdoi.py#L72
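For anyone who wants to see what the search URL looks like without the parameter, here is a minimal sketch using only the standard library. The helper name `strip_query_param` and the example URL are assumptions made for illustration; the URL is not necessarily the exact template howdoi uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_param(url, name):
    """Return `url` with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    # Keep all key/value pairs except the one being dropped.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Illustrative URL only (hypothetical query, same parameter names as discussed).
example = "https://duckduckgo.com/html?q=python+sort+list&t=hj&ia=web"
print(strip_query_param(example, "t"))
# prints https://duckduckgo.com/html?q=python+sort+list&ia=web
```

Whether dropping the parameter changes how DuckDuckGo treats the request is a separate question that this sketch does not answer.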

gleitz commented 2 years ago

I'm not sure whether it is necessary, but DuckDuckGo currently does not work as an engine because of https://github.com/gleitz/howdoi/issues/404

gleitz commented 2 years ago

If you'd like to figure out that issue, I could use some help with it!

Alexhans commented 2 years ago

I can try. I actually came across this while quickly playing around to see if I could add Brave support.

Support works, but I wanted to understand the decisions around URLs and usage. I did get temporarily blocked when I added the unit tests for Brave, which is somewhat expected (I still remember the "Google has been DDoSing SourceHut for over a year" story).

The only thing I can think of is to ask DuckDuckGo and Brave whether they have specific ways to interact programmatically with their websites.

I do think the answer will not be satisfactory for DuckDuckGo since, on their Instant Answers API page, they state:

This API does not include all of our links, however. That is, it is not a full search results API or a way to get DuckDuckGo results into your applications beyond our instant answers. Because of the way we generate our search results, we unfortunately do not have the rights to fully syndicate our results, free or paid. For the same reason, we cannot allow framing our results without our branding. Please see our partnerships page for more info on guidelines and getting in touch with us.

So crawling ethically (without trying to circumvent blocks through proxies or similar) will invariably get blocked. For DDG, it might be a case of choosing whether to remove it entirely or to support only instant answers through their API. (For any API-based access, users could get their own tokens, as in OpenBBTerminal.)
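As a rough idea of what the instant-answers-only option could look like, here is a sketch that only builds the request URL and makes no network call. The endpoint and the `q`, `format`, and `no_html` parameters are what I understand the public Instant Answer API to accept; the function name is hypothetical, and none of this is existing howdoi code.

```python
from urllib.parse import urlencode

# Assumed public endpoint of the DuckDuckGo Instant Answer API.
INSTANT_ANSWER_ENDPOINT = "https://api.duckduckgo.com/"


def instant_answer_url(query):
    """Build an Instant Answer API URL for `query` (hypothetical helper)."""
    params = {
        "q": query,
        "format": "json",  # ask for a JSON response
        "no_html": 1,      # strip HTML from the returned text
    }
    return INSTANT_ANSWER_ENDPOINT + "?" + urlencode(params)


print(instant_answer_url("python sort list"))
# prints https://api.duckduckgo.com/?q=python+sort+list&format=json&no_html=1
```

As the quoted passage says, this API returns instant answers only, not full search results, so it would not be a drop-in replacement for crawling.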

gleitz commented 2 years ago

Yes, I am not optimistic that API access will be given, so we're left with crawling.

I also get rate limited during development, which is why I have the caching mechanism in place when running tests.
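The general idea can be sketched as a small on-disk cache keyed by URL, so that repeated test runs reuse a stored response instead of hitting the search engine again. This is an illustration with standard-library pieces only, not howdoi's actual caching code; `PageCache` and `get_or_fetch` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


class PageCache:
    """Store fetched page bodies on disk, one JSON file per URL."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, url):
        # Hash the URL so it is always a safe file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.json"

    def get_or_fetch(self, url, fetch):
        """Return the cached body for `url`, calling `fetch(url)` only on a miss."""
        path = self._path(url)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))["body"]
        body = fetch(url)
        path.write_text(json.dumps({"url": url, "body": body}), encoding="utf-8")
        return body
```

In a test, `fetch` would be whatever function performs the real HTTP request; after the first run, the request is served from disk and the search engine is not contacted again.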

gleitz commented 2 years ago

I didn't know Brave had a search engine. I would accept that PR if you want to open it.

Alexhans commented 2 years ago

I'll take a look at what we discussed over the weekend and create the pull request. It's been a busy period.

gleitz commented 1 year ago

No worries - take your time and thanks again for any support you can give to the project.