Troughy opened 5 months ago
It seems like I'm getting 429 (Too Many Requests) responses from every single URL except search.datura.network, its onion site, and lbmegc3rjnekmdxuisynqdc7y3m2tgyq7gj257ooddaobxqjw36bdayd.onion.
After adding a few headers, I'm no longer getting 429 response codes, but the search query is being removed from the URL. I can reproduce this in my browser, for example when I enter https://search.einfachzocken.eu/search?q=south+park I end up at https://search.einfachzocken.eu. Then if I enter https://search.einfachzocken.eu/search?q=south+park again, it works.
Interestingly, if I force the use of Tor (useTor = true), then most sites work. But when I go to 127.0.0.1:8080, enter a query and click search, the same thing happens (I see the main site, but when I go back and click search again, I see the search results). Maybe the issue is on my end?
Here are the headers I added; I copied them from my browser. I did not change the user agent.
req.Header.Set("Accept-Language", "en-US,en;q=0.5")
req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8")
req.Header.Set("Sec-Fetch-Dest", "document")
req.Header.Set("Sec-Fetch-Mode", "navigate")
req.Header.Set("Sec-Fetch-Site", "cross-site")
req.Header.Set("TE", "trailers")
I can reproduce the issue. Maybe searx sets some cookies that are needed for searches via query params to work?
Possibly something server-side; I can't see any cookies being stored in my browser.
I think it's their bot detection. https://github.com/searxng/searx-instances/discussions/417
If you have the public_instance parameter set to true, the "new advanced bot limiter called link_token" is activated, which "evaluates a request as suspicious if the URL /client<token>.css is not requested by the client". Setting public_instance to true is mandatory for public instances, but I'm pretty sure they just trust you to do it (because they ask if you did it), so I think that's why search.datura.network works (but I have no way to verify this).
Should we bypass this?
Well, that would be good, but I don't know how. You have to visit the main page first; the server generates a random string, puts it in the HTML as the href of a link tag, which makes your browser send a GET request for it, and then the server saves your IP to Redis for some time.

That would also explain why this issue didn't show up with Tor: Tor hides your IP, so the server sees the same address for many Tor users (a shared exit node, or 127.0.0.1 for onion services). You could bypass this when verifying instances, but not when actually using gimmeasearx to search, because of CSP and such (but I'm not 100% sure). Actually, if everyone were running gimmeasearx locally, then with the other bypass (when verifying instances) the searx server would remember the IP, so it would probably work...
It only gives search.datura.network as an instance, and when I blacklist it, it says there are no instances available.