demostanis / gimmeasearx

Find a random searx instance
http://7tcuoi57curagdk7nsvmzedcxgwlrq2d6jach4ksa3vj72uxrzadmqqd.onion/

Does not work. #21

Open Troughy opened 5 months ago

Troughy commented 5 months ago

It only gives search.datura.network as an instance, and when I blacklist it, it says there are no instances available.

Troughy commented 5 months ago

It seems like I'm getting 429 (Too Many Requests) error codes from every single URL except search.datura.network, its onion site, and lbmegc3rjnekmdxuisynqdc7y3m2tgyq7gj257ooddaobxqjw36bdayd.onion.

Troughy commented 5 months ago

After adding a few headers, I'm no longer getting 429 response codes, but the search query is being removed from the URL. I can reproduce this in my browser: for example, when I enter https://search.einfachzocken.eu/search?q=south+park I end up at https://search.einfachzocken.eu. Then, if I enter https://search.einfachzocken.eu/search?q=south+park again, it works.

Interestingly, if I force the use of Tor (useTor = true), then most sites work. But when I go to 127.0.0.1:8080, enter a query and click on search, the same thing happens (I see the main site, but when I go back and click on search again, I see the search results). Maybe the issue is on my end?

Here are the headers I added; I copied them from my browser. I did not change the user agent.

```go
req.Header.Set("Accept-Language", "en-US,en;q=0.5")
req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8")
req.Header.Set("Sec-Fetch-Dest", "document")
req.Header.Set("Sec-Fetch-Mode", "navigate")
req.Header.Set("Sec-Fetch-Site", "cross-site")
req.Header.Set("TE", "trailers")
```
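For context, here is a sketch of how these headers could be attached to a full search request. `buildSearchRequest` is a hypothetical helper (not gimmeasearx's actual code), and the instance URL and query are placeholders:

```go
package main

import (
	"net/http"
	"net/url"
)

// buildSearchRequest builds a GET request against a searx instance
// with the browser-like headers listed above. Hypothetical helper;
// instance and query are supplied by the caller.
func buildSearchRequest(instance, query string) (*http.Request, error) {
	req, err := http.NewRequest("GET", instance+"/search?q="+url.QueryEscape(query), nil)
	if err != nil {
		return nil, err
	}
	for k, v := range map[string]string{
		"Accept-Language": "en-US,en;q=0.5",
		"Accept":          "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
		"Sec-Fetch-Dest":  "document",
		"Sec-Fetch-Mode":  "navigate",
		"Sec-Fetch-Site":  "cross-site",
		"TE":              "trailers",
	} {
		req.Header.Set(k, v)
	}
	return req, nil
}
```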
demostanis commented 5 months ago

i can reproduce the issue. maybe searx sets some cookies that are needed for searching via query params to work?

Troughy commented 5 months ago

Possibly something server-side; I can't see any cookies being stored in my browser.

Troughy commented 5 months ago

I think it's their bot detection: https://github.com/searxng/searx-instances/discussions/417. If the public_instance parameter is set to true, the "new advanced bot limiter called link_token" is activated, which "evaluates a request as suspicious if the URL /client.css is not requested by the client" (SearXNG docs, scroll to "Method link_token").

By the way, they say setting public_instance to true is mandatory, but I'm pretty sure they just trust you to do it (since they only ask whether you did), so I think that's why search.datura.network works (though I have no way to verify this).
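As I read the docs, the limiter might behave roughly like the sketch below, from the server's point of view: an IP that never fetched client.css gets its search bounced back to the main page. All names here are my own guesses (SearXNG reportedly uses Redis with an expiry, not an in-memory map):

```go
package main

import "strings"

// seenCSS records client IPs that recently fetched client.css,
// standing in for SearXNG's Redis store with TTL.
var seenCSS = map[string]bool{}

// handle mimics the described link_token check: a /search request
// from an IP that never requested client.css is treated as
// suspicious and redirected, dropping the query.
func handle(ip, path string) string {
	switch {
	case strings.HasSuffix(path, "/client.css"):
		seenCSS[ip] = true // this IP behaves like a browser now
		return "200 css"
	case strings.HasPrefix(path, "/search") && !seenCSS[ip]:
		return "302 -> / (query dropped)"
	default:
		return "200 results"
	}
}
```

This would match the symptom above: the first search lands on the main page, and retrying the same URL afterwards works.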

demostanis commented 5 months ago

should we bypass this?

Troughy commented 5 months ago

Well, that would be good, but I don't know how. You have to visit the main page first: the server generates a random string and puts it in the HTML as the href of a link tag, which makes your browser send a GET request for it, and then the server saves your IP to Redis for some time. That would also explain why this issue didn't show up with Tor: Tor hides your IP, so the server sees the same IP for all Tor users (127.0.0.1).

You could bypass this when verifying instances, but not when actually using gimmeasearx to search, because of CSP and such (though I'm not 100% sure). Actually, if everyone ran gimmeasearx locally, then with the other bypass (during instance verification) the searx server would remember the IP, so it would probably work...