demostanis / gimmeasearx

Find a random searx instance
http://7tcuoi57curagdk7nsvmzedcxgwlrq2d6jach4ksa3vj72uxrzadmqqd.onion/

Latest version config option causes Internal Server Error #13

Closed lukaskwkw closed 2 years ago

lukaskwkw commented 2 years ago

With the "latest version" option turned on, I get

"Internal Server Error"

Tested on Firefox and Chrome.

Generated URL:

http://localhost:8080/?blacklist=https%3A%2F%2Funwanted.instance.com%2F+%3B%0D%0A*%3A%2F%2Fanother.unwanted.instance.cn%2F+%3B&grade-v=on&grade-c=on&preferences=&minversion=1.0.0&latestversion=on&custominstances=https%3A%2F%2Fmycustom.searx.instance%2F+%3B%0D%0Ahttps%3A%2F%2Fmyothercustom.instan.ce.ru.fr.es%2F

demostanis commented 2 years ago

SearX now names their tags without a leading v, causing result := r.FindStringSubmatch(string(page)) to fail in internal/version/version.go. I changed the regex and it now works again. I'm glad people still use gimmeasearx!
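
For readers unfamiliar with Go, here is a minimal sketch of the kind of fix described above. It is not the actual code from internal/version/version.go: the pattern `versionRe` and the helper `extractVersion` are hypothetical names chosen for illustration. It relies on one fact about Go's regexp package: `FindStringSubmatch` returns nil when nothing matches, so indexing the result without a check would panic. Making the leading "v" optional lets both tag styles match.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionRe is an illustrative pattern (an assumption, not the project's
// regex). It matches a three-part version number with an optional leading
// "v", e.g. "v0.18.0" or "1.0.0".
var versionRe = regexp.MustCompile(`v?(\d+\.\d+\.\d+)`)

// extractVersion returns the first version number found in page, without
// the leading "v". The boolean is false when nothing matches, so callers
// can handle a miss instead of indexing into a nil slice.
func extractVersion(page string) (string, bool) {
	result := versionRe.FindStringSubmatch(page)
	if result == nil {
		return "", false
	}
	return result[1], true
}

func main() {
	pages := []string{"releases/tag/v0.18.0", "releases/tag/1.0.0", "no tags here"}
	for _, page := range pages {
		version, ok := extractVersion(page)
		fmt.Printf("%q -> version %q (found: %t)\n", page, version, ok)
	}
}
```

Returning an explicit "found" flag means a future change in tag naming would show up as a handled miss and not as a crash.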

lukaskwkw commented 2 years ago

It's a cool project. I use it because searx gives results without ads, unlike Google, and getting a random instance every time improves anonymity. I got inspired and am currently building something similar, but with the ability to re-request the search against a different instance when getting ~0 results or an error. I would contribute here, but unfortunately I don't know Go.

demostanis commented 2 years ago

gimmeasearx is supposed to check that instances work well (that they're not rate-limited by every major search engine and that they actually give results). It does so by checking that searching "south park" gives results containing "Matt Stone" and "Trey Parker", and that searching "gimmeasearx" gives results containing "demostanis" and "reddit" (which isn't the case anymore on many instances, as the Reddit post is getting old). The code is here: https://github.com/demostanis/gimmeasearx/blob/794a053025608b07c8d4f2b617ecdaf72ef4af8e/internal/instances/instances.go#L67
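
The check described above can be sketched roughly as follows. This is an illustration, not the code at the linked line: the `probe` type, `containsAll`, `instanceLooksHealthy`, and the `search` callback are hypothetical names, and the callback stands in for the real HTTP request to an instance.

```go
package main

import (
	"fmt"
	"strings"
)

// probe pairs a test query with strings expected somewhere in the results
// page. (Hypothetical type, for illustration only.)
type probe struct {
	query    string
	expected []string
}

// containsAll reports whether body contains every expected string.
func containsAll(body string, expected []string) bool {
	for _, s := range expected {
		if !strings.Contains(body, s) {
			return false
		}
	}
	return true
}

// instanceLooksHealthy runs every probe through search, a stand-in for
// "fetch the results page of this instance for this query". It returns
// false on the first error or missing expected string.
func instanceLooksHealthy(search func(query string) (string, error), probes []probe) bool {
	for _, p := range probes {
		body, err := search(p.query)
		if err != nil || !containsAll(body, p.expected) {
			return false
		}
	}
	return true
}

func main() {
	probes := []probe{
		{query: "south park", expected: []string{"Matt Stone", "Trey Parker"}},
	}
	// Fake search functions replace real network calls in this sketch.
	working := func(string) (string, error) {
		return "<p>Created by Trey Parker and Matt Stone</p>", nil
	}
	rateLimited := func(string) (string, error) {
		return "<p>No results were found</p>", nil
	}
	fmt.Println("working instance healthy:", instanceLooksHealthy(working, probes))
	fmt.Println("rate-limited instance healthy:", instanceLooksHealthy(rateLimited, probes))
}
```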

Or are you saying that multiple SearX instances might not give the same results for the same query, hence the need to search with another one? That might be because they don't use the same default search engines. Wouldn't that be fixed by toggling every worthy search engine and setting preferences in gimmeasearx?

demostanis commented 2 years ago

(this discussion should probably be moved to another issue)

lukaskwkw commented 2 years ago

I don't know why, but even when two SearX instances are similar (e.g. both have Google enabled by default), they sometimes produce different Google results. The first instance may return 0 results while the second one works fine. My approach is actually quite simple: I do the actual search on the backend, then fetch the instance's index.html and serve it as a route, with absolute URLs for static files. This makes it possible to check whether there are any results (or more than 2) and, if not, to query another instance, up to a maximum of e.g. 3 times, since the query itself may simply be wrong. Are there any docs about SearX preferences? Here is my project: https://github.com/lukaskwkw/quicksearx (it is a WIP at the moment, but the main functionality described above works fine; things like customization, more filtering, and instance ranks still need to be done). I'm going on vacation soon, so I will probably polish it later.
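
The retry idea described above could look roughly like this in Go. It is a sketch under assumptions, not code from quicksearx (whose implementation is not shown in this thread): `searchFunc`, `searchWithFallback`, and `errNoUsableInstance` are hypothetical names, and the callback replaces the real backend search by returning only a result count.

```go
package main

import (
	"errors"
	"fmt"
)

// searchFunc is a hypothetical stand-in for "run this query on this
// instance and count the results".
type searchFunc func(instance, query string) (int, error)

var errNoUsableInstance = errors.New("no instance returned enough results")

// searchWithFallback tries instances in order and returns the first one
// that yields at least minResults results. It gives up after maxAttempts
// tries, since the query itself may simply have no results.
func searchWithFallback(instances []string, query string, minResults, maxAttempts int, search searchFunc) (string, int, error) {
	for attempt, instance := range instances {
		if attempt >= maxAttempts {
			break
		}
		n, err := search(instance, query)
		if err == nil && n >= minResults {
			return instance, n, nil
		}
	}
	return "", 0, errNoUsableInstance
}

func main() {
	// Fake result counts replace real network calls in this sketch.
	counts := map[string]int{"https://a.example": 0, "https://b.example": 12}
	fake := func(instance, _ string) (int, error) { return counts[instance], nil }

	instances := []string{"https://a.example", "https://b.example"}
	// minResults = 3 corresponds to "more than 2 results"; 3 attempts max.
	instance, n, err := searchWithFallback(instances, "south park", 3, 3, fake)
	fmt.Println(instance, n, err)
}
```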

demostanis commented 2 years ago

On startup, gimmeasearx fetches all the instances from searx.space and makes a few test requests to see if the results are good (much like what you do, but instead of comparing the number of results it checks for known strings in the page, e.g. "Matt Stone" when searching "South Park"). This has the advantage of not having to know or use the user's query to test the instance (unlike what you do).

The issue now comes from the fact that instances are blocking gimmeasearx, responding with 403 or 429. We should make the requests look like they come from a browser (by setting the right User-Agent). The instances use filtron, with these rules by default: https://github.com/searx/searx/blob/79dc10e3828cc812b692572551e5e8eb5a7c1d38/utils/templates/etc/filtron/rules.json We can circumvent them; hopefully instance owners will be kind about gimmeasearx doing this...
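
As an illustration of what "acting like a browser" means in Go's net/http, here is a minimal sketch. It is not the change that was pushed to gimmeasearx: `newBrowserLikeRequest` and the header values are assumptions chosen for the example, and whether they are enough to pass a given instance's filtron rules has not been verified here. The demo talks to a local test server, not a real instance.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// browserUserAgent is an example value only; any realistic, current
// browser string could be used here.
const browserUserAgent = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

// newBrowserLikeRequest builds a GET request carrying headers a browser
// would typically send, replacing Go's default "Go-http-client" User-Agent.
func newBrowserLikeRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", browserUserAgent)
	req.Header.Set("Accept", "text/html,application/xhtml+xml")
	req.Header.Set("Accept-Language", "en-US,en;q=0.5")
	return req, nil
}

func main() {
	// A local test server that echoes back the User-Agent it received.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.UserAgent())
	}))
	defer srv.Close()

	req, err := newBrowserLikeRequest(srv.URL)
	if err != nil {
		fmt.Println("building request:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("sending request:", err)
		return
	}
	defer resp.Body.Close()

	seen, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("reading response:", err)
		return
	}
	fmt.Printf("server saw User-Agent: %s\n", seen)
}
```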

demostanis commented 2 years ago

I've pushed some changes to circumvent filtron. This should fix your issues. Can you please test them?

demostanis commented 2 years ago

As for docs on SearX preferences, I doubt there are any. You can find the setting value (to put in the gimmeasearx config panel) in https://some.searx.instance/preferences > Cookies > scroll to the bottom. Example: [image] (keep what's after ?preferences=)
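
If you would rather extract the value programmatically than copy it by hand, a small sketch with Go's net/url package follows. The helper name `preferencesFromURL` and the sample value are made up for illustration. Note that `Query().Get` returns only the preferences parameter (percent-decoded), so any other parameters that follow it in the URL are ignored.

```go
package main

import (
	"fmt"
	"net/url"
)

// preferencesFromURL returns the value of the "preferences" query
// parameter of rawURL, or an empty string when the parameter is absent.
func preferencesFromURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return u.Query().Get("preferences"), nil
}

func main() {
	// The value below is a placeholder, not a real preferences string.
	prefs, err := preferencesFromURL("https://some.searx.instance/?preferences=abc123")
	if err != nil {
		fmt.Println("parsing URL:", err)
		return
	}
	fmt.Println("value for the gimmeasearx config panel:", prefs)
}
```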

lukaskwkw commented 2 years ago

I think, as you said, the User-Agent and language preferences might actually be the issue. Your latest changes look very promising. I'm currently on vacation, so I will test them after I'm back.

lukaskwkw commented 2 years ago

I tested the update for a few days and noticed a huge improvement. Thanks for tackling that issue.