benbusby / whoogle-search

A self-hosted, ad-free, privacy-respecting metasearch engine
https://pypi.org/project/whoogle-search/
MIT License
9.33k stars, 924 forks

[FEATURE] add obfs4proxy support #1028

Closed: Pinkolik closed this issue 1 year ago

Pinkolik commented 1 year ago

Hello! I'm trying to run Whoogle Search via docker-compose on my Orange Pi 3 LTS (aarch64), but it looks like the container doesn't support obfs4proxy, and I can't connect to the Tor network without it because Tor is blocked in my region. Is there any workaround?

I also tried to run Whoogle as a systemd service, but for some reason the service ignored the WHOOGLE_CONFIG_TOR variable and sent requests over the clearnet.

Any help is appreciated!

benbusby commented 1 year ago

What problem are you getting with obfs4proxy? It seems like it should work fine from what I can tell. Could you just pull the Docker image, build obfs4proxy inside it manually, and then use that modified image?

With the systemd WHOOGLE_CONFIG_TOR issue, did you check that the "Use Tor" config option on the home page was checked? Since user sessions persist between launches, Tor might've stayed disabled for your session, since that's the default behavior. If the option on the home page was enabled and requests were still not using Tor, that's likely a bug I need to check out.
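The precedence described here (a saved session value wins over the environment default) can be illustrated with a minimal sketch. This is not Whoogle's actual code: the function name `effective_tor_setting`, the `"tor"` session key, and the rule that only the value `"1"` enables Tor are all assumptions made for illustration.

```python
import os


def effective_tor_setting(saved_session, environ=os.environ):
    """Return True if Tor would be used under the assumed precedence rules.

    saved_session: dict of settings persisted from an earlier launch
                   (hypothetical shape, not Whoogle's real session format).
    environ:       mapping of environment variables.
    """
    # A value saved in the session wins, even when it is False.
    if "tor" in saved_session:
        return bool(saved_session["tor"])
    # Only when the session has no saved value does the env default apply.
    return environ.get("WHOOGLE_CONFIG_TOR", "0") == "1"


# A session saved with Tor disabled keeps it disabled despite the env var.
print(effective_tor_setting({"tor": False}, {"WHOOGLE_CONFIG_TOR": "1"}))
```

Under these assumptions, setting the variable alone is not enough once a session with Tor disabled has been saved; the home page option has to be changed as well.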

Pinkolik commented 1 year ago

@benbusby, thanks for the tip! I've managed to build obfs4proxy inside the Dockerfile, and it seems to be working according to the logs ("Bootstrapped 100%" at the beginning and "New control connection" on every request). However, Google somehow still detects my IP address, location, and language when I run queries: [image] Am I still leaking my IP somehow? I have these in my .env:

# Use Tor if available
WHOOGLE_CONFIG_TOR=1
WHOOGLE_TOR_USE_PASS=1

Edit: It seems it's leaking my server's IP address, not my PC's.

benbusby commented 1 year ago

How are you checking that your IP is detected? There should be a "You are using Tor" banner at the top of all search results that used Tor to perform the request. If not, and the home page config indicates that Tor is enabled, then there's likely a deeper problem.

Regarding location/language, Google uses geolocation based on wherever the request is made from to modify the returned results. So for Tor requests it kinda depends on where your exit node is, but it seems like in this case there's possibly still a problem with the Tor connection.
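Besides the banner, one independent way to see which address a destination observes is to ask the Tor Project's check service from inside the container, through Tor's SOCKS port. The sketch below only parses the reply. It assumes the endpoint `https://check.torproject.org/api/ip` returns JSON with `IsTor` and `IP` fields, and that Tor listens on its conventional SOCKS port 9050; verify both for your setup.

```python
import json


def parse_tor_check(body):
    """Parse the JSON body of the Tor check service into (is_tor, ip).

    The field names "IsTor" and "IP" are an assumption about the
    service's response format.
    """
    data = json.loads(body)
    return bool(data.get("IsTor", False)), data.get("IP", "")


# The body can be fetched through Tor's SOCKS proxy, for example:
#   curl --socks5-hostname 127.0.0.1:9050 https://check.torproject.org/api/ip
sample = '{"IsTor": false, "IP": "203.0.113.7"}'  # made-up example reply
is_tor, ip = parse_tor_check(sample)
print(is_tor, ip)
```

If the reported address is the server's own IP, requests are leaving over the clearnet rather than through a Tor exit node.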

Pinkolik commented 1 year ago

@benbusby, you're right. Tor was disabled for some reason. [image] How do I troubleshoot that? Here are the logs:

Jul 12 10:21:34.000 [notice] Bootstrapped 95% (circuit_create): Establishing a Tor circuit
Jul 12 10:21:34.000 [notice] Bootstrapped 100% (done): Done
Jul 12 10:21:48.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:48.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:50.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:50.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:51.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:51.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:51.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:51.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:52.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:52.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:52.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:52.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:52.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:52.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:52.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:52.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:52.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:52.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:52.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:52.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:55.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:55.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:55.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:56.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:56.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:56.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:56.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:56.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:56.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:56.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:57.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:57.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:57.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:57.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:57.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:57.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:58.000 [notice] New control connection opened from 127.0.0.1.
Jul 12 10:21:58.000 [notice] New control connection opened from 127.0.0.1.
* Finished creating ddg bangs json
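A log like the one above can be summarized with a few lines of Python. This is a rough sketch and the function name `summarize_tor_log` is made up for illustration. As far as I can tell, "New control connection opened" is logged when a connection is accepted, so a high count does not by itself show that control-port authentication succeeded.

```python
import re

BOOTSTRAP_RE = re.compile(r"Bootstrapped (\d+)%")


def summarize_tor_log(lines):
    """Return (highest bootstrap percentage, number of control connections)."""
    progress = 0
    control_connections = 0
    for line in lines:
        match = BOOTSTRAP_RE.search(line)
        if match:
            progress = max(progress, int(match.group(1)))
        elif "New control connection opened" in line:
            control_connections += 1
    return progress, control_connections


log = [
    "Jul 12 10:21:34.000 [notice] Bootstrapped 95% (circuit_create): Establishing a Tor circuit",
    "Jul 12 10:21:34.000 [notice] Bootstrapped 100% (done): Done",
    "Jul 12 10:21:48.000 [notice] New control connection opened from 127.0.0.1.",
    "Jul 12 10:21:48.000 [notice] New control connection opened from 127.0.0.1.",
]
print(summarize_tor_log(log))
```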

Pinkolik commented 1 year ago

@benbusby, never mind, I had a wrong password in control.conf :D Anyway, everything is working now as it should. Should I create a PR with obfs4proxy build steps commented out in a Dockerfile? Just in case someone stumbles upon this again
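For anyone hitting the same problem, a wrong control password can be checked directly against Tor's control port. The sketch below uses only the standard library and the control protocol's `AUTHENTICATE` command, where a reply starting with `250` means success and `515` means authentication failed. The host, port 9051, and the helper names are assumptions; adjust them to match your torrc.

```python
import socket


def build_authenticate_command(password):
    """Build an AUTHENTICATE command using the quoted-string form."""
    # Escape backslashes first, then double quotes.
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    return f'AUTHENTICATE "{escaped}"\r\n'.encode()


def auth_succeeded(reply):
    """True if the control port reply (bytes) indicates success."""
    return reply.startswith(b"250")


def check_control_password(password, host="127.0.0.1", port=9051, timeout=5.0):
    """Connect to the control port and report whether the password is accepted."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_authenticate_command(password))
        return auth_succeeded(sock.recv(1024))
```

Calling `check_control_password("your-password")` from inside the container returns `False` when the password does not match the one Tor was configured with.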

Pinkolik commented 1 year ago

@benbusby I guess I jumped to a conclusion too early. After some time (an hour or two), I get this on every search request: [image] What could that be?

benbusby commented 1 year ago

Seems like Google might be changing how they block requests from Tor. It used to be that they would return a CAPTCHA for requests received from a Tor node, but now it looks like they just return a 403 error. Unfortunately I'm not too sure what to do about this, as they've been increasingly aggressive towards anything coming from Tor. The only real option is to keep trying new connections until you reach an exit node that Google hasn't flagged yet, but that can sometimes take a long time.
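The "keep trying new connections" approach can be sketched as a small retry loop. This is an illustration, not something Whoogle is confirmed to do: `fetch` and `request_new_circuit` are hypothetical callables you would supply (the latter could send `SIGNAL NEWNYM` over the control port), and the 10-second wait reflects my understanding that Tor rate-limits new-identity requests.

```python
import time


def search_with_new_circuits(fetch, request_new_circuit,
                             max_attempts=5, wait_seconds=10.0,
                             sleep=time.sleep):
    """Retry fetch() on HTTP 403, asking Tor for a new circuit in between.

    fetch:               callable returning (status_code, body)
    request_new_circuit: callable that asks Tor for a fresh circuit
    Returns (attempts_used, status_code, body) from the last attempt.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status != 403:
            return attempt, status, body
        if attempt < max_attempts:
            request_new_circuit()
            sleep(wait_seconds)  # give Tor time to build the new circuit
    return max_attempts, status, body


# Simulated run: two blocked exits, then one that is accepted.
responses = iter([(403, ""), (403, ""), (200, "ok")])
calls = []
result = search_with_new_circuits(lambda: next(responses),
                                  lambda: calls.append(1),
                                  sleep=lambda s: None)
print(result)
```

A new circuit does not guarantee a different or unflagged exit node, so the attempt cap matters.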

> Should I create a PR with obfs4proxy build steps commented out in a Dockerfile? Just in case someone stumbles upon this again

Sure! I'm open to that.