asciimoo / morty

Privacy aware web content sanitizer proxy as a service
GNU Affero General Public License v3.0

Morty Image Proxy Issues #106

Open ryouko-dev opened 3 years ago

ryouko-dev commented 3 years ago

With Morty's image proxy enabled in the Searx settings.yml, some images fail to load. With the proxy disabled, everything loads normally.

Searx instance not loading images properly with Morty proxy: [screenshot: pikachu_not_loading]

To try to fix this, I raised Morty's timeout from 5 to 10 seconds, but nothing improved. I then enabled followredirect, which did not help either, so I went back to the default values.

Next I inspected Morty's own log output to see what happens when images are loaded. I got the following output, shortened to avoid a massive code block since the error is always the same:

2021/05/15 00:45:40 GET https://tse3.mm.bing.net/th?id=OIP.dlZObkDNs_DapL5-RdYReAHaFG&pid=Api
2021/05/15 00:45:40 GET https://tse3.mm.bing.net/th?id=OIP.GmUd-r2vAQeB1pk62XXvzwHaC2&pid=Api
2021/05/15 00:45:41 GET https://live.staticflickr.com/2340/2625236627_2e83523858_t.jpg
2021/05/15 00:45:42 error: dialing to the given TCP address timed out
2021/05/15 00:45:43 error: dialing to the given TCP address timed out
2021/05/15 00:45:44 error: dialing to the given TCP address timed out

Since the issue seemed to be a timeout, I raised Morty's timeout from 5 to a number that frankly makes no sense, 600. The result was exactly the same. I'm not sure why this happens, but it seems to affect all Searx instances that proxy images through Morty. I'm sharing my findings so we can work out the cause and hopefully resolve it.

ryouko-dev commented 3 years ago

After following this down the rabbit hole, I found that this error comes from Valyala's fasthttp, and there is little we can do about it. It simply means the server is being sent requests faster than it can accept them, which results in this behavior.

The only potential fix I found was fasthttp.PipelineClient, but that seems to be a gamble: most people who have tried it say it still throws the same error. The error is likely to persist even with that in place (if it isn't already), because search engines only allow a certain number of connections at a time from one IP address. Since each image is loaded with its own request, this seems unavoidable at the moment.

I will leave this issue open in case someone else would like to attempt a fix. I would do so myself, but my knowledge of Go is extremely limited and fasthttp seems to be a rather complex library to work with.

erikdubbelboer commented 3 years ago

The error is not in fasthttp. It comes either from the upstream server not being able to handle the load, or from this proxy not configuring the fasthttp.Client correctly for this use.

fasthttp.PipelineClient is probably not a good idea as not all servers support it.

You should try setting Client.MaxConnsPerHost = 4 and Client.MaxConnWaitTimeout = time.Minute. That way you won't flood the server with connections (the default is 512 connections per host). Instead, the client will open at most 4 connections at once and wait up to a minute for one to become free.

ryouko-dev commented 3 years ago

I made all of the above changes, along with some within Searx itself. Search results in the Images category now work as intended. I'll open a pull request within the next few days to resolve this issue for others who may be affected.