benbusby / whoogle-search

A self-hosted, ad-free, privacy-respecting metasearch engine
https://pypi.org/project/whoogle-search/
MIT License
9.4k stars 927 forks

[FEATURE] speed up return of results? #608

Open deanord opened 2 years ago

deanord commented 2 years ago

I love, love, love Whoogle. This is how search should work.

When I submit a search request, it takes about 5 seconds for the results to come back. This seems slow. I know this doesn't sound like a big issue, but considering the number of searches one does in a day, it adds up. Is there anything I can do to make this faster or is this an issue of code optimization?

Albonycal commented 2 years ago

Are you using a public instance?

deanord commented 2 years ago

@Albonycal , no. I'm running it on a home server (SSD, 64 GB RAM, Core i9) inside a Docker container. Every other container runs super zippy. I had the same issue with my old NUC, and since I love/use Whoogle so much, I bought and built the aforementioned server in the hopes of making Whoogle faster. :) It's also the same performance whether I access it from my home network or from an outside network.

Albonycal commented 2 years ago

hmm weird

silverwings15 commented 2 years ago

Yeah, I use a public Whoogle instance sometimes and it isn't the fastest; otherwise I would've made it my main search engine over a SearX instance.

DUOLabs333 commented 2 years ago

Well, it has to download the page from Google, then grep/search-and-replace/etc. Google.com does not need to do that -- they have direct access to whatever API they're using. Maybe the downloading is the bottleneck?
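The fetch-then-rewrite pipeline described above can be profiled by timing each phase separately; if the download dominates, optimization effort belongs on the network side. A minimal, hypothetical sketch (the `fetch_upstream` and `rewrite_page` functions are stand-ins so it runs without network access, not Whoogle's actual code):

```python
import time

def timed(label, fn, *args):
    """Run one phase of the request and report how long it took."""
    start = time.perf_counter()
    result = fn(*args)
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result

# Stand-in phases: a real instance would fetch the Google results
# page here and then strip ads/tracking from the HTML.
def fetch_upstream(query):
    return f"<html>results for {query}</html>"

def rewrite_page(html):
    return html.replace("<html>", "<html data-proxied>")

page = timed("fetch", fetch_upstream, "test query")
clean = timed("rewrite", rewrite_page, page)
```

With the real functions swapped in, the two printed durations would show whether the 5 seconds is spent downloading or rewriting.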

benbusby commented 2 years ago

Hmmm, that's odd. 5 seconds is definitely too slow, especially for a home server like yours.

For comparison, a semi-private instance I run on a small VPS (1GB ram, 1 vCPU) returns results between ~900ms and 1.2s. Running it locally is a lot faster -- almost always below 1s for full page load.

More than likely the downloading is the bottleneck here, but I'm not sure why it would be drastically different in this scenario. I'll try to look into it soon. In the meantime, @deanord could you clarify if the instance you're running in your home network is behind a VPN?

deanord commented 2 years ago

@benbusby I'd love those speeds. :) I'm not running my network through a VPN. I did at one point a few months ago and had to whitelist Google/Whoogle regardless--Google tends to flag the VPN IPs with CAPTCHA challenges. Regular Google queries are zippy.

willobar commented 2 years ago

I am running Whoogle in a Docker container on Linux. The results page takes 5-10 seconds to load if I am running the container on a Docker bridge network. If I switch the container to the host network, the results page loads in under 1 second. I can recreate this behavior by swapping the container between bridge and host networks. The logs don't seem to show anything useful.

The majority of my containers work just fine on docker bridge networks, but one other app is having this same issue as well. Maybe it has to do with the base image or a perfect storm of versions in the tech stack that cause some sort of conflict on bridge networks.
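One way to narrow down where the bridge-network time goes (a hypothetical probe, not part of Whoogle) is to time name resolution and TCP connect separately, once on the host and once inside the container; a slow path through Docker's embedded DNS on bridge networks would show up in the first number:

```python
import socket
import time

def probe(host, port=443):
    """Time DNS resolution and TCP connect separately. A bridge-network
    DNS problem inflates the first value; general NAT/forwarding
    overhead shows up in the second."""
    t0 = time.perf_counter()
    addr = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)[0][4]
    t1 = time.perf_counter()
    with socket.create_connection(addr[:2], timeout=5):
        t2 = time.perf_counter()
    return t1 - t0, t2 - t1

# Example: run on the host and inside the container, then compare:
# dns_s, connect_s = probe("www.google.com")
```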

Albonycal commented 2 years ago

what is your latency to google.com ?

willobar commented 2 years ago

I rebooted my modem and router and it resolved my issue.

Albonycal commented 2 years ago

😄
was it DNS

DUOLabs333 commented 2 years ago

This should be closed now.

eddydc commented 2 years ago

Please don't close this thread, since I have a similar issue. I'm running on a VPS, proxied by nginx with a minimal configuration and secured by basic authentication. The first time I do a search, the result appears after approximately 5 seconds; consecutive searches are much faster. If I leave the browser open and do a new search a while later, the first result again takes some seconds, and consecutive results appear faster. I tried the option to not use Tor, which can slow things down, but that doesn't seem to make a difference. Any suggestions on how to speed things up?

willobar commented 2 years ago

It sounds like your VPS provider could be deallocating resources after a period of inactivity, then it takes a few seconds to spin back up when you use it again.

eddydc commented 2 years ago

That was my initial idea too, but it wouldn't explain why other applications like WordPress don't seem to have the same issue.

akhiljalagam commented 2 years ago

Running locally and having this issue. Results are slow compared to google.com.

g-k-m commented 1 year ago

I just installed it with pip install whoogle-search, then ran it from cmd on Windows 10 with whoogle-search.

Then I go to 127.0.0.1:5000 in the browser, and it takes about 10 seconds to load. Unbelievably slow. And what could I have done wrong? I've got a fast 100 Mbps connection, I literally ran two commands, and I've got a modern gaming PC. It's just way too slow to be used. I even tried multiple browsers, same thing. Also, the command line says WARNING:app:404 Not Found: The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again. every now and then. So yeah, back to searx I guess.

Bovive commented 1 year ago

I have the same problem. About 10 seconds for the page to load when searching. I tried it with Docker and without Docker on Ubuntu and had the same problem. I receive the same warnings as g-k-m when it loads.

144358 commented 1 year ago

I also had the same issue and then spent the past few days frantically researching how to fix the problem

aodix85 commented 12 months ago

I added Varnish to mine and that sped it up quite a bit. I'm using Whoogle as a pipx installation with Varnish on the same virtual machine, and then run those through HAProxy along with my other services, though I am the only one who uses it. From the bookstore I'm at now, it runs about 1.5 seconds the first time I search for something, and if I search that again it takes about 1 second. Images take about 5 seconds to load the entire page, but it begins loading within 1.5 seconds, and videos take about 3 seconds to load thumbnails. And that's with 25 results per page instead of the default 10, which had me clicking 'next' much too often. Varnish seemed to help quite a bit (as did using virtio instead of e1000 on Proxmox!).

newtoallofthis123 commented 11 months ago

I too run Whoogle locally in a Docker container and it does slow down sometimes.

When it does, a simple restart fixes it most of the time... if not, try removing the image and container and pulling again.

Moreover, Whoogle has to scrape Google, so the complexity of your query can also affect the result time. For me at least, whenever I search for a movie, for example, Whoogle is typically a bit slower compared to a simpler query.

calderjh commented 10 months ago

I have this issue too. Searches are lovely and fast, but that first query in the search bar (as opposed to the home page) takes a number of seconds -- as if it is waking up the instance or something. It's a constant issue for me. To compare, I switched back to Google as the default search engine and it was super zippy. It's not a Whoogle issue; it's something in between.

prescientmoon commented 7 months ago

I have this issue as well. I run it as a local docker container, yet the first query usually takes more than 5s, while google takes 1-2

aodix85 commented 7 months ago

I'd recommend adding Varnish to your stack, it should appreciably improve speeds and the 'feel' if you will. Aside from that though I ended up switching to SearXNG about three months ago due to speed and feature set and I'm pretty happy with it, though I think I will add Varnish to this as well

prescientmoon commented 7 months ago

I was considering searxng, but I thought you had to use a plugin to blacklist certain websites, which seemed like more trouble than using whoogle. At this point, I might as well switch to searxng though.

Varnish looks interesting. Do you have to add special configuration for whoogle/searxng? What exactly does it help with? Is it just about the static assets?

aodix85 commented 7 months ago

You may be right about that -- I haven't tried blacklisting with it. I'd normally do that at the router level, which I believe would then also stop those results from showing even when accessing it off-site -- at least I'd think that'd work, but I've never looked into it... huh, now I have a new thing to try :) thanks!

As for Varnish: depending on how you set it up, it will normally cache static assets, but I believe it would also cache results. I had turned caching off for images only, because for whatever reason that was giving me headaches, but apart from that there wasn't anything special needed for the setup I used. And it didn't take anything in terms of hardware, so it was basically just giving me back performance left on the table. My personal opinion is that caching should be included with a package like this, but I'm not knowledgeable enough to make that happen myself, so I can't complain. If I remember correctly, I had to turn off image caching because of how I had HAProxy set up, maybe? I wish I could recall better.

Long story short: yes, I'd recommend it even if you're only caching static assets, as it will still make a difference, and for essentially no cost aside from your setup time.
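The result-caching idea behind putting Varnish in front of an instance can be sketched in a few lines. This is an illustrative in-process TTL cache, not Varnish itself (which works at the HTTP layer and is configured via VCL):

```python
import time

class TTLCache:
    """A minimal sketch of what a cache like Varnish does in front of
    Whoogle: serve a recent copy of a response instead of repeating
    the slow upstream round trip."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, stored_at)

    def get(self, key, fetch):
        now = time.monotonic()
        hit = self.store.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]           # cache hit: no upstream request
        value = fetch(key)          # cache miss: do the slow fetch
        self.store[key] = (value, now)
        return value
```

Repeating the same query inside the TTL window skips the fetch entirely, which matches the "first search slow, repeats fast" behavior people describe above.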

prescientmoon commented 7 months ago

Wouldn't blocking at the router level disallow accessing the sites? That's not my issue -- I just want to prevent certain things from showing in the search results (mostly spammy SEO websites with autogenerated content).

aodix85 commented 7 months ago

Right -- so I believe if you block those spammy sites at the router level, they won't show in the results but the search engine will still work. If you're on-site where the search engine is (for me it's my.home), then yeah, those sites would be disallowed altogether. I get what you mean, though: you want to be able to access them if you want but not see them in search. I've never done it on either platform, so I just don't know. For me, though, I think blocking altogether would be OK, but that's just me.

prescientmoon commented 7 months ago

Wouldn't they still appear in the search results then? Those results come from Google, which has access to said websites, so why would Whoogle block them before serving the page?

aodix85 commented 7 months ago

That's a good point, you may be right. I don't know because I haven't tried it yet, but my thought was that if I blocked the site at the router and Google sent those results to SearXNG, SearXNG would ignore them. Now that you mention it, though, SearXNG isn't fetching the actual site, so I guess it'd still show them? Don't know, haven't tried. Still, for me I think I'd just block them at the router, because I wouldn't want to reach them regardless, same as when I use blocklists on pfSense or OpenWrt... but the particulars of your use case seem a bit different from mine. Short answer: I don't know, never tried, you're probably right :) I'm at work now, but I may look into all that later just to see what does what... you've piqued my interest!

andreasluemkemann commented 7 months ago

Hello,

in case anyone has an issue with slow Whoogle loading: in my setup it was caused by using basic auth. When I changed my search provider and bookmarks to include the user and pass, it was blazing fast again.

| Type | URL |
| --- | --- |
| Bookmark | https://whoogle.domain.com |
| Fixed Bookmark | https://user:pass@whoogle.domain.com |
| Search URL | https://whoogle.domain.com/search?q=%s |
| Fixed Search | https://user:pass@whoogle.domain.com/search?q=%s |

Hope this helps, Andy
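A likely reason the URL-embedded credentials help: clients with user:pass in the URL typically send the Authorization header preemptively, skipping the 401 challenge round trip that plain basic auth adds to the first request. A small illustrative sketch of the header a client would construct (a hypothetical helper, not Whoogle code):

```python
import base64
from urllib.parse import urlsplit

def preemptive_auth_header(url):
    """Build the Authorization header a client can send up front when
    credentials are embedded in the URL, avoiding the extra
    request/401/retry exchange behind basic auth."""
    parts = urlsplit(url)
    if parts.username is None:
        return None  # no embedded credentials: expect a 401 challenge
    creds = f"{parts.username}:{parts.password or ''}".encode()
    return "Basic " + base64.b64encode(creds).decode()

print(preemptive_auth_header("https://user:pass@whoogle.domain.com/search?q=%s"))
# → Basic dXNlcjpwYXNz
```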

prescientmoon commented 6 months ago

Hi! I am facing the slowness issue without using auth in the first place.