Extravi / araa-search

A privacy-respecting, ad-free, self-hosted Google metasearch engine with strong security that offers full API support and utilizes Qwant for images, and DuckDuckGo for auto-complete.
https://araa.extravi.dev
GNU Affero General Public License v3.0
257 stars 23 forks

Plans for adding new search engines #105

Closed amogusussy closed 4 months ago

amogusussy commented 11 months ago

In #103, you mentioned that in order to prevent no results being returned because of rate limiting, you'll implement other search engines to act as a backup.

Here's a template that I've come up with for the results:

{
  "wiki": {
    "title": "String",
    "description": "String",
    "link": "String",
    "image": "String"
  },
  "results": [
    {
      "title": "String",
      "description": "String",
      "link": "String",
      "has_sublinks": "Bool",
      "sublinks": [
        {
          "title": "String",
          "description": "String",
          "link": "String"
        }
      ]
    }
  ]
}

Then we can just edit the results.html file to use this format, and it'll be much easier to implement newer engines.

Each engine should have its own file in src/textEngines/{engine}.py, and then get called in the textResults.py file.
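A minimal sketch of how the template and the backup idea could fit together, assuming each engine module exposes a function that takes a query and returns a dict in the format above. The names `SearchResults`, `Engine`, and `search_with_fallback` are hypothetical and not taken from the repository, and treating `wiki` as optional is also an assumption:

```python
from typing import Callable, List, Optional, TypedDict


class Sublink(TypedDict):
    title: str
    description: str
    link: str


class Result(TypedDict):
    title: str
    description: str
    link: str
    has_sublinks: bool
    sublinks: List[Sublink]


class Wiki(TypedDict):
    title: str
    description: str
    link: str
    image: str


class SearchResults(TypedDict):
    wiki: Optional[Wiki]  # assumption: None when there is no wiki snippet
    results: List[Result]


# Hypothetical engine signature: one function per file in src/textEngines/.
Engine = Callable[[str], SearchResults]


def search_with_fallback(query: str, engines: List[Engine]) -> SearchResults:
    """Try each engine in order and return the first non-empty result set."""
    for engine in engines:
        try:
            found = engine(query)
        except Exception:
            # Treat any failure (rate limit, parse error) as "try the next engine".
            continue
        if found["results"]:
            return found
    # Every engine failed or returned nothing.
    return {"wiki": None, "results": []}
```

With something like this, textResults.py would only need an ordered list of engine functions, and results.html would only ever see one shape of data.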

Extravi commented 11 months ago

Honestly, I was just using search when Google blocked it. It's very annoying, so I should try to fix this ASAP.

Extravi commented 11 months ago

image

Extravi commented 11 months ago

It was working before that image.

Extravi commented 11 months ago

I did find this post by 2captcha not long ago, and it has some useful info. There is one more post, but I can't find it at the moment. https://2captcha.com/blog/google-sepr-recaptcha-june-2022

Extravi commented 11 months ago

I know it said something about sending requests with cookies, like the "NID" cookie, and about how the URL should look to avoid getting detected.

Extravi commented 11 months ago

You would need a web driver to capture those cookies so you can send them with the request.
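A rough sketch of the second half of that idea, assuming the cookies were already captured by a web driver. Selenium's `driver.get_cookies()` returns a list of dicts with `name` and `value` keys, which is the shape assumed here. The helper names are hypothetical, and which cookies besides "NID" are needed is not stated in this thread:

```python
from typing import Dict, Iterable, Mapping


def build_cookie_header(
    cookies: Iterable[Mapping[str, str]],
    wanted: Iterable[str] = ("NID",),
) -> str:
    """Turn web-driver style cookie dicts into a Cookie header value.

    Only cookies whose name is in `wanted` are kept; "NID" is the one
    mentioned in the thread, anything else is up to the caller.
    """
    wanted_names = set(wanted)
    pairs = [
        f'{cookie["name"]}={cookie["value"]}'
        for cookie in cookies
        if cookie["name"] in wanted_names
    ]
    return "; ".join(pairs)


def build_request_headers(
    cookies: Iterable[Mapping[str, str]], user_agent: str
) -> Dict[str, str]:
    """Headers to attach to the outgoing search request."""
    return {
        "User-Agent": user_agent,
        "Cookie": build_cookie_header(cookies),
    }
```

The resulting dict can be passed as the headers of whatever HTTP client makes the search request.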

Extravi commented 11 months ago

I know that my instance processes 9.3k uncached requests, like searches, images, etc., every 24 hours, but as this project grows and I start to process more requests, it's going to get less reliable, so I'll need to try my best to work on making it much more reliable over time.

Extravi commented 11 months ago

I do not record any logs on my instance, but the Cloudflare proxy logs the number of requests, not the requests themselves.

Extravi commented 11 months ago

Also, because I have one server in Germany, it needs to be behind a CDN for both speed and security reasons.

Extravi commented 11 months ago

Also, I'm sure using Cloudflare is fine; I don't really see it as a privacy concern. It's definitely better than sending requests directly to Google. Cloudflare offers a free version of their proxy to test updates before pushing them to their paid clients and users. I don't think Cloudflare records the requests made through their proxies, and I'm sure that isn't legal. I think they use free users to test the updates made to their proxy or CDN service, so it won't impact the paid clients or companies that use them.

Extravi commented 11 months ago

I have one good server in Germany behind a CDN because it's more cost-efficient, and I have more server resources available for that instance as a result. 4 cores, 8 GB of RAM. Then the network traffic is optimized using Cloudflare. The only thing that seems slow for me is autocomplete here in Canada, but I might have a fix for that soon.

Extravi commented 11 months ago

I'm adding support for a captcha solver so I can use it on my instance.

image

Extravi commented 11 months ago

In my test it cost about 1 cent per captcha, and that's only when the abuse cookie expires.

Extravi commented 11 months ago

So it would last a very long time and cost less than any API would.

Extravi commented 11 months ago

This will be a setting people can use in the config if they want to enable captcha solver support with their own API key.
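To illustrate why the cost stays low, here is a minimal sketch of solving only when the abuse cookie has expired, gated by an opt-in setting. `CaptchaConfig`, `AbuseCookieCache`, and the `solve` callback are hypothetical names, not the project's actual config keys or code. The solver is assumed to return the cookie together with its lifetime in seconds:

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CaptchaConfig:
    # Hypothetical settings; the real config keys may differ.
    enabled: bool = False
    api_key: str = ""


class AbuseCookieCache:
    """Call the paid solver only when there is no valid cookie."""

    def __init__(
        self,
        config: CaptchaConfig,
        solve: Callable[[str], Tuple[str, float]],
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.config = config
        self.solve = solve  # takes the API key, returns (cookie, lifetime_seconds)
        self.clock = clock
        self._cookie: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> Optional[str]:
        """Return a valid cookie, or None if the feature is disabled."""
        if not self.config.enabled or not self.config.api_key:
            return None
        now = self.clock()
        if self._cookie is None or now >= self._expires_at:
            # This is the only place that costs money.
            self._cookie, lifetime = self.solve(self.config.api_key)
            self._expires_at = now + lifetime
        return self._cookie
```

Instances that leave the setting off never call the solver, and an enabled instance pays once per cookie lifetime rather than once per search.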

Extravi commented 11 months ago

https://github.com/Extravi/araa-search/commit/bf83d6bf325d72e7ab9b597cb25e4bf3b68d66fc

The captcha solver has been added.

Extravi commented 11 months ago

I added the captcha solver, but it has an issue: it does not work with more than one worker because the variables are not shared in memory. Until I find a way to fix it, I will only be using one worker on my instance.

Extravi commented 11 months ago

If you know how to share the variables for the captcha solver, please let me know or open a pull request for it.
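One possible approach, sketched under the assumption that the workers are separate processes on the same machine: keep the solver state in a small JSON file and replace it atomically, so every worker reads the same cookie. The function names and the file layout are hypothetical, and this is not necessarily how the project ended up solving it:

```python
import json
import os
import tempfile
from typing import Optional


def save_state(path: str, state: dict) -> None:
    """Write the state to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
        # os.replace is atomic, so readers see either the old or the new file,
        # never a partially written one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(path: str) -> Optional[dict]:
    """Read the shared state, or return None if no worker has written it yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None
```

This keeps workers consistent, but it does not stop two workers from solving a captcha at the same moment. Avoiding that would need a lock or an external store on top.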

Extravi commented 11 months ago

One worker and one thread.

Extravi commented 11 months ago

I have added the captcha solver to Araa and fixed all the bugs with it. It's active and running on my instance.

Extravi commented 11 months ago

It's bug-free and works with all workers.