Extravi / araa-search

A privacy-respecting, ad-free, self-hosted Google metasearch engine with strong security that offers full API support and utilizes Qwant for images, and DuckDuckGo for auto-complete.
https://araa.extravi.dev
GNU Affero General Public License v3.0
242 stars, 22 forks

Plans for adding new search engines #105

Closed (amogusussy closed this 2 months ago)

amogusussy commented 8 months ago

In #103, you mentioned that you'd add other search engines as backups, so that searches don't come back with no results when Google rate-limits the instance.

Here's a template I've come up with for the results:

{
  "wiki": {
    "title": "String",
    "description": "String",
    "link": "String",
    "image": "String"
  },
  "results": [
    {
      "title": "String",
      "description": "String",
      "link": "String",
      "has_sublinks": Bool,
      "sublinks": [
        {
          "title": "String",
          "description": "String",
          "link": "String"
        }
      ]
    }
  ]
}

Then we can just edit the results.html file to use this format, and it'll be much easier to add new engines.
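A minimal sketch of the template as Python type hints, assuming each engine returns plain dicts. The class names and the `make_result` helper are hypothetical, and letting `wiki` be `None` when an engine has no infobox is an assumption not stated in the template. Deriving `has_sublinks` from the `sublinks` list means the two fields can never disagree.

```python
from typing import List, Optional, TypedDict


class Sublink(TypedDict):
    title: str
    description: str
    link: str


class Result(TypedDict):
    title: str
    description: str
    link: str
    has_sublinks: bool
    sublinks: List[Sublink]


class Wiki(TypedDict):
    title: str
    description: str
    link: str
    image: str


class SearchResponse(TypedDict):
    wiki: Optional[Wiki]  # assumption: None when the engine returns no infobox
    results: List[Result]


def make_result(title: str, description: str, link: str,
                sublinks: Optional[List[Sublink]] = None) -> Result:
    """Build one entry in the template shape (hypothetical helper).

    has_sublinks is computed from the list instead of being passed in.
    """
    subs = list(sublinks or [])
    return {
        "title": title,
        "description": description,
        "link": link,
        "has_sublinks": bool(subs),
        "sublinks": subs,
    }
```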

Each engine should have its own file in src/textEngines/{engine}.py, and each engine is then called from textResults.py.
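The backup idea from #103 could be wired into textResults.py roughly like this. It's a sketch under assumptions: `EngineError` and `search_with_fallback` are hypothetical names, and it assumes each engine module exposes a function that takes the query and returns a dict in the template shape above.

```python
from typing import Callable, Dict, List


class EngineError(Exception):
    """Hypothetical error an engine raises when it is rate-limited or blocked."""


# Hypothetical convention: each src/textEngines/{engine}.py exposes a function
# that takes the query and returns a dict in the shared template shape.
Engine = Callable[[str], Dict]


def search_with_fallback(query: str, engines: List[Engine]) -> Dict:
    """Try each engine in order and return the first response that has results."""
    for engine in engines:
        try:
            response = engine(query)
        except EngineError:
            continue  # blocked or rate-limited: fall through to the next engine
        if response.get("results"):
            return response
    # Every engine failed or came back empty: return an empty template-shaped response.
    return {"wiki": None, "results": []}
```

In textResults.py this might be called as `search_with_fallback(query, [google.search, backup.search])`, where `google` and `backup` are hypothetical engine modules. Keeping Google first means the backups only cost anything when Google is actually blocking the instance.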

Extravi commented 8 months ago

Honestly, I was just using the search, and then Google blocked it. It's very annoying; I should try to fix this ASAP.

Extravi commented 8 months ago

[image]

Extravi commented 8 months ago

It was working right before that screenshot.

Extravi commented 8 months ago

I found this post by 2captcha not long ago, and it has some useful info. There is one more post, but I can't find it at the moment. [image] https://2captcha.com/blog/google-sepr-recaptcha-june-2022

Extravi commented 8 months ago

I know it said something about sending requests with cookies, like the "NID" cookie, and about how the URL should look to avoid being detected.
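Attaching previously captured cookies to a search request could look roughly like this with only the standard library. This is a sketch under assumptions: `build_search_request` is a hypothetical helper, the cookie values are placeholders you would have captured earlier, and the specific URL parameters the 2captcha post recommends are not reproduced (only the standard `q` parameter is set). It builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(query: str, cookies: dict) -> Request:
    """Build (but do not send) a GET request carrying previously captured cookies."""
    # Only "q" is set here; any extra parameters from the 2captcha post are not included.
    url = "https://www.google.com/search?" + urlencode({"q": query})
    # Cookies go in a single "Cookie" header as "name=value" pairs separated by "; ".
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    headers = {"Cookie": cookie_header} if cookie_header else {}
    return Request(url, headers=headers)
```

Sending it would then be a normal `urllib.request.urlopen(req)` call. Whether a given cookie actually avoids the block depends on Google's side and isn't guaranteed by anything in this sketch.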

Extravi commented 8 months ago

You would need a web driver to capture those cookies so you can send them in requests.
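Reducing a web driver's captured cookies to the ones worth sending might look like the sketch below. It assumes the list-of-dicts shape that Selenium's `driver.get_cookies()` returns (each dict has at least `name` and `value`). `cookies_from_webdriver` is a hypothetical helper, and keeping only `NID` by default is an illustrative choice.

```python
from typing import Dict, Iterable, Mapping


def cookies_from_webdriver(raw_cookies: Iterable[Mapping],
                           wanted: Iterable[str] = ("NID",)) -> Dict[str, str]:
    """Keep only the named cookies from a web driver's cookie list.

    Assumes each entry is a dict with at least "name" and "value" keys,
    as in the list Selenium's driver.get_cookies() returns.
    """
    wanted_set = set(wanted)
    return {c["name"]: c["value"] for c in raw_cookies if c["name"] in wanted_set}
```

With Selenium installed, this would be fed by something like `driver.get("https://www.google.com/")` followed by `cookies_from_webdriver(driver.get_cookies())`. The resulting dict can then be passed along with ordinary HTTP requests, so the browser only has to run when the cookies need refreshing.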

Extravi commented 8 months ago

My instance processes 9.3k uncached requests (searches, images, etc.) every 24 hours. As this project grows and I start to process more requests, it's going to get less reliable, so I'll need to keep working on making it more reliable over time.

Extravi commented 8 months ago

I do not record any logs on my instance. The Cloudflare proxy logs the number of requests, but not the contents of the requests.

Extravi commented 8 months ago

Also, because I have one server in Germany, it needs to be behind a CDN for both speed and security reasons.

Extravi commented 8 months ago

Also, I'm sure using Cloudflare is fine; I don't really see it as a privacy concern. It's definitely better than sending requests directly to Google. Cloudflare offers a free version of its proxy, and I think it uses free users to test updates to its proxy and CDN service before pushing them to paid clients, so those updates don't impact the companies that pay for it. I don't think Cloudflare records the requests made through its proxy, and I'm sure doing so wouldn't be legal.

Extravi commented 8 months ago

> Also, because I have one server in Germany, it needs to be behind a CDN for both speed and security reasons.

I have one good server in Germany behind a CDN because it's more cost-efficient, and as a result I have more server resources available for that instance: 4 cores and 8 GB of RAM. Network traffic is then optimized by Cloudflare. The only thing that seems slow for me here in Canada is autocomplete, but I might have a fix for that soon.

Extravi commented 8 months ago

I'm adding support for a captcha solver so I can use it on my instance.

[image]

Extravi commented 8 months ago

In my test it cost about 1 cent per captcha, and a captcha only needs solving when the abuse cookie expires.

Extravi commented 8 months ago

So each solve would last a very long time, and it would cost less than any API would.
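The cost argument rests on caching: the solver only runs once the stored abuse cookie has expired, and every request in between reuses it. A minimal sketch of that idea, where `AbuseCookieCache` is a hypothetical name and `solve` is a hypothetical callable wrapping whatever paid solver API is configured, assumed to return the cookie together with its lifetime in seconds:

```python
import time
from typing import Callable, Optional, Tuple


class AbuseCookieCache:
    """Hold the cookie from a solved captcha and re-solve only after it expires.

    `solve` is a hypothetical callable wrapping a paid captcha-solving API;
    it is assumed to return (cookie_value, lifetime_seconds).
    """

    def __init__(self, solve: Callable[[], Tuple[str, float]],
                 clock: Callable[[], float] = time.time) -> None:
        self._solve = solve
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = 0.0
        self.solves = 0  # number of paid solves so far

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            # Missing or expired: pay for one solve and remember when it runs out.
            value, lifetime = self._solve()
            self._value, self._expires_at = value, now + lifetime
            self.solves += 1
            return value
        return self._value
```

At roughly 1 cent per solve, the total cost is about `cache.solves * 0.01` USD, so it scales with how often the cookie expires rather than with the number of searches. Note that this state lives in a single process's memory.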

Extravi commented 8 months ago

This will be a config setting, so people can enable captcha solver support with their own API key if they want to.
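An opt-in setting like this could be read roughly as below. The environment variable names `ARAA_CAPTCHA_SOLVER` and `ARAA_CAPTCHA_API_KEY`, and the `load_captcha_config` helper, are hypothetical stand-ins for whatever the project's config actually uses. The point of the sketch is that the solver stays disabled unless an operator both turns it on and supplies their own key.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CaptchaSolverConfig:
    enabled: bool
    api_key: Optional[str]


def load_captcha_config(env: Mapping[str, str] = os.environ) -> CaptchaSolverConfig:
    """Read the opt-in solver settings; the variable names here are hypothetical."""
    enabled = env.get("ARAA_CAPTCHA_SOLVER", "false").strip().lower() in {"1", "true", "yes"}
    api_key = env.get("ARAA_CAPTCHA_API_KEY") or None
    # The solver stays off unless the operator both opts in and supplies their own key.
    return CaptchaSolverConfig(enabled=enabled and api_key is not None, api_key=api_key)
```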

Extravi commented 8 months ago

> I'm adding support for a captcha solver so I can use it on my instance.
>
> [image]

https://github.com/Extravi/araa-search/commit/bf83d6bf325d72e7ab9b597cb25e4bf3b68d66fc

It's been added.

Extravi commented 8 months ago

I added the captcha solver, but it has an issue: it doesn't work with more than one worker, because the variables aren't shared in memory between workers. Until I find a way to fix it, I'll only be running one worker on my instance.

Extravi commented 8 months ago

If you know how to share the captcha solver's variables between workers, please let me know or open a pull request for it.
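One common way to share this kind of state between worker processes, which don't share memory, is to keep it somewhere all of them can reach, such as a small file on disk. This sketch is not necessarily how Araa ended up solving it; `save_shared_cookie`, `load_shared_cookie`, and the JSON layout are hypothetical. Writes go to a temporary file that is then atomically renamed with `os.replace`, so a reader never sees a half-written file.

```python
import json
import os
import tempfile
import time
from typing import Optional


def save_shared_cookie(path: str, value: str, expires_at: float) -> None:
    """Write the solver's cookie so every worker process can read it.

    Writes to a temp file in the same directory, then os.replace()s it into place,
    so readers see either the old file or the new one, never a partial write.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"value": value, "expires_at": expires_at}, f)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file if writing or renaming failed
        raise


def load_shared_cookie(path: str, now: Optional[float] = None) -> Optional[str]:
    """Return the shared cookie if it exists and hasn't expired, else None."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    return data["value"] if now < data["expires_at"] else None
```

This doesn't stop two workers from noticing an expired cookie at the same moment and both paying for a solve; preventing that would need a lock around the check-and-solve step (for example `fcntl.flock` on POSIX) or a shared store such as Redis.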

Extravi commented 8 months ago

One worker and one thread.

Extravi commented 8 months ago

I've added the captcha solver to Araa and fixed all the bugs with it; it's active and running on my instance.

Extravi commented 8 months ago

It's bug-free and works with all workers.