Hello, this is more related to Brave Search itself, but could you get in contact with the Internet Archive and donate crawl data to the Wayback Machine? Alexa Internet did that until Amazon shut it down in 2020. The Wayback Machine is an extremely useful resource, used all over the world by researchers, journalists, and basically anyone on YouTube investigating something online, such as the origin of an urban legend. Since you already have the Wayback Machine integrated into the browser, donating the crawl data should lower the chance of a link being completely lost to time. You could also ask the Archive staff for a list of all URLs archived on the Wayback Machine, deduplicate them, and add the links that Brave has not yet crawled and that are still online to the search results, helping to build a third search engine to rival Google and Bing. Other good sources of links could be https://ODCrawler.xyz and many AI image datasets.
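To make the "deduplicate, then keep what is not yet crawled and still up" idea concrete, here is a rough Python sketch. It is only an illustration under assumptions: the function names, the normalization rules, and the `is_live` callback are all hypothetical, and nothing here reflects how Brave or the Internet Archive actually store or exchange URL lists.

```python
from typing import Callable, Iterable, List
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase scheme and host, default the path to "/", drop the fragment.

    These rules are an assumption chosen for illustration; a real crawler
    would likely use a more careful canonicalization.
    """
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def new_live_urls(
    archived: Iterable[str],
    already_crawled: Iterable[str],
    is_live: Callable[[str], bool],
) -> List[str]:
    """Return archived URLs that are unique, not yet crawled, and still up.

    `is_live` is a hypothetical callback (for example, an HTTP HEAD check)
    supplied by the caller, so this function itself does no network I/O.
    """
    crawled = {normalize(u) for u in already_crawled}
    seen = set()
    result = []
    for url in archived:
        key = normalize(url)
        # Skip duplicates within the archive list and anything already indexed.
        if key in seen or key in crawled:
            continue
        seen.add(key)
        if is_live(key):
            result.append(key)
    return result
```

Passing the liveness check in as a function keeps the filtering logic testable offline; in practice that check would be the slow, rate-limited part.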
I also think the dataset generated by this project should become public or be linked to an existing public web crawler project instead of creating another walled garden index.