TorSpider / TorSpider-Backend

The database backend with which the spiders share their discoveries.
BSD 3-Clause "New" or "Revised" License

Add titles and base_url to onions table, populated automatically by backend. #42

Status: Open. haxys opened this issue 6 years ago.

haxys commented 6 years ago

I would like to add a title column to the onions table, so that we have a reliable title for every onion domain. The title can be taken from the domain's root URL (/) or, if the root URL redirects to another URL, from that URL instead.

An easy way to populate this field: whenever a page is scanned, check whether its path is / and, if so, store that page's title as the onion's title. If / redirects to another page, use that page's title instead. For this to work, / must always be in the list of URLs to scan for each domain, so add it if it isn't already there. We might also add a base_url column to the onions table.
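The title rule described above can be sketched in Python as follows. This is a minimal illustration, not the code from the offload_to_backend pull request: the helper names (root_url, ensure_root_queued, title_for_onion), the plain set used as a scan queue, and the choice to record the page that actually supplied the title as base_url are all assumptions.

```python
from typing import Optional, Set, Tuple
from urllib.parse import urlsplit, urlunsplit


def root_url(url: str) -> str:
    """Return the root URL (scheme://host/) of the domain that `url` belongs to."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


def ensure_root_queued(url: str, queue: Set[str]) -> None:
    """Make sure the domain's root page (/) is scheduled for scanning.

    Hypothetical: the real backend's scan queue is not a Python set.
    """
    queue.add(root_url(url))


def title_for_onion(requested_url: str, final_url: str,
                    page_title: Optional[str]) -> Optional[Tuple[str, str, str]]:
    """Return (domain, base_url, title) if this scan should set the onion's title.

    requested_url: the URL the spider asked for.
    final_url: the URL actually served after following redirects
        (equal to requested_url when there was no redirect).
    """
    if urlsplit(requested_url).path not in ("", "/"):
        return None  # Only the root page (or its redirect target) defines the title.
    if not page_title or not page_title.strip():
        return None  # No usable title on this page.
    domain = urlsplit(requested_url).hostname or ""
    # Assumption: base_url records the page the title actually came from.
    return domain, final_url, page_title.strip()
```

For example, scanning http://example.onion/about.html would queue http://example.onion/ but would not set a title, while a root page that redirects to http://example.onion/home.php titled "Example" would yield ('example.onion', 'http://example.onion/home.php', 'Example').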

All of this can be done in the backend.

This change will require us to modify the database schema. We don't necessarily need to rebuild the database from scratch to apply it: every URL will be scanned again eventually, and those rescans will fill in the new columns. But if we start from a blank slate before launch, that will take care of it too.
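A minimal sketch of the schema change, assuming for illustration a SQLite database and a hypothetical onions table with a domain column (the backend's real database engine and schema may differ). The new columns are added in place, so existing rows keep their data and simply hold NULL until their domains are rescanned.

```python
import sqlite3


def add_onion_columns(conn: sqlite3.Connection) -> None:
    """Add title and base_url to an existing onions table, keeping its rows."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(onions)")}
    for column in ("title", "base_url"):
        if column not in existing:
            # New columns default to NULL; later rescans fill them in.
            conn.execute(f"ALTER TABLE onions ADD COLUMN {column} TEXT")
    conn.commit()


def store_title(conn: sqlite3.Connection, domain: str,
                base_url: str, title: str) -> None:
    """Record the title found on a rescan (hypothetical helper)."""
    conn.execute(
        "UPDATE onions SET title = ?, base_url = ? WHERE domain = ?",
        (title, base_url, domain),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal onions table, for illustration only.
    conn.execute("CREATE TABLE onions (id INTEGER PRIMARY KEY, domain TEXT UNIQUE)")
    conn.execute("INSERT INTO onions (domain) VALUES ('example.onion')")
    add_onion_columns(conn)
    add_onion_columns(conn)  # Safe to run twice: existing columns are skipped.
    print(conn.execute("SELECT domain, title, base_url FROM onions").fetchall())
    # [('example.onion', None, None)]
```

Because ALTER TABLE ... ADD COLUMN leaves existing rows in place, this approach is consistent with the point that a full rebuild is optional.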

haxys commented 6 years ago

This has been completed in the offload_to_backend pull request. It just needs testing and merging.