davidfstr / Crystal-Web-Archiver

Downloads websites for long-term archival.
http://dafoster.net/projects/crystal-web-archiver

Can run on remote server using X11 forwarding #184

Open davidfstr opened 9 months ago

davidfstr commented 9 months ago

Some large sites like YRE and KC can require downloading 2+ TB of content. That is a problem when my effective monthly bandwidth cap is about 500 GB (0.5 TB). For sites like these, it may make sense to download them from a datacenter location rather than from my usual location.
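As a back-of-the-envelope check on the figures above, the following sketch computes how many months a download would take under a monthly cap. The function name is hypothetical, and it assumes 1 TB = 1000 GB, matching the "500 GB (0.5 TB)" figure.

```python
import math


def months_to_download(total_gb: float, monthly_cap_gb: float) -> int:
    """Return the whole number of months needed to fetch total_gb under a monthly cap."""
    if monthly_cap_gb <= 0:
        raise ValueError("monthly_cap_gb must be positive")
    # Round up: a partially used month still counts as a month of waiting.
    return math.ceil(total_gb / monthly_cap_gb)


# A 2 TB site under a 500 GB/month cap uses the entire cap for 4 months.
print(months_to_download(2000, 500))  # 4
```

This also assumes the cap is spent on nothing but the crawl, so the real wait from a capped connection would likely be longer.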

Sketch of how to use Crystal in a datacenter location:

davidfstr commented 9 months ago

In the future it may be desirable to add a TUI (Terminal UI) to Crystal so that it can be fully controlled over an SSH connection (without X11). However, that would add significant maintenance overhead to keep future changes to the GUI and TUI in sync.

Even with a TUI, special consideration would still be needed to actually view any downloaded pages.

davidfstr commented 7 months ago

EC2 instance types that seem promising, with on-demand pricing of 1-2¢/hr:

[Screenshot, 2024-03-11 8:39 AM: candidate EC2 instance types and on-demand prices]

If I want to support long-running crawl processes economically in the future, EC2 Spot Instances have even better pricing, at the cost of requiring Crystal to understand and react to Spot Instance Interruption Notices and to follow other Spot Instance Best Practices.
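A rough sketch of what reacting to an interruption notice might look like, using only the Python standard library. It polls the EC2 instance metadata service (IMDSv2) path `spot/instance-action`, which returns 404 until a notice is issued and a small JSON document afterwards. The function names, the `on_interrupt` callback, and the 5-second poll interval are assumptions for illustration, not part of Crystal.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

IMDS_BASE = "http://169.254.169.254/latest"


def imds_get(path: str, timeout: float = 2.0) -> Optional[str]:
    """Fetch a metadata path via IMDSv2. Returns None if the path answers 404."""
    # IMDSv2 requires a short-lived session token obtained with a PUT request.
    token_req = urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as e:
        if e.code == 404:  # no notice has been issued yet
            return None
        raise


def check_spot_interruption(
    fetch: Callable[[str], Optional[str]] = imds_get,
) -> Optional[dict]:
    """Return the parsed interruption notice, or None if there is none."""
    body = fetch("spot/instance-action")
    return None if body is None else json.loads(body)


def wait_for_interruption(
    on_interrupt: Callable[[dict], None],
    fetch: Callable[[str], Optional[str]] = imds_get,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until a notice appears, then call on_interrupt(notice) once."""
    while True:
        notice = check_spot_interruption(fetch)
        if notice is not None:
            on_interrupt(notice)
            return notice
        sleep(poll_seconds)
```

In practice `wait_for_interruption` would probably run on a background thread, with `on_interrupt` (hypothetical) pausing downloads and flushing the project to disk. The notice arrives roughly two minutes before the instance is reclaimed, so the handler has to finish quickly.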

If I wanted to support distributed crawling economically with EC2 Spot Instances, reacting to Instance Rebalance Recommendations would also be a good idea.
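One possible shape for this, as a sketch only: a worker checks the instance metadata path `events/recommendations/rebalance` before claiming more work, and starts draining once a recommendation appears. The `CrawlWorker` class and the `fetch_metadata` callable are hypothetical; `fetch_metadata` is assumed to return the response body for a metadata path, or `None` when the path answers 404.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Optional

REBALANCE_PATH = "events/recommendations/rebalance"


def parse_rebalance(body: Optional[str]) -> Optional[datetime]:
    """Parse a rebalance recommendation body into its UTC notice time."""
    if body is None:
        return None
    # The body is JSON of the form {"noticeTime": "2020-10-27T08:22:00Z"}.
    notice_time = json.loads(body)["noticeTime"]
    parsed = datetime.strptime(notice_time, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


class CrawlWorker:
    """Hypothetical distributed crawl worker that drains when rebalancing is advised."""

    def __init__(self, fetch_metadata: Callable[[str], Optional[str]]) -> None:
        self._fetch = fetch_metadata
        self.draining = False

    def should_claim_more_work(self) -> bool:
        """Return False once a rebalance recommendation has been seen."""
        if not self.draining:
            if parse_rebalance(self._fetch(REBALANCE_PATH)) is not None:
                # Finish in-flight downloads but stop taking new URLs, so a
                # replacement instance can pick up the remaining queue.
                self.draining = True
        return not self.draining
```

A rebalance recommendation is only a hint that the instance is at elevated risk of interruption, so draining gracefully is one reasonable response rather than the only one.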