Open davidfstr opened 9 months ago
In the future it may be desirable to add a TUI (Terminal UI) to Crystal so that it can be fully controlled over an SSH connection (without X11). However, that would add significant maintenance overhead, because future changes to the GUI and TUI would need to be kept in sync.
Even with a TUI, actually viewing any downloaded pages would still need special consideration.
EC2 Instance types that seem promising, with on-demand pricing, for 1-2¢/hr:
If I want to support long-running crawls economically in the future, EC2 Spot Instances have even better pricing, at the cost of requiring Crystal to understand and react to Spot Instance Interruption Notices and to follow other Spot Instance Best Practices.
If I wanted to support distributed crawling economically with EC2 Spot Instances, reacting to Instance Rebalance Recommendations would also be a good idea.
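As a rough sketch of what reacting to these notices might involve: EC2 exposes both the interruption notice and the rebalance recommendation through the instance metadata service, so a crawl process could poll for them periodically. The function names below (`fetch_metadata`, `check_spot_notices`) are hypothetical and not part of Crystal, and what Crystal should do on a notice (for example, flush the project to disk and stop scheduling new downloads) is left open.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

# Link-local address of the EC2 instance metadata service (IMDS)
IMDS = "http://169.254.169.254/latest"


def fetch_metadata(path: str, timeout: float = 1.0) -> Optional[str]:
    """Return the metadata document at `path`, or None if it is absent (HTTP 404)."""
    # IMDSv2: request a short-lived session token first
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    req = urllib.request.Request(
        f"{IMDS}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as e:
        if e.code == 404:
            # No notice has been issued for this instance
            return None
        raise


def check_spot_notices(
    fetch: Callable[[str], Optional[str]] = fetch_metadata,
) -> dict:
    """Return any pending interruption notice and rebalance recommendation."""
    def load(path: str) -> Optional[dict]:
        body = fetch(path)
        return json.loads(body) if body is not None else None

    return {
        # Present roughly two minutes before the instance is reclaimed
        "interruption": load("spot/instance-action"),
        # Earlier, softer signal that the instance is at elevated risk
        "rebalance": load("events/recommendations/rebalance"),
    }
```

The `fetch` parameter is injectable only so the logic can be exercised off EC2; on a real Spot Instance, calling `check_spot_notices()` every few seconds from a background thread would probably be enough.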
Some large sites like YRE and KC can require downloading 2+ TB of content. That can be troublesome when my effective monthly bandwidth cap is about 500 GB (0.5 TB). For sites like these, it may make sense to download them from a datacenter location rather than my usual location.
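To put the numbers above in perspective, here is a back-of-the-envelope calculation. It assumes decimal units (1 TB = 1000 GB) and that the entire monthly cap is spent on the crawl, so it is a lower bound; `months_to_download` is just an illustrative helper.

```python
import math


def months_to_download(site_size_gb: float, monthly_cap_gb: float) -> int:
    """Minimum number of whole months needed to fetch a site under a bandwidth cap."""
    return math.ceil(site_size_gb / monthly_cap_gb)


# A 2 TB site under a 500 GB/month cap
print(months_to_download(2000, 500))  # → 4
```

So a single 2 TB site would take at least four months from my usual location, and longer in practice since the cap is shared with everything else.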
Sketch of how to use Crystal in a datacenter location: