ArchiveTeam / grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Pause gracefully if OSError (No space left on device) #206

Open TheTechRobo opened 2 years ago

TheTechRobo commented 2 years ago

Instead of crashing the crawl, why don't we simply pause and retry every so often if the computer has no space left?

ivan commented 2 years ago

I believe wpull and ludios_wpull do this if psutil is installed (psutil>=2.0,<=4.2 is in wpull's requirements). I think I left that requirement out of grab-site for compatibility with Windows or macOS, but I'm not sure.

https://github.com/ArchiveTeam/grab-site/blob/6269289a2ca874bae52f116016ca54dc8887d0cc/extra_docs/pause_resume_grab_sites.sh is the workaround I use to avoid crashes right now.

I agree this should be better and work by default.
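
For illustration only, a proactive free-space check could look roughly like the sketch below. It is not taken from wpull, ludios_wpull, or the linked script. The names `free_bytes` and `wait_for_free_space`, the threshold, and the polling interval are all hypothetical. It uses the standard library's `shutil.disk_usage` so it runs without extra dependencies; `psutil.disk_usage(path).free` reports the same kind of figure.

```python
import shutil
import time


def free_bytes(path):
    """Return the free bytes on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def wait_for_free_space(path, min_free_bytes, *, poll_seconds=30.0,
                        get_free=free_bytes, sleep=time.sleep):
    """Block until at least `min_free_bytes` are free; return how many times we waited.

    Hypothetical helper: `get_free` and `sleep` are injectable so the loop
    can be exercised without actually filling a disk.
    """
    waits = 0
    while get_free(path) < min_free_bytes:
        sleep(poll_seconds)  # crawl stays paused while the disk is too full
        waits += 1
    return waits
```

A crawler could call something like `wait_for_free_space(crawl_dir, 1 * 1024**3)` before each write, so it pauses before the disk is completely full rather than after a write has already failed.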

TheTechRobo commented 2 years ago

Well, a 50GB crawl I had crashed. :/

Just installed psutil for the future.

TheTechRobo commented 2 years ago

Nope, that didn't do anything. I have psutil installed, yet my 33GB crawl just crashed because I ran out of space.

TheTechRobo commented 2 years ago

Where does this error pop up? I can't remember, but I want to fix this.

Do you think it would work to wrap the write in a try block that catches OSError and pauses the crawl, or should we do something else?
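
A minimal sketch of the try-block idea, under assumptions: `retry_on_enospc` is a hypothetical helper, not something that exists in grab-site or wpull, and `write` stands in for whatever call actually hits the disk. It retries only when the `OSError` carries `errno.ENOSPC` ("No space left on device") and re-raises anything else.

```python
import errno
import time


def retry_on_enospc(write, *, wait_seconds=60.0, max_attempts=None, sleep=time.sleep):
    """Call `write()` until it succeeds, pausing whenever the disk is full.

    Hypothetical helper. Any OSError other than ENOSPC is re-raised at once.
    With `max_attempts` set, the last ENOSPC error is re-raised after that
    many failed attempts instead of waiting forever.
    """
    attempts = 0
    while True:
        try:
            return write()
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise  # unrelated I/O error: do not mask it
            attempts += 1
            if max_attempts is not None and attempts >= max_attempts:
                raise
            sleep(wait_seconds)  # give the operator time to free up space
```

One caveat with this approach: a write that fails partway through may leave a truncated record behind, so a real fix would probably need to retry at a point where the output can be rewound or rewritten cleanly, not just re-run the failing call.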