Closed Ghost-chu closed 5 months ago
We could also just ignore SIGINT in the finally block. Each isfile() call produces additional I/O.
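A minimal sketch of what ignoring SIGINT around the skipfile write could look like. This is not the project's actual code: `save_skipset`, `run`, and the `fetch` callback are hypothetical names, and the one-URL-per-line skipfile format is an assumption.

```python
import signal

def save_skipset(path, skipset):
    """Write the skipset to disk while Ctrl+C is ignored (hypothetical helper)."""
    # Ignore SIGINT so a second Ctrl+C cannot truncate the skipfile mid-write.
    # Note: signal.signal() may only be called from the main thread.
    previous = signal.signal(signal.SIGINT, signal.SIG_IGN)
    try:
        with open(path, "w", encoding="utf-8") as fh:
            for url in sorted(skipset):
                fh.write(url + "\n")
    finally:
        # Restore the earlier handler (None means it was not set from Python).
        if previous is not None:
            signal.signal(signal.SIGINT, previous)

def run(urls, skipfile_path, skipset, fetch):
    """Process URLs; the skipfile is always written on the way out."""
    try:
        for url in urls:
            if url in skipset:
                continue
            fetch(url)          # may be interrupted by Ctrl+C
            skipset.add(url)    # only recorded after a successful fetch
    finally:
        save_skipset(skipfile_path, skipset)
```

With this shape, a Ctrl+C during the loop still propagates as KeyboardInterrupt, but the write in the finally block runs to completion.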
Actually, checking whether a file exists only involves the directory area of the file system, and that I/O isn't expensive, especially on modern systems with SSD storage. We can reduce some of the I/O by loading and checking the skipset first, and only then calling isfile().
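Roughly, the ordering described here could be sketched as follows. The function names and the one-URL-per-line skipfile format are assumptions for illustration, not taken from the actual patch.

```python
import os

def load_skipset(skipfile_path):
    """Read already-processed URLs into a set; a missing skipfile gives an empty set."""
    try:
        with open(skipfile_path, encoding="utf-8") as fh:
            return {line.strip() for line in fh if line.strip()}
    except FileNotFoundError:
        return set()

def needs_download(url, target_path, skipset):
    """Return True only if the URL is neither in the skipset nor already on disk."""
    if url in skipset:
        # In-memory lookup, no file system access at all.
        return False
    # Only URLs not covered by the skipset cost an isfile() call.
    return not os.path.isfile(target_path)
```

The set lookup answers most cases for free, so isfile() is only paid for URLs the skipfile does not know about, for example after the skipfile was lost to an interrupted write.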
Your hint about the keyboard interrupt was pretty good, though. I'm just thinking that, since we have already reduced the I/O and CPU load, we should try to keep these savings as much as possible.
Edit: one remaining problem is that skipped URLs are not assigned to a file, which makes the JSON and CSV response wrong. Fixing this may need additional resources.
I exited the program with Ctrl+C while it was pulling data, but unfortunately the interrupt hit while it was writing to the skipfile, so my progress was reset.
I wrote my own code to skip files that already exist in the filesystem and resume my previous progress. I am submitting it here in the hope that it will help.