Sometimes download_raw_html.py gets stuck on a download before it finishes the list.
I've added a log of completed downloads. On startup, the script reads the log (if it exists) and determines the index of the last successful download. It then skips all completed items and resumes with the next one.
If master.tsv is rebuilt between runs, then the log needs to be thrown away since it depends on the contents of master.tsv.
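
The resume logic described above could look roughly like the sketch below. It is not the actual code in download_raw_html.py: the function names, the `fetch` callback, and the log format (one line per completed item, with the index first, then a tab, then the item) are all assumptions made for illustration.

```python
import os


def last_completed_index(log_path):
    """Return the index on the last log line, or -1 if there is no log yet."""
    if not os.path.exists(log_path):
        return -1
    last = -1
    with open(log_path) as log:
        for line in log:
            line = line.strip()
            if line:
                # Assumed log format: "<index>\t<item>"
                last = int(line.split("\t")[0])
    return last


def download_all(items, log_path, fetch):
    """Download items in order, skipping everything already in the log.

    `fetch` is a hypothetical callable that downloads a single item.
    """
    start = last_completed_index(log_path) + 1
    for index in range(start, len(items)):
        fetch(items[index])
        # Record the item only after its download has returned.
        with open(log_path, "a") as log:
            log.write(f"{index}\t{items[index]}\n")
```

Because the log stores positions in the list rather than the items themselves, a log left over from an older master.tsv would make the script skip the wrong rows, which is why it has to be deleted after a rebuild.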