Closed · widdowquinn closed this issue 3 years ago
Example download failure:

```
$ cazy_webscraper.py -g me@my.domain -l test.log -o outdir
cazy_webscraper: 2020-12-03 14:26:51,639 - Run initiated
cazy_webscraper: 2020-12-03 14:26:51,639 - Creating directory outdir
cazy_webscraper: 2020-12-03 14:26:51,640 - Finished program preparation
cazy_webscraper: 2020-12-03 14:26:51,640 - Starting retrieval of data from CAZy
cazy_webscraper: 2020-12-03 14:26:51,640 - Retrieving URLs to summary CAZy class pages
[...]
Retrieving proteins from GH13: 16000it [02:03, 130.03it/s]
Parsing CAZy families:   7%|█████████████▍ | 12/169 [18:58<4:08:14, 94.87s/it]
Parsing CAZy classes:    0%|               | 0/6 [18:59<?, ?it/s]
Traceback (most recent call last):
  File "/Users/lpritc/opt/anaconda3/envs/cazy-test-env/bin/cazy_webscraper.py", line 33, in <module>
    sys.exit(load_entry_point('cazy-webscraper', 'console_scripts', 'cazy_webscraper.py')())
[...]
  File "/Users/lpritc/opt/anaconda3/envs/cazy-test-env/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
KeyboardInterrupt
$ ls outdir/
```
Added. All SQL interaction is under the scraper.sql module.
Downloading significant amounts of data can take a long time. If the run is interrupted for any reason, the script stops and none of the data gathered so far is available to the user. This can be extremely frustrating and discourage reuse.
Some options to provide kinder behaviour could include:
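One such option, sketched below with hypothetical names (`fetch_family` stands in for the real CAZy download code, and the schema is illustrative only), is to commit gathered records to the database after each family and to catch `KeyboardInterrupt`, so that whatever was retrieved before an interruption survives on disk:

```python
import sqlite3


def fetch_family(family):
    # Stand-in for the real CAZy download (hypothetical helper).
    return ["%s_protein_%d" % (family, i) for i in range(3)]


def scrape(families, conn):
    """Commit after each family so an interruption (e.g. Ctrl-C or a
    dropped connection) leaves everything gathered so far in the database."""
    conn.execute("CREATE TABLE IF NOT EXISTS proteins (family TEXT, name TEXT)")
    try:
        for family in families:
            for protein in fetch_family(family):
                conn.execute(
                    "INSERT INTO proteins VALUES (?, ?)", (family, protein)
                )
            conn.commit()  # checkpoint: this family's rows now survive a crash
    except KeyboardInterrupt:
        conn.commit()  # keep any partially-downloaded family too
        raise


conn = sqlite3.connect(":memory:")
scrape(["GH13", "GH31"], conn)
print(conn.execute("SELECT COUNT(*) FROM proteins").fetchone()[0])
```

With per-family commits the cost of a failure is bounded by one family's download, rather than the whole run, and a restarted run could in principle skip families already present in the table.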