ksedivyhaley closed 4 years ago
Hello Kate,
Thanks for your feedback. At the time, the script only succeeded when the default arguments were specified explicitly. We have since improved it to run without parameters, using the defaults.
The data file is large (around 1.5 GB) and is downloaded from the Internet, so the download itself may have failed on your end. Otherwise it should have appeared in the ../data folder (unless you provided custom parameters or paths).
Lastly, the script validates the data file against an MD5 checksum to ensure the right file was downloaded. If you did not specify the real data file URL, the verification would have failed, and the script with it, leaving your data folder empty.
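To illustrate the kind of check described above, here is a minimal sketch of MD5 validation. The function names `file_md5` and `is_valid_data_file` are hypothetical and chosen for this example; the actual code in download.py may be organized differently.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks.

    Chunked reading keeps memory use low for a large file such as a
    1.5 GB CSV.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid_data_file(path, expected_hash):
    """Hypothetical helper: True if the file exists and its MD5 matches."""
    path = Path(path)
    return path.is_file() and file_md5(path) == expected_hash.lower()
```

With a check like this, a file fetched from the wrong URL would not match the expected hash and would be rejected, which matches the failure you saw.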
What we did to improve usability:
python download.py
Downloading new data file... 15.5%
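For reference, a progress line like the one above can be produced with the standard library alone. This is only a sketch under assumptions: `progress_percent`, `report_progress`, and `download` are hypothetical names for this example and are not necessarily what download.py uses.

```python
import sys
import urllib.request


def progress_percent(blocks_done, block_size, total_size):
    """Return download progress as a percentage, or None if size is unknown."""
    if total_size <= 0:
        # urlretrieve passes -1 when the server sends no Content-Length.
        return None
    # The last block may overshoot the total, so cap the value at 100.
    return min(100.0, blocks_done * block_size * 100.0 / total_size)


def report_progress(blocks_done, block_size, total_size):
    """Report hook for urlretrieve: rewrite one status line in place."""
    percent = progress_percent(blocks_done, block_size, total_size)
    if percent is not None:
        sys.stdout.write(f"\rDownloading new data file... {percent:.1f}%")
        sys.stdout.flush()


def download(url, destination):
    """Download url to destination while printing progress."""
    urllib.request.urlretrieve(url, destination, reporthook=report_progress)
    sys.stdout.write("\n")
```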
Thanks!
Hi Serg,
In the Milestone 1 submission the defaults weren't clearly specified in the documentation - looking again I found them commented here:
# Define constants with key values
DATA_FILE_PATH = '../data/vehicles.csv'
DATA_FILE_HASH = '06e7bd341eebef8e77b088d2d3c54585'
DATA_FILE_URL = 'http://mds.dev.synnergia.com/uploads/vehicles.csv'
(I didn't write down the URL I used, but I assume it was something from Kaggle that looked like the right URL but wasn't, hence the failure you described.)
I note that your current version has improved the documentation to make the intended URL obvious, so that fixes the initial issue - script ran as intended while I wrote up this issue! Usability improvements also look good.
Unable to run download script on my machine: I get your error “Cached data file hash is invalid.” (Good use of progress/error messages, though!)
Note the instruction in Milestone 1: “Also, to make things simple, I would avoid using data from sites where you have to authenticate to obtain the data (e.g., Kaggle). If that cannot be avoided, discuss with the lecture and lab instructor how you can do this reproducibly.”
Possibly related: I'm not seeing the data file in your data folder in the repo. Too big for GitHub?