Download script: path error, minor improvements #34

UBC-MDS / DSCI_522_Group-308_Used-Cars

This project attempts to build a regression model to predict the price of used cars based on numerous features of the car.

MIT License

Closed by ksedivyhaley 4 years ago

ksedivyhaley commented 4 years ago

The download script ends with the error FileNotFoundError: [Errno 2] No such file or directory: '../data/vehicles.csv'; it runs correctly with the argument data/vehicles.csv.
The download script implements appropriate improvements based on feedback.
The progress bar is a great feature.
The README should also warn the user that the data file is very large.

To think about: checking for cached data is a nice feature, but it could backfire because it doesn't let you overwrite old data. Consider adding another argument that lets you skip the cached-data check. Also, does the hash check block overwriting?

pokrovskyy commented 4 years ago

Thanks for the feedback, Kate. True, there was a bug in the README - that should have been data/vehicles.csv

Added a warning to the README about the data file size and training time.

Regarding the cached file - the script will only delete it if the hash does not match. Otherwise you can just delete the cached data file and re-run the script. But I hear your suggestion on a skip-cache option - I've added that.
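For readers following along, the cache behaviour described here can be sketched roughly as below. This is a minimal illustration and not the project's actual script: the names `file_sha256`, `ensure_data`, `fetch`, and `override_cache` are hypothetical, and SHA-256 is assumed as the hash function.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of the file's full contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so a very large file is never held in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_data(path, expected_hash, fetch, override_cache=False):
    """Return True if the data was (re)downloaded, False if the cache was reused.

    `fetch` is a hypothetical callable that writes the data file to `path`.
    """
    path = Path(path)
    if path.exists() and not override_cache:
        if file_sha256(path) == expected_hash:
            return False  # cached copy is intact, so skip the download
        path.unlink()  # hash mismatch: discard the stale or corrupt copy
    # Create the target directory if it is missing, then download.
    path.parent.mkdir(parents=True, exist_ok=True)
    fetch(path)
    return True
```

With `override_cache=True` the hash check is skipped entirely and the file is always fetched again, which is one way to cover the overwrite case without deleting the cached file by hand.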

ksedivyhaley commented 4 years ago

Yes, you could manually delete the cached file, but it's nice to have the option to do that as part of running the function!

> Regarding the cached file - it will only delete it if the hash does not match.

Since I don't know much about hashes, my question was really - does the hash matching mean the files are identical (such that if the data on the web is updated the hash will change) or does it only consider some features of the file?

pokrovskyy commented 4 years ago

The hash is computed on the downloaded data file, to confirm that the data was not tampered with on the net. Maybe that was a little overkill :)
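On the question of whether matching hashes mean identical files: a cryptographic hash is computed over every byte of the file, so any change to the contents changes the digest. A small sketch, assuming SHA-256 (the thread does not say which hash the script uses) and made-up sample data:

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


# Made-up sample rows, for illustration only.
original = b"price,year,odometer\n5000,2012,90000\n"
same_copy = bytes(original)
updated = original.replace(b"90000", b"90001")  # a single digit differs

print(content_hash(original) == content_hash(same_copy))  # True
print(content_hash(original) == content_hash(updated))    # False
```

So if the data on the web were updated, the hash of the new download would no longer match the expected value.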

Anyway, I added the --OVERRIDE_CACHE option to handle this nicely, as per your suggestion. That should do it. Thanks!


In reply to ksedivyhaley's comment of Feb 1, 2020, 07:58.