UBC-MDS / DSCI_522_Group-308_Used-Cars

This project attempts to build a regression model to predict the price of used cars based on numerous features of each car
MIT License

Download script: path error, minor improvements #34

Closed by ksedivyhaley 4 years ago

ksedivyhaley commented 4 years ago

To think about: checking for cached data is a nice feature but could backfire because it doesn't let you overwrite old data. Consider adding another argument that lets you skip the cached data check. Also, does the hash check block overwrite?

pokrovskyy commented 4 years ago

Thanks for the feedback, Kate. True, there was a bug in the README - the path should have been data/vehicles.csv

Added a warning to the README about the data file size and training time.

Regarding the cached file - it will only delete it if the hash does not match. Otherwise you can just delete the cached data file and re-run the script. But I hear your suggestion on a skip-cache option, so I added that.

ksedivyhaley commented 4 years ago

Yes, you could manually delete the cached file, but it's nice to have the option to do that as part of running the function!

> Regarding the cached file - it will only delete it if the hash does not match.

Since I don't know much about hashes, my question was really - does the hash matching mean the files are identical (such that if the data on the web is updated the hash will change) or does it only consider some features of the file?

pokrovskyy commented 4 years ago

The hash is computed on the downloaded data file, to confirm that the data on the net has not been tampered with. Maybe that was a little overkill :)
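For readers wondering what the hash covers: a cryptographic hash is computed over every byte of the file, so any change to the contents yields a different digest. Here is a minimal sketch of such a check. The thread does not say which algorithm or language the download script uses, so SHA-256, Python, and the helper name `sha256_of_file` are assumptions for illustration only.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    Hypothetical helper, not taken from the project's script. The file is
    read in chunks so a large data file does not have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files give the same digest only if their contents are byte-for-byte identical (barring a hash collision, which is not a practical concern for SHA-256). If the data on the web is updated, its digest will no longer match a previously recorded one.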

Anyway, I added the --OVERRIDE_CACHE option to handle this nicely as per your suggestion. That should do it. Thanks!
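The cache behaviour discussed in this thread (reuse the cached file when its hash matches, delete it on a mismatch, and allow an explicit override) could be sketched roughly as follows. This is not the project's actual code: the function name `resolve_data_file`, its parameters, and the choice of Python are all assumptions, and `download` and `compute_hash` are caller-supplied placeholders.

```python
import os


def resolve_data_file(path, expected_hash, download, compute_hash,
                      override_cache=False):
    """Return `path`, downloading the file unless a valid cached copy exists.

    Hypothetical sketch: `download(path)` must write the file to `path`, and
    `compute_hash(path)` must return a digest comparable to `expected_hash`.
    """
    if os.path.exists(path):
        if not override_cache and compute_hash(path) == expected_hash:
            return path  # cached copy is valid, so skip the download
        # Either the hash does not match or the caller asked to override.
        os.remove(path)

    download(path)

    # Verify the fresh download as well, so a corrupted or altered file is
    # never left behind as the new cache.
    if compute_hash(path) != expected_hash:
        os.remove(path)
        raise ValueError("downloaded file does not match the expected hash")
    return path
```

With `override_cache=True` the cached file is always discarded and fetched again, which covers the case of wanting to overwrite old data without deleting the file by hand. Note that in this sketch a changed upstream file would still fail the final check until the expected hash is updated.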
