Closed firasm closed 4 years ago
Thanks for the comment. As far as I can see, you ran the code from the root repo folder by calling python scripts/download.py
, however supplied a path relative to the local repository root folder. So the script attempted to download the data file into folder that apparently did not exist on your PC. In this case you should have specified --DATA_FILE_PATH=data/vehicles.csv
or run the script from scripts
folder. One thing I could do though is handle this case by providing more meaningful error message, like "Invalid download location specified" etc. Is that what's on your mind? I don't think the script should recursively create any additional folders for the sake of security
Alternatively, you could have run the script without parameters (from the scripts
folder) Eventually, this will be handled in the single batch.
Also, could you please comment on why do we even have to request paths through command line arguments? My original understanding was that this should be restricted to the internal repository folder structure (preset in the script) Instead, you could provide some additional parameters to other scripts, like train / test split % etc. But not the paths
I just followed the instructions you had in your README under Usage:
To replicate the analysis, clone this GitHub repository, install the dependencies listed below, and run the following commands at the command line/terminal from the root directory of this project:
# download data python scripts/download.py --DATA_FILE_PATH=../data/vehicles.csv --DATA_FILE_URL=http://mds.dev.synnergia.com/uploads/vehicles.csv --DATA_FILE_HASH=06e7bd341eebef8e77b088d2d3c54585 # data wrangling Rscript scripts/wrangling.R --DATA_FILE_PATH=../data/vehicles.csv --TRAIN_FILE_PATH=../data/vehicles_train.csv --TEST_FILE_PATH=../data/vehicles_test.csv --TARGET=price --REMOVE_OUTLIERS=YES --TRAIN_SIZE= 0.9
Tiffany mentioned in class that you should run your scripts from the root directory so you are always doing stuff in subdirectories from the current location, rather than worrying about going up directories using ..
Thanks Firas! Doing the group meeting now, just realized it's a bug in the Readme, will update accordingly. As I said, eventually this will be all encapsulated in the Makefile
. Also, we will remove default options from our Readme. It's confusing to provide all of this when it's supposed to be run without parameters. We'll make it super-clean and intuitive, with retained flexibility for those requiring advanced usage. Corresponding tasks assigned!
Closed in favour of #33
python scripts/download.py --DATA_FILE_PATH=../data/vehicles.csv --DATA_FILE_URL=http://mds.dev.synnergia.com/uploads/vehicles.csv --DATA_FILE_HASH=06e7bd341eebef8e77b088d2d3c54585
does not run after I clone the repo.
Error is:
Normally (in a real project) I'd tell you the fix, but in this case I want you to find it and fix it :-)