Closed voremargot closed 2 years ago
What random state do we want to use throughout the project? 123?
@voremargot It worked for me in the main branch, so it should work in any other branch. Maybe @hatefr can provide more information. However, I am attaching the command I used.
python download_data.py --url=https://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/forestfires.csv --out_file=data.csv
For your second question, I am fine with 123.
The command looks fine. @voremargot were you able to download the data?
Plus, random state 123 is also good.
For some reason (1) I had an old version of the script in my features folder and (2) I wasn't calling the script with --url and --out_file! Got it working now! Thanks!
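For anyone else who hits the same problem, here is a minimal sketch of how a script could accept the two options from the command quoted in this thread. It is not the project's actual download_data.py, whose contents are not shown here; the use of argparse and urllib, and the function names parse_args and download, are assumptions made for illustration.

```python
import argparse
import urllib.request


def parse_args(argv=None):
    """Parse --url and --out_file (hypothetical helper, not the project's code)."""
    parser = argparse.ArgumentParser(description="Download a file from a URL.")
    # required=True makes the script stop with a usage message if a flag is missing.
    parser.add_argument("--url", required=True, help="address of the file to download")
    parser.add_argument("--out_file", required=True, help="local path to write to")
    return parser.parse_args(argv)


def download(url, out_file):
    """Fetch url and save the response body to out_file."""
    urllib.request.urlretrieve(url, out_file)


# In a script you would finish with:
#     args = parse_args()
#     download(args.url, args.out_file)
```

Marking both options as required means that a call without --url or --out_file fails right away with a usage message, rather than going wrong later.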
So I think this milestone is pretty easy! As far as data cleaning goes, all I need to do is add a seasons column and change the target values to log scale, correct? And then I'll create a train and test split and save that data to the data file. The transformations for the ML pipelines should happen in the ML script, so I am leaving them out of this one. If that's all there is to it, I think I'm pretty close to being done and will be able to help others with their parts this week!! @gauthampughaz @Anahita97 @hatefr
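A rough sketch of those three steps in plain Python, in case it helps the discussion. Everything specific here is an assumption rather than the posted script: the column names month and area, the month-to-season grouping, the use of log(1 + x) so that zero areas stay defined, and the 20% test fraction. Only the seed 123 comes from this thread. The project may well use pandas and scikit-learn's train_test_split with random_state=123 instead of the hand-rolled split shown here.

```python
import math
import random

# Assumed grouping of three-letter month abbreviations into seasons.
SEASONS = {
    "dec": "winter", "jan": "winter", "feb": "winter",
    "mar": "spring", "apr": "spring", "may": "spring",
    "jun": "summer", "jul": "summer", "aug": "summer",
    "sep": "fall", "oct": "fall", "nov": "fall",
}


def preprocess(rows):
    """Return copies of rows with a 'season' column and a log-scaled 'area'."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        new_row["season"] = SEASONS[row["month"].lower()]
        # log1p computes log(1 + x), which is defined when the area is 0.
        new_row["area"] = math.log1p(float(row["area"]))
        cleaned.append(new_row)
    return cleaned


def train_test_split_rows(rows, test_fraction=0.2, seed=123):
    """Shuffle with a fixed seed, then split into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Because the shuffle uses its own seeded random.Random instance, everyone who runs the split on the same rows gets the same train and test sets.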
I have posted the script that does the simple preprocessing of the data and outputs a training and a test dataset. The files are saved as pickle files, a format specific to Python. To load a pickle file into a script, open it in binary mode and call pickle.load(f) on the file object; pickle.loads(data) only works on bytes that have already been read into memory. You will also have to import the pickle module. Let me know if another file type would be better!
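For reference, a small sketch of saving an object to a pickle file and reading it back using the standard pickle module. The helper names save_pickle and load_pickle are made up for illustration and are not taken from the posted script.

```python
import pickle


def save_pickle(obj, path):
    """Write obj to path; pickle files must be opened in binary mode."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Read back the object stored at path."""
    with open(path, "rb") as f:
        # pickle.load reads from a file object; pickle.loads expects bytes.
        return pickle.load(f)
```

One thing to keep in mind: unpickling can run arbitrary code, so only load pickle files that come from a source you trust, such as our own preprocessing script.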
Why would I not be able to use the data_download script in the features branch to download my data? Can someone else try and download the data to see if it is the script or me?