Milestone #2: Preprocessing script

UBC-MDS / forest-fire-area-prediction

This project aims to predict the burned area of forest fires in the northeast region of Portugal, using meteorological and soil moisture data.

https://ubc-mds.github.io/forest-fire-area-prediction/reports/forest_fire_analysis_report.html

MIT License


Milestone #2: Preprocessing script #21

Closed: voremargot closed this issue 2 years ago

voremargot commented 2 years ago

Why can't I use the data_download script in the features branch to download my data? Could someone else try to download the data, so we can tell whether the problem is the script or me?

voremargot commented 2 years ago

What random state do we want to use throughout the project? 123?

gauthampughazhendhi commented 2 years ago

@voremargot It worked for me in the main branch, so it should work in any other branch. Maybe @hatefr can provide more information. Here is the command I used:

python download_data.py --url=https://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/forestfires.csv --out_file=data.csv
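The command above only works if both flags are passed. As a hypothetical minimal sketch (the real download_data.py may use a different argument parser and structure), here is a script with the same --url/--out_file interface built on argparse. Marking both flags required=True makes a missing flag fail loudly instead of silently.

```python
import argparse
import urllib.request


def parse_args(argv=None):
    """Parse the two command-line flags; argv=None means read sys.argv."""
    parser = argparse.ArgumentParser(description="Download a file from a URL.")
    # required=True makes argparse print an error and exit if a flag is missing
    parser.add_argument("--url", required=True, help="URL of the file to download")
    parser.add_argument("--out_file", required=True, help="path to write the file to")
    return parser.parse_args(argv)


def download(url, out_file):
    """Fetch `url` and write its raw bytes to `out_file`."""
    with urllib.request.urlopen(url) as response, open(out_file, "wb") as f:
        f.write(response.read())


def main(argv=None):
    """Hypothetical entry point: parse the flags, then download."""
    args = parse_args(argv)
    download(args.url, args.out_file)
```

As a standalone script this would end with `if __name__ == "__main__": main()`. argparse accepts both `--url=...` and `--url ...`, so the command above parses as written.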

gauthampughazhendhi commented 2 years ago

For your second question, I am fine with 123.

hatefr commented 2 years ago

The command looks fine. @voremargot, were you able to download the data?

Plus, random state 123 is also good.
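One way to keep the agreed seed consistent is to define it once and pass it to every random call. The sketch below is hypothetical: the helper name train_test_split_df and the 20% test fraction are assumptions, not taken from the project. It uses pandas' DataFrame.sample; scikit-learn's train_test_split accepts the same random_state argument.

```python
import pandas as pd

# Seed agreed on in this thread; defining it once avoids mismatches across scripts
RANDOM_STATE = 123


def train_test_split_df(df, test_frac=0.2, random_state=RANDOM_STATE):
    """Hypothetical helper: split a DataFrame into (train, test) reproducibly."""
    # The same random_state always selects the same rows for the test set
    test = df.sample(frac=test_frac, random_state=random_state)
    # Every row not sampled into the test set goes to the training set
    train = df.drop(test.index)
    return train, test
```

Calling it twice on the same data returns identical splits, which is the point of fixing the random state.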

voremargot commented 2 years ago

It turns out (1) I had an old version of the script in my features folder and (2) I wasn't passing the arguments with the --url and --out_file flags! Got it working now! Thanks!

voremargot commented 2 years ago

So I think this milestone is pretty easy! For data cleaning, all I need to do is add a season column and convert the target values to a log scale, correct? Then I'll create a train/test split and save the data to the data folder. The transformations for the ML pipelines belong in the ML script, so I am leaving them out of this one. If that's all there is to it, I'm close to done and will be able to help others with their parts this week! @gauthampughaz @Anahita97 @hatefr
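A hypothetical sketch of the two cleaning steps described above. It assumes the raw CSV has a `month` column of lowercase three-letter abbreviations and an `area` target, and it maps months to Northern Hemisphere meteorological seasons; the project may group months differently, and the `log_area` column name is made up for illustration. Since log(0) is undefined and the target may contain zero areas, it uses numpy's log1p, i.e. log(1 + area).

```python
import numpy as np
import pandas as pd

# Assumed mapping: meteorological seasons, keyed by lowercase month abbreviations
MONTH_TO_SEASON = {
    "dec": "winter", "jan": "winter", "feb": "winter",
    "mar": "spring", "apr": "spring", "may": "spring",
    "jun": "summer", "jul": "summer", "aug": "summer",
    "sep": "fall", "oct": "fall", "nov": "fall",
}


def add_season_and_log_target(df):
    """Return a copy with a `season` column and a log-scaled `log_area` target."""
    out = df.copy()  # leave the caller's DataFrame unchanged
    out["season"] = out["month"].map(MONTH_TO_SEASON)
    # log1p computes log(1 + area), which stays defined (= 0) when area == 0
    out["log_area"] = np.log1p(out["area"])
    return out
```

Predictions on the log scale can be mapped back to hectares with np.expm1, the inverse of log1p.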

voremargot commented 2 years ago

I have posted the script that does the basic preprocessing of the data and outputs training and test datasets. The files are saved as pickle files, a Python-specific format. To load one in a script, import the pickle library, open the file in binary mode, and call pickle.load(f); pickle.loads(data) only works on a bytes object, not on a file. Let me know if another file type would be better!
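A hypothetical sketch of saving and reloading the split as pickle files; the function names and file paths are illustrative, not the project's. It uses pandas' DataFrame.to_pickle for writing and shows both the plain pickle.load route and the pandas shortcut pd.read_pickle for reading.

```python
import pickle

import pandas as pd


def save_split(train, test, train_path, test_path):
    """Write both DataFrames as pickle files (a Python-specific binary format)."""
    train.to_pickle(train_path)
    test.to_pickle(test_path)


def load_with_pickle(path):
    """Load a pickle file with the standard library."""
    # pickle.load reads from an open binary file; pickle.loads would need raw bytes
    with open(path, "rb") as f:
        return pickle.load(f)
```

pd.read_pickle(path) does the same in one call. If a language-agnostic format is preferred, DataFrame.to_csv and pd.read_csv are the usual alternative, at the cost of losing dtypes such as categoricals.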