UBC-MDS / DSCI_522_Group-308_Used-Cars

This project attempts to build a regression model to predict the price of used cars based on numerous features of each car.
MIT License

Things to do milestone 2 #12

Closed AndresPitta closed 4 years ago

AndresPitta commented 4 years ago

Hey @pokrovskyy and @bradentam,

Here is a checklist of the things to do for this milestone:

Andrés:

Braden:

Serg:

Markdown Script - Report

Proposal

Andres - update readme

AndresPitta commented 4 years ago

Hey guys,

I just created a pull request with the data wrangling script.

These are the parameters it takes:

Options:
--DATA_FILE_PATH= Path (including filename) to retrieve the csv file. [default: ../data/vehicles.csv]
--TRAIN_FILE_PATH= Path (including filename) to print the train portion as a csv file. [default: ../data/vehicles_train.csv]
--TEST_FILE_PATH= Path (including filename) to print the test portion as a csv file. [default: ../data/vehicles_test.csv]
--TARGET= Name of the response variable to use. [default: price]
--REMOVE_OUTLIERS= Logical value that takes YES as value if the outliers should be removed, NO otherwise. [default: YES]
--TRAIN_SIZE= Decimal value for the train/test split. [default: 0.9]
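In case it helps to see how these options fit together, here is a rough sketch of an equivalent command-line interface. It uses Python's standard `argparse` module, which is an assumption and not necessarily what the pull request uses. The function name `build_parser` is made up for illustration. Only the option names, descriptions, and defaults come from the list above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options listed in the comment."""
    parser = argparse.ArgumentParser(
        description="Wrangle the used-cars data and split it into train/test."
    )
    parser.add_argument("--DATA_FILE_PATH", default="../data/vehicles.csv",
                        help="Path (including filename) to retrieve the csv file.")
    parser.add_argument("--TRAIN_FILE_PATH", default="../data/vehicles_train.csv",
                        help="Path (including filename) for the train portion.")
    parser.add_argument("--TEST_FILE_PATH", default="../data/vehicles_test.csv",
                        help="Path (including filename) for the test portion.")
    parser.add_argument("--TARGET", default="price",
                        help="Name of the response variable to use.")
    # Restricting to YES/NO makes a typo fail early instead of being ignored.
    parser.add_argument("--REMOVE_OUTLIERS", choices=["YES", "NO"], default="YES",
                        help="YES if the outliers should be removed, NO otherwise.")
    parser.add_argument("--TRAIN_SIZE", type=float, default=0.9,
                        help="Decimal value for the train/test split.")
    return parser
```

Calling `build_parser().parse_args(["--TRAIN_SIZE=0.8"])` would override only the split size and keep every other default.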

What I did was remove prices above the 99th percentile, fix the NAs for the categorical variables, and split the data into train and test sets. Every other variable seemed to be OK, based on the EDA.
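To make the three steps concrete, here is a minimal sketch in plain Python. It is not the script from the pull request. It assumes rows are dictionaries, that missing categorical values are replaced with the placeholder `"unknown"`, that the percentile uses linear interpolation, and that the split is a seeded random shuffle. The names `percentile` and `wrangle` are hypothetical.

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def wrangle(rows, target="price", categorical=(), remove_outliers=True,
            train_size=0.9, seed=0):
    """Return (train, test) lists of dict rows after the three steps."""
    rows = [dict(r) for r in rows]  # copy so the caller's rows are not mutated

    # Step 1: drop rows whose target is above the 99th percentile.
    if remove_outliers:
        cutoff = percentile([r[target] for r in rows], 99)
        rows = [r for r in rows if r[target] <= cutoff]

    # Step 2: fill missing categorical values (the placeholder is an assumption).
    for r in rows:
        for col in categorical:
            if r.get(col) in (None, ""):
                r[col] = "unknown"

    # Step 3: shuffle reproducibly, then cut at the train fraction (truncated).
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_size)
    return rows[:n_train], rows[n_train:]
```

Filtering before the split, as the comment describes, means the test set also excludes the extreme prices. Whether that is desirable depends on how the model will be evaluated.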

Also, I haven't added the references for the libraries I used yet. Please tell me what you think.