Milestone 2: Script #2: Clean & Pre-process

UBC-MDS / DSCI522_group_12

Milestone 2: Script #2: Clean & Pre-process #26

Closed d-sel closed 3 years ago

d-sel commented 3 years ago

Data cleaning/pre-processing, transforming, and/or partitioning.

This should take at least two arguments: a path/filename pointing to the data to be read in, and a path/filename pointing to where the cleaned/processed/transformed/partitioned data should live.
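For illustration, here is a minimal sketch of a two-argument command-line interface like the one described above, written in Python with the standard-library `argparse` module. The flag names `--input` and `--out_dir` mirror the Rscript command that appears later in this thread. The function name `parse_args` and the help strings are assumptions, not part of the project.

```python
import argparse


def parse_args(argv=None):
    """Parse the two required paths (hypothetical helper, not the project's code)."""
    parser = argparse.ArgumentParser(
        description="Clean, pre-process, and partition the raw data."
    )
    # Path/filename of the data to be read in.
    parser.add_argument("--input", required=True, help="path to the raw data file")
    # Directory where the processed/partitioned data should be written.
    parser.add_argument("--out_dir", required=True, help="directory for the output files")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)


# Example call with an explicit argument list instead of the real command line.
args = parse_args([
    "--input=data/raw/default_payment_next_month.feather",
    "--out_dir=data/processed",
])
print(args.input, args.out_dir)
```

Marking both arguments as required makes the script fail early, with a usage message, if either path is missing.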

larahabashy commented 3 years ago

Hey guys - regarding our conversation during last class about having the second script in Python: I found out that Python can easily read .feather files, which an R script can write as output. In our project, we could write the second script (the one that does the pre-processing, cleaning, and splitting of the data) in R and have it create training.feather and test.feather files. The fourth script, which will be a .py script, can then easily read those .feather files. Please let me know if this is OK. I will create a pull request with the second script in R shortly; please have a look. If we find this is incompatible with script 4, I'm happy to try translating the code to Python then. Otherwise, I think translating it would take too much time.
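As a rough sketch of how a Python script could pick up the files written by the R script: the file names `training.feather` and `test.feather` come from the comment above, while the helper names `split_paths` and `load_splits` and the `data/processed` directory in the usage line are assumptions. `pandas.read_feather` needs `pandas` with `pyarrow` installed, so the import is kept inside the loading function.

```python
from pathlib import Path


def split_paths(processed_dir):
    """Return the expected paths of the two split files (hypothetical helper)."""
    d = Path(processed_dir)
    return d / "training.feather", d / "test.feather"


def load_splits(processed_dir):
    """Read both splits into pandas DataFrames.

    Assumes pandas and pyarrow are installed and that the R script
    has already written both files to processed_dir.
    """
    import pandas as pd  # imported here so split_paths works without pandas

    train_path, test_path = split_paths(processed_dir)
    return pd.read_feather(train_path), pd.read_feather(test_path)
```

Usage, once the R script has been run: `train_df, test_df = load_splits("data/processed")`.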

larahabashy commented 3 years ago

Update: The script is running perfectly. You can use the following command to test it out:

> Rscript src/pre_process_cred.r --input=data/raw/default_payment_next_month.feather --out_dir=data/processed

To add the unscaled data to the raw data folder:

  write_feather(test_data, paste0(out_dir, "/test_raw.feather"))
> Rscript src/pre_process_cred.r --input=data/raw/default_payment_next_month.feather --out_dir=data/raw

d-sel commented 3 years ago

This is great!