Hey guys - regarding our conversation during last class about having the second script in Python: I found out that Python can easily read .feather files, which an R script can write as output. In our project, the second script (the one that does the pre-processing, cleaning, and splitting of the data) could be an R script that creates training.feather and test.feather. The fourth script, which will be a .py script, can then easily read those .feather files. Please let me know if this is OK. I will create a pull request with the second script in R shortly, so please have a look. If we find this is incompatible with script 4, I'm happy to try translating the code to Python then. I'd rather not translate it up front, since I think that would take too much time.
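For reference, here is a rough sketch of how script 4 could pick up the two files from Python. It assumes pandas and pyarrow are installed and that the R script writes into data/processed; the helper names split_paths and load_splits are made up for illustration and are not part of the project.

```python
from pathlib import Path


def split_paths(processed_dir="data/processed"):
    """Return the expected locations of the two files written by the R script."""
    base = Path(processed_dir)
    return {"train": base / "training.feather", "test": base / "test.feather"}


def load_splits(processed_dir="data/processed"):
    """Read both splits into pandas DataFrames (hypothetical helper)."""
    # Imported here so the path helper still works without pandas installed.
    import pandas as pd  # pd.read_feather relies on the pyarrow package

    paths = split_paths(processed_dir)
    return {name: pd.read_feather(path) for name, path in paths.items()}
```

Calling load_splits() after the R script has run should give a dict with a "train" and a "test" DataFrame, so script 4 would not need to know anything about R.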
Update: The script is running without errors. You can use the following command to test it out:
> Rscript src/pre_process_cred.r --input=data/raw/default_payment_next_month.feather --out_dir=data/processed
To add the unscaled data to the raw data folder:
write_feather(test_data, paste0(out_dir, "/test_raw.feather"))
> Rscript src/pre_process_cred.r --input=data/raw/default_payment_next_month.feather --out_dir=data/raw
This is great!
Data cleaning/pre-processing, transforming, and/or partitioning.
This should take at least two arguments: a path/filename pointing to the data to be read in, and a path/filename pointing to where the cleaned/processed/transformed/partitioned data should live.
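If the script ever does get translated to Python, the two-argument interface described above could be sketched with argparse. This is only an illustration: it reuses the --input and --out_dir flag names from the Rscript command earlier in the thread, and the function names parse_args and main are assumptions, not existing project code.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two required arguments: where to read from and where to write to."""
    parser = argparse.ArgumentParser(
        description="Clean, transform, and split the raw data."
    )
    parser.add_argument("--input", required=True,
                        help="path/filename of the data to be read in")
    parser.add_argument("--out_dir", required=True,
                        help="directory where the processed files should live")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    out_dir = Path(args.out_dir)
    # Create the output directory if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)
    # The cleaning, transforming, and splitting steps would go here.
    return Path(args.input), out_dir
```

Saved as a script that calls main() at the bottom, this would accept the same flags as the R version, e.g. --input=data/raw/default_payment_next_month.feather --out_dir=data/processed. argparse exits with an error message if either argument is missing.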