eltonlaw / impyute

Data imputations library to preprocess datasets with missing data
http://impyute.readthedocs.io/
MIT License
352 stars 49 forks

Ddfg add randc function #86

Closed: xyz8983 closed this 5 years ago

xyz8983 commented 5 years ago

This pull request addresses #67.

  1. Add a function randc() that randomly generates a data frame of categorical data, using alphabetic characters as the categories. Once all 26 letters are used up, additional categories are generated as combinations of characters. (If numeric categories are preferred, just leave a comment and I can update it.)
  2. Update the Corruptor class to accept an extra attribute, dtype, with default value np.float, so that Corruptor can generate datasets in other dtypes, like np.string.
  3. Add test cases for randc(): the first checks that a BadInputError is raised on bad input, the second checks that the dataset contains the desired number of categories, and the third checks that the dataset has the desired shape.
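
For readers who want a concrete picture of item 1, here is a minimal sketch of how such a generator could work. It is not the code from this pull request: the names `category_labels` and `random_categorical`, their parameters, and the use of `ValueError` (as a stand-in for impyute's `BadInputError`) are all assumptions made for illustration. Labels run from `a` to `z`, then continue with two-letter combinations (`aa`, `ab`, ...), and every category is placed at least once so the category count can be checked.

```python
import itertools
import string

import numpy as np


def category_labels(n_categories):
    """Hypothetical helper: return distinct labels 'a'..'z', then 'aa', 'ab', ..."""
    if not isinstance(n_categories, int) or n_categories < 1:
        # Stand-in for the library's own BadInputError.
        raise ValueError("n_categories must be a positive integer")
    labels = []
    length = 1
    while len(labels) < n_categories:
        # Enumerate all letter combinations of the current length, in order.
        for combo in itertools.product(string.ascii_lowercase, repeat=length):
            labels.append("".join(combo))
            if len(labels) == n_categories:
                break
        length += 1
    return labels


def random_categorical(n_categories=3, shape=(5, 5), seed=None):
    """Hypothetical sketch: random array of string categories with the given shape."""
    n_cells = int(np.prod(shape))
    if n_categories > n_cells:
        raise ValueError("more categories requested than cells available")
    labels = np.array(category_labels(n_categories))
    rng = np.random.default_rng(seed)
    # Use each category once, then fill the remaining cells at random.
    extra = rng.integers(0, n_categories, size=n_cells - n_categories)
    indices = np.concatenate([np.arange(n_categories), extra])
    rng.shuffle(indices)
    return labels[indices].reshape(shape)


if __name__ == "__main__":
    print(random_categorical(n_categories=4, shape=(3, 4), seed=0))
```

Placing each label once before the random fill is a design choice made here so that the number of distinct categories is deterministic; a plain random draw could leave some categories out of a small dataset.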

eltonlaw commented 5 years ago

Thanks!