MLpack404 / pytorch_fanatics

A library to ease everyday machine learning tasks
MIT License

Implement Dataset Class for Tabular Data #1

Open MiHarsh opened 1 year ago

MiHarsh commented 1 year ago

Discuss the approach here before implementing

theArijitDas commented 1 year ago

I would like to work on it.

Approach:

  1. Separate the numerical and categorical (dtype = "object") columns of the DataFrame, and keep track of columns whose values are all unique (such as instance IDs).
  2. Fill NULL/NaN values with the column mean for numerical features and the column mode for categorical features.
  3. Cast categorical columns to the pandas "category" dtype and use its built-in 'codes' attribute (accessed as .cat.codes) to encode each feature as integers.
  4. Return a dictionary of {column name: value at the requested index} pairs.
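
The four steps above could be sketched roughly as follows. This is only an illustration of the proposal, not the library's actual implementation: the class name `TabularDataset` and its attribute names are hypothetical, and the class is kept as a plain Python class with `__len__` and `__getitem__` so it runs with pandas alone. In practice it would presumably subclass `torch.utils.data.Dataset`. As a further assumption, every non-numeric column is treated as categorical, which covers the dtype "object" case.

```python
import pandas as pd


class TabularDataset:
    """Hypothetical sketch of the proposed approach (names are illustrative)."""

    def __init__(self, df: pd.DataFrame):
        df = df.copy()  # avoid mutating the caller's DataFrame

        # Step 1: split columns by dtype and note all-unique columns (likely ids).
        # Assumption: anything non-numeric (e.g. dtype "object") is categorical.
        self.numerical_cols = [
            c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])
        ]
        self.categorical_cols = [
            c for c in df.columns if c not in self.numerical_cols
        ]
        self.unique_cols = [c for c in df.columns if df[c].nunique() == len(df)]

        # Step 2: fill missing values with the mean (numerical) or mode (categorical).
        for c in self.numerical_cols:
            df[c] = df[c].fillna(df[c].mean())
        for c in self.categorical_cols:
            modes = df[c].mode()
            if not modes.empty:  # an all-NaN column has no mode
                df[c] = df[c].fillna(modes.iloc[0])

        # Step 3: encode categorical columns as integer codes.
        for c in self.categorical_cols:
            df[c] = df[c].astype("category").cat.codes

        self.df = df

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        # Step 4: one {column name: value} dictionary per row.
        return {col: self.df[col].iloc[idx] for col in self.df.columns}


# Small made-up example to exercise the class.
frame = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "age": [20.0, None, 40.0, 30.0],
        "city": ["b", "a", None, "a"],
    }
)
ds = TabularDataset(frame)
sample = ds[1]
```

The sketch only records the all-unique columns in `unique_cols`; whether they should be dropped from the returned dictionary is left open here, since the proposal only says to keep track of them.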