Open sahilgupta2105 opened 3 years ago
Could detect language of the text during ingestion
^ Job links to fitted encoderset.
I think what we have discovered is that handling text is more about encoding than it is about isolating a data type/ file format.
eg. in Text dataset a helper method called get_feature_matrix() maps text data to numeric form, meta data in form of the fit object is generated that does this mapping, it might be useful for the end-user to have access to this mapping object,
there is also a possibility that the transformation can be moved to EncoderSet