aiqc / AIQC

End-to-end deep learning on your desktop or server.
BSD 3-Clause "New" or "Revised" License

Pipeline.Text #95

Open sahilgupta2105 opened 3 years ago

sahilgupta2105 commented 3 years ago

Some of the methods feel incomplete, e.g. from_folder ingests a bunch of text files from a folder, but what about the labels?

aiqc commented 3 years ago

Option a) Add a list argument for labels, where the number of list elements is validated against the number of textdata entries?

Option b) When I faced this problem for Dataset.Image, I opted to create the higher-level Pipeline.Image, which constructs both a Dataset.Tabular for the label and a Dataset.Image for the images. That is the main reason why Splitset accepts labels and features from different datasets.

If you choose not to include label columns in Dataset.Text, then you are free to name the columns whatever you like, and the text-based encoding methods can be applied to them automatically by default.

So there are pros and cons.
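
For reference, Option a could look roughly like the sketch below. This is not AIQC code: read_texts_with_labels is a hypothetical helper, and it assumes the labels are supplied in the same order as the sorted .txt filenames in the folder.

```python
from pathlib import Path


def read_texts_with_labels(folder, labels):
    """Hypothetical helper (not part of AIQC).

    Reads every .txt file in `folder`, sorted by filename, and pairs each
    file's contents with the label at the same position in `labels`.
    """
    paths = sorted(Path(folder).glob("*.txt"))

    # Option a: validate the label count against the number of text entries.
    if len(labels) != len(paths):
        raise ValueError(
            f"Got {len(labels)} labels for {len(paths)} text files."
        )

    return [
        (path.read_text(encoding="utf-8"), label)
        for path, label in zip(paths, labels)
    ]
```

The length check only catches a missing or extra label. It cannot detect labels that are in the wrong order, which is one argument for keeping labels in their own tabular dataset as in Option b.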