code-for-venezuela / c4v-py


Luis/multilabel classification #100

Closed LDiazN closed 2 years ago

LDiazN commented 2 years ago

Problem

Since many of the fields in the data we're gathering can take more than two labels, we need a classifier that supports more than two labels.

Solution

  1. Refactor the current classifier class to support an arbitrary number of labels
  2. Provide a way to specify sets of labels:
     a. Create a LabelSet base class that supports common label-set operations, such as mapping from id to label and vice versa, providing the number of arguments, and holding the string value of each label.
     b. Add LabelSet as a configuration parameter of the classifier class.
     c. Add the label set as an argument to the ClassifierExperimentArgs class, so that it's easy to set up an experiment that runs with a specified label set.
  3. Create a training dataset for multilabel classification, using a column of our current dataset that has multiple values as the test subject:
     a. I chose the "servicio resumido" column because it is simple to understand and easy for me to classify manually and check.
     b. The confirmation dataset should be provided too.
     c. The ScrapedData class, which works as the canonical schema for scraped data, was also updated, which meant updating the SQLiteManager class as well.
  4. Provide a sample experiment with multiple labels as a test
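
As a rough illustration of the LabelSet idea in step 2, here is a minimal sketch built on Python's standard `enum` module. It is not the code from this PR: the method names (`num_labels`, `id2label`, `label2id`), the `ServiceLabels` subclass, and its label values are all hypothetical and only show the kind of operations described above.

```python
from enum import Enum
from typing import Dict


class LabelSet(Enum):
    """Hypothetical base class: subclasses declare their labels as enum members."""

    @classmethod
    def num_labels(cls) -> int:
        # Number of distinct labels declared by the subclass
        return len(cls)

    @classmethod
    def id2label(cls) -> Dict[int, str]:
        # Ids follow the order in which the members are declared
        return {i: member.value for i, member in enumerate(cls)}

    @classmethod
    def label2id(cls) -> Dict[str, int]:
        # Inverse mapping: string value of each label -> integer id
        return {member.value: i for i, member in enumerate(cls)}


class ServiceLabels(LabelSet):
    """Hypothetical label set; the values are placeholders, not real column values."""

    WATER = "water"
    ELECTRICITY = "electricity"
    INTERNET = "internet"


if __name__ == "__main__":
    print(ServiceLabels.num_labels())
    print(ServiceLabels.id2label())
    print(ServiceLabels.label2id())
```

With a design along these lines, a classifier could take the label set class as a configuration parameter and size its output layer from `num_labels()`, which is one way to read steps 2b and 2c.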

Relevant files

Further work