code-for-venezuela / c4v-py


Luis/classifier command #89

Closed LDiazN closed 3 years ago

LDiazN commented 3 years ago

Problem

To debug our current classifier models after training, it is useful to have a way to check accuracy by hand against known articles. More generally, we would like a development suite that makes common operations in classifier experiments easy to perform.

Proposed Solution:

A few commands were added to improve the workflow for experimenting: show, classify and explain.

Together with the list command, they give us a fairly ergonomic workflow to test a trained model: pick news articles with the list command, inspect them with the show command, run a classification with the classify command, and inspect the analysis over pieces of an article with the explain command. Note: The previous examples assume that a model was trained at some point in the branch my_branch with the name my_experiment.

Additional Changes

Classifier API

Additionally, I made major changes to the classifier API, which was too monolithic:

```python
# Training arguments, the input columns to use and a free-text description
args = ClassifierArgs(
    {
        "per_device_train_batch_size": 10,
        "per_device_eval_batch_size": 1,
        "num_train_epochs": 1,
        "warmup_steps": 10,
        "load_best_model_at_end": True,
        "metric_for_best_model": "f1",
        "save_strategy": "epoch",
        "evaluation_strategy": "epoch",
        "eval_accumulation_steps": 1,
        "learning_rate": 5e-7,
    },
    columns=["title"],
    description="Testing my new API",
)

# Branch "new_api", experiment "test"
exp = ClassifierExperiment.from_branch_and_experiment("new_api", "test")
exp.run_experiment(args)
```


## Microscope Manager
`Manager` is a high-level API that automates common operations with our library, so you can easily compose scrapers, crawlers and the classifier.
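
As a rough illustration of what composing these components can look like, here is a minimal, self-contained sketch. It is not the `Manager` class from this PR: `SketchManager`, its `run` method and the three toy components are all hypothetical names made up for this example, and the "classifier" is a keyword check rather than a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# All names below are hypothetical stand-ins, not the c4v-py API.
Crawler = Callable[[], Iterable[str]]   # yields article URLs
Scraper = Callable[[str], str]          # URL -> article text
Classifier = Callable[[str], str]       # article text -> label


@dataclass
class SketchManager:
    """Composes a crawler, a scraper and a classifier behind one call."""

    crawl: Crawler
    scrape: Scraper
    classify: Classifier

    def run(self, limit: int) -> Dict[str, str]:
        """Crawl up to `limit` URLs, scrape each one and classify its text."""
        results: Dict[str, str] = {}
        for i, url in enumerate(self.crawl()):
            if i >= limit:
                break
            results[url] = self.classify(self.scrape(url))
        return results


# Toy components so the sketch runs without network access or a trained model.
def toy_crawler() -> List[str]:
    return ["site/a", "site/b", "site/c"]


def toy_scraper(url: str) -> str:
    return "water shortage reported" if url.endswith("a") else "sports results"


def toy_classifier(text: str) -> str:
    return "relevant" if "water" in text else "irrelevant"


manager = SketchManager(toy_crawler, toy_scraper, toy_classifier)
labels = manager.run(limit=2)
```

Passing the components in as plain callables keeps each one replaceable on its own, which is the property a composing class like this is meant to provide.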

## CLIClient
A simple object used alongside the CLI commands to encapsulate common logic and operations, so that the command-level logic in the CLI stays thin.
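
The idea of keeping commands thin can be sketched as follows. This is an assumption-laden toy, not the `CLIClient` from this PR: `SketchClient`, `main` and the in-memory article store are hypothetical, the command-line parsing uses the standard library's `argparse` purely for illustration, and the classification rule is a placeholder.

```python
import argparse
from typing import Dict, List


# Hypothetical names throughout; this is not the c4v-py CLIClient.
class SketchClient:
    """Holds the shared logic so that each command only parses and dispatches."""

    def __init__(self, articles: Dict[str, str]) -> None:
        # url -> body, an in-memory stand-in for real storage
        self._articles = articles

    def show(self, url: str) -> str:
        body = self._articles.get(url)
        return body if body is not None else f"No article stored for {url}"

    def classify(self, url: str) -> str:
        body = self._articles.get(url, "")
        # Placeholder rule standing in for a trained model.
        return "relevant" if "water" in body.lower() else "irrelevant"


def main(argv: List[str], client: SketchClient) -> str:
    parser = argparse.ArgumentParser(prog="sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("show", "classify"):
        sub.add_parser(name).add_argument("url")
    args = parser.parse_args(argv)
    # Command-level logic is a single dispatch to the client.
    return getattr(client, args.command)(args.url)


client = SketchClient({"site/a": "Water shortage reported"})
shown = main(["show", "site/a"], client)
label = main(["classify", "site/a"], client)
missing = main(["show", "site/z"], client)
```

Because the client methods return values instead of printing, they can be exercised in tests without going through the command line at all.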

# Relevant files:
* `src/c4v/c4v_cli.py`   
    * Added `show`, `classify` and `explain` commands
    * Added `CLIClient` class
* `src/c4v/microscope/manager.py` : High level manager class composing every component
* `src/c4v/classifier/classifier.py` :     
    * Refactored to remove experiment-management logic from the `ClassifierExperiment` class
    * Renamed the `ClassifierExperiment` class to just `Classifier`
* `src/c4v/classifier/experiment.py`:     
    * File created
    * Added classes for experiments: `BaseExperiment`, `BaseExperimentArguments`, `BaseExperimentSummary`
    * Added class for experiment's files management: `ExperimentFSManager`
* `src/c4v/classifier/classifier_experiment` : implemented the three base classes mentioned above as `ClassifierExperiment`, `ClassifierArguments` and `ClassifierSummary`
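
To make the split between base classes and classifier-specific classes more concrete, here is a minimal sketch of one way such a hierarchy can be laid out. It is not the code in `experiment.py`: every class name below is hypothetical, `experiment_to_run` is an invented hook name, only `run_experiment` echoes a method used earlier in this description, and results are kept in memory rather than managed on disk.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchArguments:
    """Hypothetical counterpart of a base arguments class."""

    description: str = ""


@dataclass
class SketchSummary:
    """Hypothetical counterpart of a base summary class."""

    description: str = ""
    metrics: Dict[str, float] = field(default_factory=dict)


class SketchExperiment(ABC):
    """Owns the bookkeeping; subclasses supply only the experiment body."""

    def __init__(self, branch: str, name: str) -> None:
        self.branch = branch
        self.name = name
        self.history: List[SketchSummary] = []

    @abstractmethod
    def experiment_to_run(self, args: SketchArguments) -> SketchSummary:
        """Run the experiment itself and return its summary."""

    def run_experiment(self, args: SketchArguments) -> SketchSummary:
        summary = self.experiment_to_run(args)
        summary.description = args.description
        # Kept in memory here; a real version might write to disk instead.
        self.history.append(summary)
        return summary


@dataclass
class ToyClassifierArguments(SketchArguments):
    learning_rate: float = 5e-7


class ToyClassifierExperiment(SketchExperiment):
    def experiment_to_run(self, args: SketchArguments) -> SketchSummary:
        # No training happens here; the metric is a fixed placeholder value.
        return SketchSummary(metrics={"f1": 0.0})


exp = ToyClassifierExperiment("my_branch", "my_experiment")
summary = exp.run_experiment(ToyClassifierArguments(description="Testing"))
```

With this layout, adding a new kind of experiment means writing one arguments class and one `experiment_to_run` body, while the bookkeeping in the base class is reused unchanged.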

# Additional comments
I know this is a large PR. I wanted to have a full testing suite in order to easily perform experiments. I also wanted to do some architecture work to make the code as easy to extend as possible, so I did a lot of refactoring and abstraction for both existing code and new code.