code-for-venezuela / c4v-py


Luis/nlp classifier module #85

Closed: LDiazN closed this 3 years ago

LDiazN commented 3 years ago

# Problem

We need a starting architecture for our classifier so that we can easily run experiments. The classifier was implemented in a proof of concept (POC) by @dieko95, which you can find here; this PR integrates that implementation into our current project.

# Solution

# Relevant files

# How to test it

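```python
# ClassifierExperiment comes from the classifier module in this PR; import it first.
# Arguments: branch name, experiment name
experiment = ClassifierExperiment("testing", "first_one")

# Train for 3 epochs and print the returned results
print(experiment.run_experiment(train_args={"num_train_epochs": 3}))
```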

* Run the script with Python.
* The following results were obtained on Google Colab for the experiment above:

```
Training completed. Do not forget to share your model on huggingface.co/models =)

{'train_runtime': 1051.2505, 'train_samples_per_second': 5.479, 'train_steps_per_second': 0.548, 'train_loss': 0.6006555491023593, 'epoch': 3.0}
100% 576/576 [17:31<00:00, 1.83s/it]
Configuration saved in /experiments/testing/first_one/config.json
Model weights saved in /experiments/testing/first_one/pytorch_model.bin
Running Evaluation
  Num examples = 480
  Batch size = 10
100% 48/48 [00:30<00:00, 1.58it/s]
                         metrics_value
eval_loss                     0.429582
eval_accuracy                 0.797917
eval_precision                0.775701
eval_recall                   0.772093
eval_f1                       0.773893
eval_runtime                 31.039800
eval_samples_per_second      15.464000
eval_steps_per_second         1.546000
epoch                         3.000000
```
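The reported `eval_f1` is the harmonic mean of precision and recall: 2 · 0.775701 · 0.772093 / (0.775701 + 0.772093) ≈ 0.7739, which matches the log. The PR does not show how these metrics are computed, so the following is only a minimal stdlib sketch of a metrics hook of the kind one can pass as `compute_metrics` to a Hugging Face `Trainer`. It assumes a binary task with `1` as the positive class and logits given as per-class scores; the names `binary_metrics`, `f1_from`, and this `compute_metrics` are illustrative, not the project's code (a real version would more likely use scikit-learn).

```python
from typing import Dict, Sequence, Tuple


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def binary_metrics(predictions: Sequence[int], labels: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary task (assumed positive class: `positive`)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == positive and y == positive)
    fp = sum(1 for p, y in pairs if p == positive and y != positive)
    fn = sum(1 for p, y in pairs if p != positive and y == positive)
    correct = sum(1 for p, y in pairs if p == y)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1_from(precision, recall)}


def compute_metrics(eval_pred: Tuple[Sequence[Sequence[float]], Sequence[int]]) -> Dict[str, float]:
    """Hypothetical Trainer hook: argmax over per-class scores, then score the predictions."""
    logits, labels = eval_pred
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    return binary_metrics(predictions, list(labels))
```

For example, `compute_metrics(([[0.2, 0.8], [0.9, 0.1], [0.3, 0.7], [0.6, 0.4]], [1, 0, 0, 1]))` predicts `[1, 0, 1, 0]` and returns 0.5 for all four metrics. `Trainer.evaluate` prefixes returned keys with `eval_`, which is why the log shows `eval_accuracy`, `eval_f1`, and so on.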



# Further work
* Run more experiments to improve classification performance
* Add a class to instantiate a classifier from an experiment
* Add a configuration manager to handle configuration variables such as the `BASE_C4V_FOLDER` variable in `classifier.py`
* Integrate this class into our architecture:
  *   Write a mapping from `[ScrapedData]` to a DataFrame
  *   Add function to get data from `PersistencyManager` as a DataFrame
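For the integration items above, the mapping from `[ScrapedData]` to a DataFrame can be sketched with the standard library alone, assuming `ScrapedData` is a dataclass. `ScrapedDataStub` and its fields are hypothetical stand-ins rather than the project's real schema, and `records_to_columns` is an illustrative name. The function builds a column-oriented dict, which is a shape `pandas.DataFrame` accepts directly, so pandas is only needed at the call site.

```python
from dataclasses import asdict, dataclass, fields, is_dataclass
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class ScrapedDataStub:
    """Hypothetical stand-in for ScrapedData; these field names are illustrative only."""
    url: str
    title: Optional[str] = None
    content: Optional[str] = None
    label: Optional[str] = None


def records_to_columns(items: Iterable[Any]) -> Dict[str, List[Any]]:
    """Turn an iterable of same-typed dataclass instances into {field name: [values...]}.

    Columns follow the first record's fields; nested dataclasses become dicts (via asdict).
    """
    columns: Dict[str, List[Any]] = {}
    for item in items:
        if not is_dataclass(item) or isinstance(item, type):
            raise TypeError(f"expected a dataclass instance, got {type(item).__name__}")
        if not columns:
            # Fix the column set from the first record's fields.
            columns = {f.name: [] for f in fields(item)}
        for name, value in asdict(item).items():
            columns[name].append(value)
    return columns
```

With pandas installed, `pd.DataFrame(records_to_columns(records))` produces the DataFrame. The same function could back the `PersistencyManager` helper by passing it whatever iterable of `ScrapedData` the manager returns; keeping pandas out of the mapping itself makes it easy to unit-test.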