# Problem
We need a starting architecture for our classifier so we can easily run experiments. The classifier itself was implemented in a POC by @dieko95 that you can find here; this PR just integrates that implementation into our current project.
# Solution
* Write a class that automates the entire pipeline for this classifier. The main goal is for the class to implement the process in multiple steps that can be easily changed or configured through parameters.
* Use a scheme for managing multiple experiments so we can keep track of their results: each experiment is identified by a branch name and an experiment name, and is stored in its own folder under the `.c4v` folder in `$HOME` by default.
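A minimal sketch of how the branch/experiment scheme could map to folders. It assumes the `experiments/<branch>/<experiment>` layout that appears in the training log paths and a base folder of `$HOME/.c4v`. The helper names (`experiment_folder`, `create_experiment_folder`, `list_experiments`) and the `experiments` subfolder are illustrative assumptions, not the actual `classifier.py` code.

```python
from pathlib import Path
from typing import List

# Assumed default base folder ($HOME/.c4v); classifier.py keeps its own BASE_C4V_FOLDER.
DEFAULT_BASE_FOLDER = Path.home() / ".c4v"


def experiment_folder(branch: str, experiment: str, base: Path = DEFAULT_BASE_FOLDER) -> Path:
    """Hypothetical helper: return <base>/experiments/<branch>/<experiment>."""
    if not branch or not experiment:
        raise ValueError("branch and experiment names must be non-empty")
    return Path(base) / "experiments" / branch / experiment


def create_experiment_folder(branch: str, experiment: str, base: Path = DEFAULT_BASE_FOLDER) -> Path:
    """Create the experiment folder (and any missing parents) and return it."""
    folder = experiment_folder(branch, experiment, base)
    folder.mkdir(parents=True, exist_ok=True)
    return folder


def list_experiments(branch: str, base: Path = DEFAULT_BASE_FOLDER) -> List[str]:
    """Return the sorted names of the experiments stored under one branch."""
    branch_folder = Path(base) / "experiments" / branch
    if not branch_folder.is_dir():
        return []
    return sorted(p.name for p in branch_folder.iterdir() if p.is_dir())
```

With this layout, `create_experiment_folder("testing", "first_one")` would create `~/.c4v/experiments/testing/first_one`. Because the branch name is part of the path, experiments with the same name on different branches do not overwrite each other.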
# Relevant files
* `src/c4v/classifier/classifier.py`: defines the `ClassifierExperiment` class, which automates the entire classification training process. Its main method is `run_experiment`, which accepts a `train_args` dict whose fields override the default training settings.
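The override behaviour can be sketched as a shallow dict merge in which caller-supplied `train_args` take precedence over the defaults. The default values and the `merge_train_args` name below are hypothetical. Only `num_train_epochs` appears in this PR. The other keys are meant to mirror fields of Hugging Face `transformers.TrainingArguments`, but that is an assumption about how the class uses them.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults; the real ones live in classifier.py.
DEFAULT_TRAIN_ARGS: Dict[str, Any] = {
    "num_train_epochs": 1,
    "per_device_train_batch_size": 10,
    "per_device_eval_batch_size": 10,
}


def merge_train_args(
    overrides: Optional[Dict[str, Any]] = None,
    defaults: Dict[str, Any] = DEFAULT_TRAIN_ARGS,
) -> Dict[str, Any]:
    """Return a new dict in which keys from `overrides` replace the defaults.

    Neither input dict is modified, so every experiment starts from the same defaults.
    """
    merged = dict(defaults)  # shallow copy of the defaults
    merged.update(overrides or {})  # caller-supplied values win
    return merged
```

For example, `merge_train_args({"num_train_epochs": 3})` keeps both batch sizes at 10 and sets three epochs. If the class builds a `transformers.TrainingArguments` from these fields, the merged dict could be unpacked into it as keyword arguments.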
# How to test it
1. Create a script, say `test.py`, in the root project folder.
2. Write the following code:
```python
from c4v.classifier.classifier import ClassifierExperiment

# branch name, experiment name
experiment = ClassifierExperiment("testing", "first_one")
print(experiment.run_experiment(train_args={"num_train_epochs": 3}))
```

A sample run produced the following output:

```
Training completed. Do not forget to share your model on huggingface.co/models =)

{'train_runtime': 1051.2505, 'train_samples_per_second': 5.479, 'train_steps_per_second': 0.548, 'train_loss': 0.6006555491023593, 'epoch': 3.0}
100% 576/576 [17:31<00:00, 1.83s/it]
Configuration saved in /experiments/testing/first_one/config.json
Model weights saved in /experiments/testing/first_one/pytorch_model.bin
Running Evaluation
  Num examples = 480
  Batch size = 10
100% 48/48 [00:30<00:00, 1.58it/s]
                         metrics_value
eval_loss                     0.429582
eval_accuracy                 0.797917
eval_precision                0.775701
eval_recall                   0.772093
eval_f1                       0.773893
eval_runtime                 31.039800
eval_samples_per_second      15.464000
eval_steps_per_second         1.546000
epoch                         3.000000
```

# Further work
* Run more experiments to improve classification performance
* Add a class to instantiate a classifier from an experiment
* Add a configuration manager to handle configuration variables such as the `BASE_C4V_FOLDER` variable in `classifier.py`
* Integrate this class into our architecture:
  * Write a mapping from `[ScrapedData]` to a dataframe
  * Add a function to get data from `PersistencyManager` as a `DataFrame`