ThilinaRajapakse / simpletransformers

Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
https://simpletransformers.ai/
Apache License 2.0
4.11k stars 727 forks

evaluate_during_training parameter in Classification Model makes .train and .predict unusable and infinitely stuck #1575

Closed Andrian0s closed 4 months ago

Andrian0s commented 4 months ago

Describe the bug
When `evaluate_during_training` is enabled on a `ClassificationModel`, `.train_model` gets stuck during the validation-set predictions: the progress bar stays empty and makes no progress, even when allowed roughly 10x longer than the evaluation should take. This effectively makes evaluation during training unusable.

If you do obtain a trained model (for example, by training with evaluation during training disabled), the same issue occurs in `model.predict` whenever the dataset passed for prediction is larger than the model's batch size. Workaround: pre-batch the dataset (to 16 in my case) and then run `.predict` on each batch.
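The pre-batching workaround described above can be sketched as follows. `predict_in_batches` and `chunk` are hypothetical helpers written for illustration, not part of the simpletransformers API; the sketch assumes `model.predict` follows the simpletransformers convention of returning a `(predictions, raw_outputs)` tuple.

```python
def chunk(items, size):
    """Yield successive slices of `items` of length at most `size`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def predict_in_batches(model, texts, batch_size=16):
    """Run model.predict on manually pre-batched input to avoid the hang
    observed when the prediction set exceeds the model's batch size."""
    predictions = []
    for batch in chunk(texts, batch_size):
        preds, _raw_outputs = model.predict(batch)
        predictions.extend(preds)
    return predictions
```

Calling `predict_in_batches(model, all_texts, batch_size=16)` then yields the same predictions as a single `.predict` call would, without passing more than one batch to the model at a time.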

To Reproduce
The following code (on Google Colab) reproduces the error.

~ New Colab Cell

```shell
!pip install simpletransformers
```

~ New Colab Cell

```python
from simpletransformers.classification import ClassificationModel

model_args = {
    "output_dir": "outputs/",
    "cache_dir": "cache_dir/",
    "max_seq_length": 256,
    "train_batch_size": 16,
    "eval_batch_size": 16,
    "num_train_epochs": 1,
    "evaluate_during_training": True,
    "use_cuda": True,  # Make sure CUDA is available
    "overwrite_output_dir": True,
    "reprocess_input_data": True,
    "save_model_every_epoch": True,
    "save_steps": -1,
    "no_cache": True,
    "save_optimizer_and_scheduler": True,
    "silent": False,
    "use_early_stopping": True,
    "early_stopping_patience": 3,
    "early_stopping_threshold": 0.01,
    "early_stopping_metric": "mcc",
    "early_stopping_metric_minimize": False,
}

model = ClassificationModel(
    "xlmroberta",
    "xlm-roberta-base",
    args=model_args,
)
```

~ New Colab Cell

```python
model.train_model(train_df, eval_df=valid_df)  # here it stays forever
```

Expected behavior
Training should work with the `evaluate_during_training` parameter on, saving intermediate evaluation results to the output directory instead of getting stuck.


Additional context
I think I stumbled upon this issue ~1.5 years ago and did not realise what was causing it, so the bug may not be related to recent releases.

ThilinaRajapakse commented 4 months ago

This is most likely related to multiprocessing (doesn't play nice on all systems). Try setting both use_multiprocessing and use_multiprocessing_for_evaluation to False. I think I'll disable these by default in a future release since Huggingface tokenizers are now fast enough that it's probably not required in most cases.
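A minimal sketch of the suggested fix, reusing the `model_args` dict from the reproduction above with the two multiprocessing flags added (only the added keys are new; everything else is from the original report):

```python
model_args = {
    "output_dir": "outputs/",
    "train_batch_size": 16,
    "eval_batch_size": 16,
    "num_train_epochs": 1,
    "evaluate_during_training": True,
    # Disable multiprocessing for feature conversion during training...
    "use_multiprocessing": False,
    # ...and during evaluation, where the hang was observed.
    "use_multiprocessing_for_evaluation": False,
}
```

With fast Huggingface tokenizers, tokenization remains quick even with multiprocessing off, so disabling it trades little speed for avoiding the hang.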

Andrian0s commented 4 months ago

That does indeed fix it for my use case. Thank you!