Closed Andrian0s closed 4 months ago
This is most likely related to multiprocessing (it doesn't play nice on all systems). Try setting both use_multiprocessing and use_multiprocessing_for_evaluation to False. I think I'll disable these by default in a future release, since Huggingface tokenizers are now fast enough that multiprocessing is probably not required in most cases.
That does indeed fix it for my use case. Thank you!
Describe the bug
When evaluating during training with ClassificationModel, the model gets stuck during .train_model at the validation-set predictions (the progress bar remains empty), with no progress even after allowing it to run 10x longer than needed. This essentially makes evaluation during training unusable.
If you do manage to get a trained model (for example by training without evaluation during training), the same issue happens in model.predict if the dataset passed for prediction is larger than the model's batch size. Workaround: pre-batch the dataset (to 16 in my case, as sketched below) and then run .predict on each batch.
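A rough sketch of that workaround, assuming a plain list of texts; predict_in_chunks is a hypothetical helper name, not part of the library:

```python
def predict_in_chunks(model, texts, chunk_size=16):
    """Run model.predict on slices no larger than the eval batch size."""
    predictions = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        # ClassificationModel.predict returns (predictions, raw model outputs)
        preds, _raw_outputs = model.predict(chunk)
        predictions.extend(preds)
    return predictions

# predictions = predict_in_chunks(model, test_texts, chunk_size=16)
```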
To Reproduce
The following code (on Google Colab) reproduces this error.
~ New Colab Cell

```
!pip install simpletransformers
```

~ New Colab Cell

```python
from simpletransformers.classification import ClassificationModel

model_args = {
    "output_dir": "outputs/",
    "cache_dir": "cache_dir/",
    "max_seq_length": 256,
    "train_batch_size": 16,
    "eval_batch_size": 16,
    "num_train_epochs": 1,
    "evaluate_during_training": True,
    "use_cuda": True,  # Make sure CUDA is available
    "overwrite_output_dir": True,
    "reprocess_input_data": True,
    "save_model_every_epoch": True,
    "save_steps": -1,
    "no_cache": True,
    "save_optimizer_and_scheduler": True,
    "silent": False,
    "use_early_stopping": True,
    "early_stopping_patience": 3,
    "early_stopping_threshold": 0.01,
    "early_stopping_metric": "mcc",
    "early_stopping_metric_minimize": False,
}

model = ClassificationModel(
    "xlmroberta",
    "xlm-roberta-base",
    args=model_args,
)
```

~ New Colab Cell

```python
model.train_model(train_df, eval_df=valid_df)  # here it stays forever
```

Expected behavior
Training should work with the evaluate_during_training parameter on, saving intermediate evaluation results to the output directory, without the training getting stuck.
Additional context
I think I stumbled upon this issue ~1.5 years ago and did not realise this was the cause, so the bug might not be related to recent releases.