JohnGiorgi / DeCLUTR

The corresponding code from our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations". Do not hesitate to open an issue if you run into any trouble!
https://aclanthology.org/2021.acl-long.72/
Apache License 2.0

Running into an error while training the model through Colab #144

Closed masoudh175 closed 4 years ago

masoudh175 commented 4 years ago

Hi,

I was trying to test the Colab notebook for training the model on WikiText-103, but I ran into the error below. I thought it might be because of recent updates to the allennlp package. I checked the version you use in setup.py and it is "allennlp>=1.0.0".

Is there a specific allennlp version that you use for training? Or is this error caused by something else?

Thanks!

```
2020-08-21 19:26:28,599 - CRITICAL - root - Uncaught exception
Traceback (most recent call last):
  File "/usr/local/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.6/dist-packages/allennlp/__main__.py", line 34, in run
    main(prog="allennlp")
  File "/usr/local/lib/python3.6/dist-packages/allennlp/commands/__init__.py", line 92, in main
    args.func(args)
  File "/usr/local/lib/python3.6/dist-packages/allennlp/commands/train.py", line 118, in train_model_from_args
    file_friendly_logging=args.file_friendly_logging,
  File "/usr/local/lib/python3.6/dist-packages/allennlp/commands/train.py", line 177, in train_model_from_file
    file_friendly_logging=file_friendly_logging,
  File "/usr/local/lib/python3.6/dist-packages/allennlp/commands/train.py", line 238, in train_model
    file_friendly_logging=file_friendly_logging,
  File "/usr/local/lib/python3.6/dist-packages/allennlp/commands/train.py", line 429, in _train_worker
    params=params, serialization_dir=serialization_dir, local_rank=process_rank,
  File "/usr/local/lib/python3.6/dist-packages/allennlp/common/from_params.py", line 581, in from_params
    **extras,
  File "/usr/local/lib/python3.6/dist-packages/allennlp/common/from_params.py", line 612, in from_params
    return constructor_to_call(**kwargs)  # type: ignore
  File "/usr/local/lib/python3.6/dist-packages/allennlp/commands/train.py", line 683, in from_partial_objects
    model=model_, data_loader=data_loader_, validation_data_loader=validation_data_loader_,
  File "/usr/local/lib/python3.6/dist-packages/allennlp/common/lazy.py", line 46, in construct
    return self._constructor(**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/allennlp/common/from_params.py", line 447, in constructor
    return value_cls.from_params(params=deepcopy(popped_params), **constructor_extras)
  File "/usr/local/lib/python3.6/dist-packages/allennlp/common/from_params.py", line 581, in from_params
    **extras,
  File "/usr/local/lib/python3.6/dist-packages/allennlp/common/from_params.py", line 610, in from_params
    kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
  File "/usr/local/lib/python3.6/dist-packages/allennlp/common/from_params.py", line 193, in create_kwargs
    params.assert_empty(cls.__name__)
  File "/usr/local/lib/python3.6/dist-packages/allennlp/common/params.py", line 429, in assert_empty
    "Extra parameters passed to {}: {}".format(class_name, self.params)
allennlp.common.checks.ConfigurationError: Extra parameters passed to GradientDescentTrainer: {'opt_level': None}
```
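For what it's worth, one quick way to check which allennlp version the Colab runtime actually installed (just generic Python, nothing specific to this repo):

```python
# Print the installed AllenNLP version in the current runtime.
import allennlp

print(allennlp.__version__)
```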

JohnGiorgi commented 4 years ago

Whoops, looks like AllenNLP removed `opt_level` and replaced it with `use_amp`. I made that switch everywhere in #145 and confirmed the notebook now runs end-to-end.
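For anyone else who hits this, the relevant trainer setting now looks roughly like the sketch below (not the config the repo ships; the optimizer and epoch values are placeholders). In a jsonnet config this is simply `"use_amp": true` under the `"trainer"` block.

```python
# Sketch only (not the repo's shipped config): in AllenNLP >= 1.0 the trainer
# takes a boolean `use_amp` where older configs passed Apex's `opt_level`.
from allennlp.common.params import Params

trainer_params = Params({
    "optimizer": {"type": "huggingface_adamw", "lr": 5e-5},  # placeholder values
    "num_epochs": 1,                                          # placeholder values
    "use_amp": True,  # replaces the removed {"opt_level": "O1"}-style setting
})
```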

Will you try again and let me know if your issue is solved? Thanks for pointing this out.

Side note, training on Colab is very slow. I will see if I can find ways to speed it up.

masoudh175 commented 4 years ago

Thanks for the quick response John! I confirm that the issue is resolved. But as you said, it is running very slowly. I haven't read the paper thoroughly, so I am not sure what systems you used to train the models. Are there any recommendations about the GPU/memory? Also, is there any way to see the remaining time while the model is being trained?

Thanks! Masoud

JohnGiorgi commented 4 years ago

> Are there any recommendations about the GPU/memory?

Training the model is approximately as expensive as pre-training the underlying language models (DistilRoBERTa or RoBERTa for DeCLUTR-small and DeCLUTR-base respectively). In the paper, we use up to 4 V100 32GB GPUs to train the models in a few hours on our dataset of ~500K documents.
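If you're curious what hardware Colab has actually assigned you, a quick generic PyTorch check (nothing DeCLUTR-specific) is:

```python
# Generic PyTorch check to see which GPU, and how much memory, the runtime has.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No GPU assigned; training on CPU will be extremely slow.")
```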

> Also, is there any way to see the remaining time while the model is being trained?

This is something that used to happen but doesn't on the latest version of AllenNLP. I will look into it (#146).
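In the meantime, a crude stopgap is to extrapolate from the average time per batch yourself. A hand-rolled sketch (this is not AllenNLP functionality):

```python
# Hand-rolled sketch: estimate remaining time by linear extrapolation from the
# average seconds per batch so far.
import time

def eta_seconds(batches_done: int, total_batches: int, start_time: float) -> float:
    elapsed = time.time() - start_time
    seconds_per_batch = elapsed / max(batches_done, 1)
    return seconds_per_batch * (total_batches - batches_done)

# e.g. if training started at `run_start` and 100 of 5000 batches are done:
# print(f"~{eta_seconds(100, 5000, run_start) / 60:.0f} minutes remaining")
```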

More generally, is there a reason you would like to train your own model instead of using one of our pretrained models? Training is pretty expensive, and probably not something you want to do in Colab. If you have a certain goal in mind, maybe we can train the model for you and send you the weights.
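For reference, using a pretrained checkpoint is only a few lines. A rough sketch with Hugging Face Transformers, assuming the weights are pulled from the Hub (the model name and example sentence below are just for illustration):

```python
# Rough sketch of embedding sentences with a pretrained DeCLUTR checkpoint via
# Hugging Face Transformers; assumes the weights are available on the Hub
# (e.g. "johngiorgi/declutr-small"). The example text is a placeholder.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("johngiorgi/declutr-small")
model = AutoModel.from_pretrained("johngiorgi/declutr-small")

texts = ["How do I sort a dictionary by value in Python?"]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**inputs)[0]  # last hidden state: (batch, seq_len, dim)

# Mean-pool over non-padding tokens to get one embedding per sentence.
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embeddings.shape)
```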

masoudh175 commented 4 years ago

Thanks for the information! Most of the pre-trained models are trained on generic English, and they don't do a good job on tech-related sentences. Our simple fastText model, trained on all of the Stack Overflow question titles + answers, does a better job than any pre-trained model. I wanted to give DeCLUTR a try, train it on Stack Overflow question titles + answers (excluding source code), and compare.

JohnGiorgi commented 4 years ago

I see, thanks! How big is this corpus of SO question titles + answers? It would be helpful to know the # of examples and the average word length of each example.
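Something like the following would give both numbers (a rough sketch that assumes one example per line in a plain-text file; the path is a placeholder):

```python
# Rough sketch for estimating corpus size; assumes one example per line in a
# plain-text file (the file name is a placeholder).
num_examples = 0
total_words = 0

with open("stackoverflow_titles_answers.txt", encoding="utf-8") as f:
    for line in f:
        words = line.split()
        if not words:
            continue  # skip blank lines
        num_examples += 1
        total_words += len(words)

print(f"{num_examples} examples, "
      f"{total_words / max(num_examples, 1):.1f} words per example on average")
```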

I see two paths forward:

  1. Following the training instructions in the readme and the Colab, train the model on your dataset outside of Colab (i.e. somewhere with more compute than the free tier provides); there is a rough sketch of the training call after this list. I will have a better idea of how much compute you need once I know how big your corpus is.
  2. Again, depending on how big your corpus is, I may be able to train the model for you if you are willing to share the dataset with me. I can then send you the model weights so you can load it and evaluate it.
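For path 1, the training call boils down to something like the sketch below (the config path, output directory, and override are placeholders, not the repo's exact file names):

```python
# Rough sketch of launching AllenNLP training from Python instead of the CLI.
# The config path, output directory, and override below are placeholders.
from allennlp.commands.train import train_model_from_file
from allennlp.common.util import import_module_and_submodules

# Equivalent of the CLI's --include-package declutr: make the custom dataset
# reader and model registrable before training starts.
import_module_and_submodules("declutr")

train_model_from_file(
    parameter_filename="path/to/declutr_config.jsonnet",  # placeholder config path
    serialization_dir="output/declutr_stackoverflow",     # where checkpoints go
    overrides='{"train_data_path": "path/to/your/corpus.txt"}',
    force=True,  # overwrite the serialization dir if it already exists
)
```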

Hope that helps.

masoudh175 commented 4 years ago

Thanks a lot for the offer. I'll contact you via email.