ThilinaRajapakse / simpletransformers

Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
https://simpletransformers.ai/

LanguageModelingModel stops abruptly with no error or warning message #1151

Closed: sprajagopal closed this issue 3 years ago

sprajagopal commented 3 years ago

Describe the bug: The model starts to train but then exits abruptly with no error or warning message. If this is a memory issue, there is no indication of it.

To Reproduce: Run the following script:

from simpletransformers.language_modeling import LanguageModelingModel
import logging

logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

train_args = {
    "reprocess_input_data": False,
    "overwrite_output_dir": True,
    "num_train_epochs": 3,
    "save_eval_checkpoints": True,
    "save_model_every_epoch": False,
    "learning_rate": 5e-4,
    "warmup_steps": 10000,
    "train_batch_size": 64,
    "eval_batch_size": 128,
    "gradient_accumulation_steps": 1,
    "block_size": 128,
    "max_seq_length": 128,
    "dataset_type": "simple",
    "wandb_project": "Esperanto - ELECTRA",
    "wandb_kwargs": {"name": "Electra-SMALL"},
    "logging_steps": 100,
    "evaluate_during_training": True,
    "evaluate_during_training_steps": 50000,
    "evaluate_during_training_verbose": True,
    "use_cached_eval_features": True,
    "sliding_window": True,
    "vocab_size": 52000,
    "generator_config": {
        "embedding_size": 128,
        "hidden_size": 256,
        "num_hidden_layers": 3,
    },
    "discriminator_config": {
        "embedding_size": 128,
        "hidden_size": 256,
    },
}

train_file = "data/small/train.txt"
test_file = "data/small/test.txt"

model = LanguageModelingModel(
    "electra",
    None,
    args=train_args,
    train_files=train_file,
)
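The script above ends at model construction; the training call that presumably follows (and produced the output below) would look something like this, using the standard train_model API with the eval file needed for evaluate_during_training:

# Presumed continuation, not shown in the original snippet:
model.train_model(train_file, eval_file=test_file)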

Expected behavior: The model trains to completion, or at least fails with an explicit error.

Actual output

[00:00:38] Pre-processing files (1416 Mo)           ████████████████████████████████████████                100%
[00:00:00] Tokenize words                           ████████████████████████████████████████ 601011   /   601011
[00:00:00] Count pairs                              ████████████████████████████████████████ 601011   /   601011
[00:00:04] Compute merges                           ████████████████████████████████████████ 51324    /    51324

INFO:simpletransformers.language_modeling.language_modeling_model: Training of None tokenizer complete. Saved to outputs/.
INFO:simpletransformers.language_modeling.language_modeling_model: Training language model from scratch


Additional context: The training and test files are 1.3 GB and 130 MB, respectively.

Debugging efforts: Stepping through the code with pdb ends with the following:

--Call--
Exception ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7f771552dee0>
Traceback (most recent call last):
  File "/home/reverie-pc/.pyenv/versions/3.8.0/lib/python3.8/weakref.py", line 103, in remove
  File "/home/reverie-pc/.pyenv/versions/3.8.0/lib/python3.8/bdb.py", line 90, in trace_dispatch
  File "/home/reverie-pc/.pyenv/versions/3.8.0/lib/python3.8/bdb.py", line 134, in dispatch_call
  File "/home/reverie-pc/.pyenv/versions/3.8.0/lib/python3.8/pdb.py", line 250, in user_call
  File "/home/reverie-pc/.pyenv/versions/3.8.0/lib/python3.8/pdb.py", line 354, in interaction
  File "/home/reverie-pc/.pyenv/versions/3.8.0/lib/python3.8/pdb.py", line 1458, in print_stack_entry
  File "/home/reverie-pc/.pyenv/versions/3.8.0/lib/python3.8/bdb.py", line 543, in format_stack_entry
ImportError: sys.meta_path is None, Python is likely shutting down
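
The ImportError at the end is raised while the interpreter is already shutting down, which suggests the process is being terminated externally rather than failing inside the training loop, consistent with the suspected memory problem given the 1.3 GB training file. A minimal way to test that hypothesis (not part of the original report; standard library only, assuming Linux) is to log the process's peak resident memory in a background thread while the script runs:

import resource
import threading
import time

def log_peak_memory(interval_s=30):
    # ru_maxrss is reported in kilobytes on Linux
    while True:
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"peak RSS: {peak_kb / 1024:.0f} MiB", flush=True)
        time.sleep(interval_s)

# Start before constructing the model so the whole run is covered.
threading.Thread(target=log_peak_memory, daemon=True).start()

If the process is being killed by the Linux OOM killer, checking the kernel log after the crash (e.g. with dmesg) should also show a corresponding "Killed process" entry.
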
stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.