flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

AMP O2 #1421

Closed: djstrong closed this issue 4 years ago

djstrong commented 4 years ago

Describe the bug

Training a sequence tagger crashes when amp_opt_level='O2' is used.

To Reproduce

Version: 0.4.5
GPU: V100

Run https://github.com/flairNLP/flair/blob/master/train.py with the following arguments added to trainer.train():

    use_amp=True,
    amp_opt_level='O2'
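
A minimal sketch of the full reproduction, assuming flair 0.4.5's example training setup; the corpus, embeddings, tag type, and hidden size below are placeholders chosen to match the logged parameters (UD_ENGLISH matches the 12543/2002/2077 split), not necessarily what train.py uses:

    from flair.datasets import UD_ENGLISH
    from flair.embeddings import WordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # placeholder corpus and tag type; any tagged corpus should do
    corpus = UD_ENGLISH()
    tag_type = "upos"
    tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

    tagger = SequenceTagger(
        hidden_size=256,
        embeddings=WordEmbeddings("glove"),
        tag_dictionary=tag_dictionary,
        tag_type=tag_type,
    )

    trainer = ModelTrainer(tagger, corpus)
    trainer.train(
        "resources/taggers/example-ner",
        learning_rate=0.1,
        mini_batch_size=32,
        max_epochs=20,
        shuffle=False,
        embeddings_storage_mode="cpu",
        use_amp=True,          # the two arguments that trigger the crash
        amp_opt_level="O2",    # O1 trains fine, O2 fails
    )

With these arguments, the run produces: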
2020-02-08 12:14:02,758 ----------------------------------------------------------------------------------------------------
2020-02-08 12:14:02,758 Corpus: "Corpus: 12543 train + 2002 dev + 2077 test sentences"
2020-02-08 12:14:02,758 ----------------------------------------------------------------------------------------------------
2020-02-08 12:14:02,759 Parameters:
2020-02-08 12:14:02,759  - learning_rate: "0.1"
2020-02-08 12:14:02,759  - mini_batch_size: "32"
2020-02-08 12:14:02,759  - patience: "3"
2020-02-08 12:14:02,759  - anneal_factor: "0.5"
2020-02-08 12:14:02,759  - max_epochs: "20"
2020-02-08 12:14:02,759  - shuffle: "False"
2020-02-08 12:14:02,759  - train_with_dev: "False"
2020-02-08 12:14:02,759  - batch_growth_annealing: "False"
2020-02-08 12:14:02,759 ----------------------------------------------------------------------------------------------------
2020-02-08 12:14:02,759 Model training base path: "resources/taggers/example-ner"
2020-02-08 12:14:02,759 ----------------------------------------------------------------------------------------------------
2020-02-08 12:14:02,760 Device: cuda:0
2020-02-08 12:14:02,760 ----------------------------------------------------------------------------------------------------
2020-02-08 12:14:02,760 Embeddings storage mode: cpu
Selected optimization level O2:  FP16 training with FP32 batchnorm and FP32 master weights.

Defaults for this optimization level are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
2020-02-08 12:14:02,764 ----------------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "<stdin>", line 8, in <module>
  File "venv/lib/python3.7/site-packages/flair/trainers/trainer.py", line 331, in train
    loss = self.model.forward_loss(batch_step)
  File "venv/lib/python3.7/site-packages/flair/models/sequence_tagger_model.py", line 493, in forward_loss
    features = self.forward(data_points)
  File "venv/lib/python3.7/site-packages/apex/amp/_initialize.py", line 197, in new_fwd
    **applier(kwargs, input_caster))
  File "venv/lib/python3.7/site-packages/flair/models/sequence_tagger_model.py", line 498, in forward
    self.embeddings.embed(sentences)
  File "venv/lib/python3.7/site-packages/flair/embeddings.py", line 177, in embed
    embedding.embed(sentences)
  File "venv/lib/python3.7/site-packages/flair/embeddings.py", line 87, in embed
    for token in sentence.tokens:
AttributeError: 'NoneType' object has no attribute 'tokens'

Additional context

The error occurs with both embeddings storage modes, cpu and gpu. Training works with O1 (though it is a little slower).
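
A possible reading of the traceback (an assumption, not confirmed in this thread): under O2, apex wraps the model's forward (new_fwd in apex/amp/_initialize.py) and runs every argument through its input caster. flair's Sentence has an in-place to() method that returns None, so a caster that treats any object with a to() method as tensor-like would swap each sentence for None before forward iterates over sentence.tokens. A minimal standalone sketch of that failure mode, using hypothetical stand-ins rather than apex or flair code:

    # Hypothetical stand-ins; not actual apex or flair code.
    class Sentence:
        """Mimics flair.data.Sentence: to() mutates in place, returns None."""
        def __init__(self, tokens):
            self.tokens = tokens

        def to(self, device):
            return None  # moves embeddings in place; no return value

    def naive_input_caster(value):
        """Mimics a caster that treats anything with .to() as tensor-like."""
        if isinstance(value, (list, tuple)):
            return type(value)(naive_input_caster(v) for v in value)
        if hasattr(value, "to"):
            return value.to("cuda")  # Sentence.to() returns None
        return value

    batch = [Sentence(["Hello", "world"])]
    print(naive_input_caster(batch))
    # [None] -> downstream: AttributeError: 'NoneType' object has no attribute 'tokens'

If that is the cause, it would also explain why O1 works: O1 patches torch functions (patch_torch_functions: True) instead of casting the model and its forward inputs, so the Sentence objects are never touched.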

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

alanrios2001 commented 1 year ago

Same problem here.