allenai / scibert

A BERT model for scientific text.
https://arxiv.org/abs/1903.10676
Apache License 2.0
1.47k stars 214 forks

ValueError in parsing using genia #118

Open blackbirt-5 opened 3 years ago

blackbirt-5 commented 3 years ago

Hi,

I am interested in parsing in NLP and am studying SciBERT. When I run the parsing task on GENIA, training fails with a ValueError, as shown in the traceback below. How can I solve this?

Many thanks!

P.S. The other tasks run fine.

2021-07-01 12:51:39,366 - INFO - allennlp.training.trainer - Training
  0%|          | 1/896 [00:04<1:00:43,  4.07s/it]
Traceback (most recent call last):
  File "/home/centos/miniconda3/envs/test_38/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/centos/miniconda3/envs/test_38/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/centos/ML/src/allennlp/allennlp/run.py", line 21, in <module>
    run()
  File "/home/centos/ML/src/allennlp/allennlp/run.py", line 18, in run
    main(prog="allennlp")
  File "/home/centos/ML/src/allennlp/allennlp/commands/__init__.py", line 102, in main
    args.func(args)
  File "/home/centos/ML/src/allennlp/allennlp/commands/train.py", line 117, in train_model_from_args
    train_model_from_file(args.param_path,
  File "/home/centos/ML/src/allennlp/allennlp/commands/train.py", line 163, in train_model_from_file
    return train_model(params,
  File "/home/centos/ML/src/allennlp/allennlp/commands/train.py", line 252, in train_model
    metrics = trainer.train()
  File "/home/centos/ML/src/allennlp/allennlp/training/trainer.py", line 539, in train
    train_metrics = self._train_epoch(epoch)
  File "/home/centos/ML/src/allennlp/allennlp/training/trainer.py", line 372, in _train_epoch
    raise ValueError("nan loss encountered")
ValueError: nan loss encountered


fan-hd commented 2 years ago

This happens to me too. In my case, the error comes from the implementation of masked_log_softmax defined in allennlp.nn. Its eps=1e-45 is too small to prevent NaNs. You can use a larger eps; eps=1e-37 works on my machine. Alternatively, use the implementation from a newer version of allennlp.
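To illustrate why the size of eps could matter, here is a minimal pure-Python sketch. It is not AllenNLP's code: the helper names (to_float32, is_float32_subnormal) and the flush_subnormals flag are hypothetical. It rests on two assumptions that the thread does not confirm: that the affected machine flushes float32 subnormal values to zero, and that the masked log-probabilities are later multiplied by the 0/1 mask. What is certain is that 1e-45 rounds to a subnormal float32 value, while 1e-37 lies above the smallest normal float32 value (about 1.18e-38).

```python
import math
import struct

FLOAT32_MIN_NORMAL = 2.0 ** -126  # smallest normal float32, about 1.18e-38


def to_float32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def is_float32_subnormal(x: float) -> bool:
    """True if x, stored as float32, is nonzero but below the normal range."""
    y = to_float32(x)
    return y != 0.0 and abs(y) < FLOAT32_MIN_NORMAL


def masked_log_softmax(scores, mask, eps=1e-37, flush_subnormals=False):
    """Log-softmax that pushes masked positions down by adding log(mask + eps).

    mask holds 1 (keep) or 0 (drop). flush_subnormals simulates hardware
    that treats subnormal float32 values as zero (an assumption here).
    """
    shifted = []
    for score, keep in zip(scores, mask):
        value = to_float32(keep + eps)
        if flush_subnormals and is_float32_subnormal(value):
            value = 0.0  # simulated flush-to-zero
        log_mask = math.log(value) if value > 0.0 else float("-inf")
        shifted.append(score + log_mask)
    # Standard log-sum-exp with the maximum subtracted for stability.
    top = max(shifted)
    log_z = top + math.log(sum(math.exp(v - top) for v in shifted))
    return [v - log_z for v in shifted]


scores = [1.0, 2.0, 3.0]
mask = [1, 1, 0]

# eps=1e-45 is subnormal in float32; if flushed to zero, log(0) gives -inf.
small = masked_log_softmax(scores, mask, eps=1e-45, flush_subnormals=True)
# eps=1e-37 is a normal float32 value, so log(eps) stays finite.
large = masked_log_softmax(scores, mask, eps=1e-37, flush_subnormals=True)

# Zeroing out the masked position afterwards: -inf * 0 is nan, finite * 0 is 0.
print(small[2] * mask[2], large[2] * mask[2])
```

If those assumptions hold, the masked position ends up with a log-probability of -inf under eps=1e-45, and multiplying it by a mask value of 0 produces nan, which would then propagate into the loss. With eps=1e-37 the same position stays finite and is zeroed out cleanly.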