Closed samehraban closed 1 year ago
Thanks for the report, that does sound like a bug. I don't think a model with NaN loss values can be useful but serialization shouldn't fail. Can you give us the full stack trace you get just so we can be sure where it's happening?
Also, do you know why your loss values are NaN? That shouldn't happen and could be another issue. Are you getting any warnings during training?
On further investigation it seems like the standard Python json module uses NaN and Infinity in serializing floats, but strictly speaking that's not in the JSON spec and so ujson doesn't implement it, which is the root of the issue here. If we wanted to support this we could probably add some kind of encoder/decoder in srsly.
I'm having trouble finding recent opinions on this in ujson upstream, but it looks like it is still not supported. On the other hand, numpy's vendored ujson seems to have added support similar to the builtin json module.
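To make the difference concrete, here is a small sketch using only the standard library. It shows the builtin json module emitting the non-standard NaN and Infinity tokens by default, and refusing them when allow_nan=False is passed. The helper name strict_dumps_fails is made up for this illustration; ujson itself is not imported here.

```python
import json
import math

# By default the builtin encoder writes non-standard tokens for these floats.
nan_text = json.dumps({"loss": float("nan")})
inf_text = json.dumps([float("inf"), float("-inf")])
print(nan_text)  # {"loss": NaN}
print(inf_text)  # [Infinity, -Infinity]

# The builtin decoder accepts the same tokens, so the value round-trips.
restored = json.loads(nan_text)
print(math.isnan(restored["loss"]))  # True


def strict_dumps_fails(value):
    """Return True if spec-strict encoding (allow_nan=False) rejects value."""
    try:
        json.dumps(value, allow_nan=False)
    except ValueError:
        return True
    return False


print(strict_dumps_fails(float("nan")))  # True
print(strict_dumps_fails(0.05))  # False
```

With allow_nan=False the builtin module raises ValueError, which is broadly the same refusal that ujson reports as the OverflowError in this issue.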
Sorry for the delay. Here is the full spacy output:
ℹ Saving to output directory: storage/model
ℹ Using GPU: 0
=========================== Initializing pipeline ===========================
[2022-02-05 22:38:53,492] [INFO] Set up nlp object from config
[2022-02-05 22:38:53,500] [INFO] Pipeline: ['tok2vec', 'textcat_multilabel']
[2022-02-05 22:38:53,503] [INFO] Created vocabulary
[2022-02-05 22:38:53,504] [INFO] Finished initializing nlp object
[2022-02-05 22:39:19,230] [INFO] Initialized pipeline components: ['tok2vec', 'textcat_multilabel']
✔ Initialized pipeline
============================= Training pipeline =============================
ℹ Pipeline: ['tok2vec', 'textcat_multilabel']
ℹ Initial learn rate: 0.001
E # LOSS TOK2VEC LOSS TEXTC... CATS_SCORE SCORE
--- ------ ------------ ------------- ---------- ------
0 0 0.01 0.05 49.84 0.50
Epoch 1: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 998/1000 [00:46<00:00, 19.79it/s$
⚠ Aborting and saving the final best model. Encountered exception:
OverflowError('Invalid Nan value when encoding double',)
Traceback (most recent call last):
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/spacy/training/loop.py", line 122, in train
raise e
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/spacy/training/loop.py", line 110, in train
save_checkpoint(is_best_checkpoint)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/spacy/training/loop.py", line 67, in save_checkpoint
before_to_disk(nlp).to_disk(output_path / DIR_MODEL_LAST)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/spacy/language.py", line 1985, in to_disk
util.to_disk(path, serializers, exclude)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/spacy/util.py", line 1287, in to_disk
writer(path / key)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/spacy/language.py", line 1976, in <lambda>
serializers["meta.json"] = lambda p: srsly.write_json(p, self.meta)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/srsly/_json_api.py", line 75, in write_json
json_data = json_dumps(data, indent=indent)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/srsly/_json_api.py", line 26, in json_dumps
result = ujson.dumps(data, indent=indent, escape_forward_slashes=False)
OverflowError: Invalid Nan value when encoding double
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/sam/.virtualenvs/spacy/bin/spacy", line 8, in <module>
sys.exit(setup_cli())
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/spacy/cli/_util.py", line 71, in setup_cli
command(prog_name=COMMAND)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/typer/main.py", line 500, in wrapper
return callback(**use_params) # type: ignore
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/spacy/cli/train.py", line 45, in train_cli
train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/spacy/cli/train.py", line 75, in train
train_nlp(nlp, output_path, use_gpu=use_gpu, stdout=sys.stdout, stderr=sys.stderr)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/spacy/training/loop.py", line 126, in train
save_checkpoint(False)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/spacy/training/loop.py", line 67, in save_checkpoint
before_to_disk(nlp).to_disk(output_path / DIR_MODEL_LAST)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/spacy/language.py", line 1985, in to_disk
util.to_disk(path, serializers, exclude)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/spacy/util.py", line 1287, in to_disk
writer(path / key)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/spacy/language.py", line 1976, in <lambda>
serializers["meta.json"] = lambda p: srsly.write_json(p, self.meta)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/srsly/_json_api.py", line 75, in write_json
json_data = json_dumps(data, indent=indent)
File "/home/sam/.virtualenvs/spacy/lib/python3.6/site-packages/srsly/_json_api.py", line 26, in json_dumps
result = ujson.dumps(data, indent=indent, escape_forward_slashes=False)
OverflowError: Invalid Nan value when encoding double
And about NaN values for loss, I had True/False in doc cats instead of 1/0.
Thanks for the extra details.
We took a look at what's involved in adding support for NaN to JSON serialization and it's actually quite involved. Given that in this case the only thing it would accomplish is serializing an unusable model, for now we're going to avoid making changes to srsly. If there is some valid use case where this is important we can revisit this later though.
Thanks again for reporting the issue.
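For anyone who needs to write metadata containing NaN in the meantime, one possible user-side workaround is to replace non-finite floats with None before serializing. This is only a sketch: replace_non_finite is a hypothetical helper, not something srsly or spaCy provides, and the sample dictionary is invented for the example.

```python
import json
import math


def replace_non_finite(value):
    """Return a copy of value with NaN and +/-Infinity replaced by None."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: replace_non_finite(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [replace_non_finite(item) for item in value]
    return value


# Invented sample data; real metadata will look different.
cleaned = replace_non_finite({"loss": float("nan"), "scores": [0.5, float("inf")]})

# allow_nan=False makes the builtin encoder reject any NaN that slipped through.
strict_text = json.dumps(cleaned, allow_nan=False)
print(strict_text)  # {"loss": null, "scores": [0.5, null]}
```

Mapping NaN to null loses the distinction between "missing" and "not a number", so this only fits cases where that difference does not matter.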
How about checking the label format and raise an error in case of invalid format?
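A check along those lines could look roughly like the sketch below. It assumes category values are meant to be numbers between 0 and 1, and it rejects booleans explicitly because bool is a subclass of int in Python. The function validate_cats is hypothetical and not part of spaCy.

```python
from typing import Dict


def validate_cats(cats: Dict[str, object]) -> Dict[str, float]:
    """Return cats with float values, or raise ValueError on a bad label value."""
    checked = {}
    for label, value in cats.items():
        # isinstance(True, int) is True, so booleans need their own check.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(
                f"Label {label!r} has value {value!r}; expected a number such as 1 or 0"
            )
        # NaN fails this comparison as well, so it is rejected here too.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"Label {label!r} has value {value!r} outside [0, 1]")
        checked[label] = float(value)
    return checked


print(validate_cats({"POSITIVE": 1, "NEGATIVE": 0}))
# {'POSITIVE': 1.0, 'NEGATIVE': 0.0}
```

Running this over the converted training data before calling spacy train would surface True/False labels at conversion time rather than as NaN losses later.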
Can you pinpoint which value in nlp.meta is NaN by printing/inspecting it right before the error? It would be right before this line:
Yeah, I saw that and got it to work before I realized I had a bug in converting my data to spaCy format.
============================= Training pipeline =============================
ℹ Pipeline: ['tok2vec', 'textcat_multilabel']
ℹ Initial learn rate: 0.001
E # LOSS TOK2VEC LOSS TEXTC... CATS_SCORE SCORE
--- ------ ------------ ------------- ---------- ------
0 0 0.01 0.05 49.84 0.50
0 200 nan nan 50.64 0.51
0 400 nan nan 50.64 0.51
As you can see, the tok2vec loss and textcat loss were both NaN.
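For the earlier question about which value in nlp.meta is NaN, a recursive search like the sketch below can list the offending entries. The helper find_nan_paths is hypothetical, and example_meta is an invented stand-in for nlp.meta; its key names are assumptions chosen to mirror the losses in the table above.

```python
import math


def find_nan_paths(value, path="meta"):
    """Return the access paths of every NaN float inside nested dicts and lists."""
    found = []
    # Only builtin floats are checked; other numeric types would need extra handling.
    if isinstance(value, float) and math.isnan(value):
        found.append(path)
    elif isinstance(value, dict):
        for key, item in value.items():
            found.extend(find_nan_paths(item, f"{path}[{key!r}]"))
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            found.extend(find_nan_paths(item, f"{path}[{index}]"))
    return found


# Invented stand-in for nlp.meta; the real structure may differ.
example_meta = {
    "performance": {
        "cats_score": 0.5064,
        "tok2vec_loss": float("nan"),
        "textcat_multilabel_loss": float("nan"),
    }
}

for nan_path in find_nan_paths(example_meta):
    print(nan_path)
```

In a real session, calling find_nan_paths(nlp.meta) just before the model is saved would be the equivalent check.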
Thanks for the info! I wasn't 100% sure from the original report that it was the loss and not maybe another bug in the scorer or elsewhere that we might need to take a look at. I've never seen the loss as NaN before...
Sorry for not following up on this earlier, but it should be addressed by #11763. Thanks again for reporting it!
This issue has been automatically closed because it was answered and there was no follow-up discussion.
How to reproduce the behaviour
I'm trying to train a text classifier, and on the first try I always got OverflowError: Invalid Nan value when encoding double. It turns out the loss values were NaN, and srsly's ujson module raises an error in that case.
Your Environment
Info about spaCy