explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Cannot get dimension 'nO' for model 'transformer-listener': value unset #10939

Closed Vfgandara closed 2 years ago

Vfgandara commented 2 years ago

How to reproduce the behaviour

Hi there,

I'm trying to train two Portuguese BERT models, BERTimbau and biobertpt-all, each with an NER head. For the training I'm building a Docker container from the following Dockerfile:

# image for GPU training 
FROM nvidia/cuda:10.1-cudnn7-runtime

ENV DEBIAN_FRONTEND noninteractive

# set up build tooling and Python 3.8
RUN apt update && \
    apt install --no-install-recommends -y build-essential software-properties-common && \
    add-apt-repository -y ppa:deadsnakes/ppa && \
    apt install --no-install-recommends -y python3.8-dev python3-pip python3-setuptools python3-distutils 

# upgrade pip for Python 3.8
RUN python3.8 -m pip install --upgrade pip 

# install all dependencies (spaCy with CUDA 10.1 support and spacy-transformers)
RUN python3.8 -m pip install -U typer && \
    python3.8 -m pip install -U pip setuptools wheel && \
    python3.8 -m pip install -U spacy && \
    python3.8 -m pip install -U 'spacy[cuda101]' && \
    python3.8 -m pip install -U spacy-transformers
# copy the training data and config into the image
COPY ./data .

Inside data are the train and dev .spacy files for training and the .cfg below:

[paths]
train = null
dev = null
vectors = null
init_tok2vec = null

[system]
gpu_allocator = "pytorch"
seed = 0

[nlp]
lang = "pt"
pipeline = ["transformer","ner"]
batch_size = 1
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}

[components]

[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"@layers":"reduce_mean.v1"}
upstream = "*"

[components.transformer]
factory = "transformer"
max_batch_items = 2048 
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "pucpr/biobertpt-all"
mixed_precision = false

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96

[components.transformer.model.grad_scaler_config]

[components.transformer.model.tokenizer_config]
use_fast = true

[components.transformer.model.transformer_config]

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = ["transformer"]
annotating_components = []
before_to_disk = null

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 500
buffer = 1
get_length = null

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005

[training.score_weights]
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null

[pretraining]

[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null

[initialize.components]

[initialize.tokenizer]

It's essential to the project I'm working on that the transformer pipe stays frozen. But when I run python3.8 -m spacy train a.cfg -o foo -g 0 --paths.train train.spacy --paths.dev test.spacy inside the container, I get the following error:

✔ Created output directory: foo
ℹ Saving to output directory: foo
ℹ Using GPU: 0

=========================== Initializing pipeline ===========================
[2022-06-09 12:14:04,685] [INFO] Set up nlp object from config
[2022-06-09 12:14:04,693] [INFO] Pipeline: ['transformer', 'ner']
[2022-06-09 12:14:04,696] [INFO] Created vocabulary
[2022-06-09 12:14:04,696] [INFO] Finished initializing nlp object
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/spacy/__main__.py", line 4, in <module>
    setup_cli()
  File "/usr/local/lib/python3.8/dist-packages/spacy/cli/_util.py", line 71, in setup_cli
    command(prog_name=COMMAND)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/typer/main.py", line 500, in wrapper
    return callback(**use_params)  # type: ignore
  File "/usr/local/lib/python3.8/dist-packages/spacy/cli/train.py", line 45, in train_cli
    train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
  File "/usr/local/lib/python3.8/dist-packages/spacy/cli/train.py", line 72, in train
    nlp = init_nlp(config, use_gpu=use_gpu)
  File "/usr/local/lib/python3.8/dist-packages/spacy/training/initialize.py", line 84, in init_nlp
    nlp.initialize(lambda: train_corpus(nlp), sgd=optimizer)
  File "/usr/local/lib/python3.8/dist-packages/spacy/language.py", line 1317, in initialize
    proc.initialize(get_examples, nlp=self, **p_settings)
  File "spacy/pipeline/transition_parser.pyx", line 568, in spacy.pipeline.transition_parser.Parser.initialize
  File "/usr/local/lib/python3.8/dist-packages/thinc/model.py", line 299, in initialize
    self.init(self, X=X, Y=Y)
  File "/usr/local/lib/python3.8/dist-packages/spacy/ml/tb_framework.py", line 45, in init
    model.get_ref("tok2vec").initialize(X=X)
  File "/usr/local/lib/python3.8/dist-packages/thinc/model.py", line 299, in initialize
    self.init(self, X=X, Y=Y)
  File "/usr/local/lib/python3.8/dist-packages/thinc/layers/chain.py", line 85, in init
    layer.initialize(X=curr_input, Y=Y)
  File "/usr/local/lib/python3.8/dist-packages/thinc/model.py", line 299, in initialize
    self.init(self, X=X, Y=Y)
  File "/usr/local/lib/python3.8/dist-packages/thinc/layers/chain.py", line 89, in init
    curr_input = layer.predict(curr_input)
  File "/usr/local/lib/python3.8/dist-packages/thinc/model.py", line 315, in predict
    return self._func(self, X, is_train=False)[0]
  File "/usr/local/lib/python3.8/dist-packages/spacy_transformers/layers/listener.py", line 64, in forward
    width = model.get_dim("nO")
  File "/usr/local/lib/python3.8/dist-packages/thinc/model.py", line 175, in get_dim
    raise ValueError(err)
ValueError: Cannot get dimension 'nO' for model 'transformer-listener': value unset
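
For context, this ValueError comes from Thinc: get_dim raises it whenever a dimension has been registered but never given a value. My reading (an assumption on my part, not something I've verified in the spaCy source) is that because "transformer" is in frozen_components it gets skipped during nlp.initialize, so the TransformerListener inside the ner model never receives its output width. A minimal sketch that reproduces the same failure mode in isolation, assuming only that thinc is installed (Linear is just a stand-in model, not the listener itself):

# minimal sketch: reading an unset dimension raises the same ValueError
from thinc.api import Linear

model = Linear()         # nO and nI stay unset until the model is initialized
try:
    model.get_dim("nO")  # dimension is registered but has no value yet
except ValueError as err:
    print(err)           # Cannot get dimension 'nO' for model 'linear': value unset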

If I don't freeze the transformer component, training runs as usual with no problem. Switching from GPU to CPU also didn't resolve the error.

I didn't find anything helpful online, so I'm asking here whether there's something that can be done to solve this. Thanks in advance!

Your Environment

polm commented 2 years ago

There's not a good way to freeze Transformers at the moment. You can set grad_factor = 0 to disable weight updates, but computation will still be performed. We are working on improving this.
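
With the config above, that would look roughly like the following — a sketch under the assumption that you drive training through the same spacy.cli.train.train entry point shown in your traceback (the equivalent dotted overrides can also be passed on the spacy train command line). The idea is to stop freezing the transformer and instead zero the gradients the listener sends back to it:

# sketch of the workaround: don't freeze the transformer, zero its incoming gradients instead
from spacy.cli.train import train  # same function that appears in the traceback above

train(
    "a.cfg",
    "foo",
    use_gpu=0,
    overrides={
        "paths.train": "train.spacy",
        "paths.dev": "test.spacy",
        "training.frozen_components": [],                 # let the transformer run and initialize
        "components.ner.model.tok2vec.grad_factor": 0.0,  # but stop weight updates flowing back into it
    },
)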

Vfgandara commented 2 years ago

I'll try the grad_factor = 0 then! Thanks for the help :)

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.