explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Biaffine parser does not work with tok2vec backend #10527

Closed zsozso21 closed 2 years ago

zsozso21 commented 2 years ago

I applied the experimental biaffine parser based on this example, and it works well with the transformer-based architecture. However, when I tried it with a tok2vec model on CPU, I got the following error:

Traceback (most recent call last):
  File "/home/a100/zsozso/deploy/.venv/bin/spacy", line 8, in <module>
    sys.exit(setup_cli())
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/cli/_util.py", line 71, in setup_cli
    command(prog_name=COMMAND)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/typer/main.py", line 500, in wrapper
    return callback(**use_params)  # type: ignore
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/cli/train.py", line 45, in train_cli
    train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/cli/train.py", line 75, in train
    train_nlp(nlp, output_path, use_gpu=use_gpu, stdout=sys.stdout, stderr=sys.stderr)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/training/loop.py", line 122, in train
    raise e
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/training/loop.py", line 105, in train
    for batch, info, is_best_checkpoint in training_step_iterator:
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/training/loop.py", line 203, in train_while_improving
    nlp.update(
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/language.py", line 1156, in update
    proc.update(examples, sgd=None, losses=losses, **component_cfg[name])  # type: ignore
  File "spacy_experimental/biaffine_parser/arc_predicter.pyx", line 220, in spacy_experimental.biaffine_parser.arc_predicter.ArcPredicter.update
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/thinc/layers/chain.py", line 60, in backprop
    dX = callback(dY)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/thinc/layers/pytorchwrapper.py", line 139, in backprop
    dXtorch = torch_backprop(dYtorch)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/thinc/shims/pytorch.py", line 105, in backprop
    grads.kwargs["grad_tensors"] = self._grad_scaler.scale(
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/thinc/shims/pytorch_grad_scaler.py", line 97, in scale
    self._scale_tensor(tensor, scale_per_device, inplace)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/thinc/shims/pytorch_grad_scaler.py", line 110, in _scale_tensor
    assert tensor.is_cuda, "Gradient scaling is only supported for CUDA tensors"
AssertionError: Gradient scaling is only supported for CUDA tensors

On GPU, I got this error instead:

Traceback (most recent call last):
  File "/home/a100/zsozso/deploy/.venv/bin/spacy", line 8, in <module>
    sys.exit(setup_cli())
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/cli/_util.py", line 71, in setup_cli
    command(prog_name=COMMAND)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/typer/main.py", line 500, in wrapper
    return callback(**use_params)  # type: ignore
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/cli/train.py", line 45, in train_cli
    train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/cli/train.py", line 72, in train
    nlp = init_nlp(config, use_gpu=use_gpu)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/training/initialize.py", line 84, in init_nlp
    nlp.initialize(lambda: train_corpus(nlp), sgd=optimizer)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/language.py", line 1286, in initialize
    init_vocab(
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/training/initialize.py", line 131, in init_vocab
    load_vectors_into_model(nlp, vectors)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/training/initialize.py", line 152, in load_vectors_into_model
    vectors_nlp = load_model(name, vocab=nlp.vocab, exclude=exclude)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/util.py", line 422, in load_model
    return load_model_from_path(Path(name), **kwargs)  # type: ignore[arg-type]
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/util.py", line 489, in load_model_from_path
    return nlp.from_disk(model_path, exclude=exclude, overrides=overrides)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/language.py", line 2042, in from_disk
    util.from_disk(path, deserializers, exclude)  # type: ignore[arg-type]
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/util.py", line 1299, in from_disk
    reader(path / key)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/language.py", line 2018, in deserialize_vocab
    self.vocab.from_disk(path, exclude=exclude)
  File "spacy/vocab.pyx", line 460, in spacy.vocab.Vocab.from_disk
  File "spacy/vectors.pyx", line 616, in spacy.vectors.Vectors.from_disk
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/spacy/util.py", line 1299, in from_disk
    reader(path / key)
  File "spacy/vectors.pyx", line 602, in spacy.vectors.Vectors.from_disk.load_vectors
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/cupy/_io/npz.py", line 71, in load
    return cupy.array(obj)
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/cupy/_creation/from_data.py", line 41, in array
    return _core.array(obj, dtype, copy, order, subok, ndmin)
  File "cupy/_core/core.pyx", line 2165, in cupy._core.core.array
  File "cupy/_core/core.pyx", line 2244, in cupy._core.core.array
  File "cupy/_core/core.pyx", line 2318, in cupy._core.core._send_object_to_gpu
  File "cupy/_core/core.pyx", line 167, in cupy._core.core.ndarray.__init__
  File "cupy/cuda/memory.pyx", line 718, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 1395, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1416, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1096, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1117, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 1332, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
  File "cupy/cuda/memory.pyx", line 1067, in cupy.cuda.memory.SingleDeviceMemoryPool._alloc
  File "/home/a100/zsozso/deploy/.venv/lib/python3.8/site-packages/thinc/backends/_cupy_allocators.py", line 52, in cupy_pytorch_allocator
    torch_tensor = torch.zeros((size_in_bytes // 4,), requires_grad=False)

How to reproduce the behaviour

I used the following config:

[paths]
# We need to define these variables in order to override them through `spacy train`
init_tok2vec = null
vectors = null
train = null
dev = null

[system]
gpu_allocator = "pytorch"
seed = 0

[nlp]
lang = "hu"
pipeline = ["tok2vec","tagger","morphologizer","senter","experimental_arc_predicter","experimental_arc_labeler"]
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
batch_size = 256

[components]

[components.senter]
factory = "senter"

[components.senter.model]
@architectures = "spacy.Tagger.v1"
nO = null

[components.senter.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"

[components.morphologizer]
factory = "morphologizer"

[components.morphologizer.model]
@architectures = "spacy.Tagger.v1"
nO = null

[components.morphologizer.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"

[components.experimental_arc_labeler]
factory = "experimental_arc_labeler"

[components.experimental_arc_labeler.model]
@architectures = "spacy-experimental.Bilinear.v1"
hidden_width = 128
mixed_precision = true

[components.experimental_arc_labeler.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"

[components.experimental_arc_predicter]
factory = "experimental_arc_predicter"

[components.experimental_arc_predicter.model]
@architectures = "spacy-experimental.PairwiseBilinear.v1"
hidden_width = 256
nO = 1
mixed_precision = true

[components.experimental_arc_predicter.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"

[components.tagger]
factory = "tagger"

[components.tagger.model]
@architectures = "spacy.Tagger.v1"
nO = null

[components.tagger.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"

[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = ${components.tok2vec.model.encode.width}
attrs = ["LOWER","PREFIX","SUFFIX","SHAPE"]
rows = [5000,2500,2500,2500]
include_static_vectors = true

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = 300
depth = 4
window_size = 2
maxout_pieces = 5

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 2000
gold_preproc = false
limit = 0
augmenter = null

[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = []
before_to_disk = null
annotating_components = ["senter"]

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0

[training.logger]
@loggers = "spacy.WandbLogger.v2"
project_name = "Exp-parser"

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = true
eps = 0.00000001

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005

[training.score_weights]
tag_acc = 0.2
pos_acc = 0.2
morph_acc = 0.2
morph_per_feat = null
dep_uas = 0.0
dep_las = 0.2
dep_las_per_type = null
bound_dep_uas = 0.0
bound_dep_las = 0.0
sents_p = null
sents_r = null
sents_f = 0.2

[pretraining]

[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null

[initialize.components]

[initialize.tokenizer]

Your Environment

danieldk commented 2 years ago

Thanks for the report!

The first error occurs because mixed-precision training is not supported on CPU: the gradient scaler it relies on only accepts CUDA tensors, as the assertion says. You could set both instances of

mixed_precision = true

to

mixed_precision = false

in the configuration. Admittedly, this is not very convenient. I'll look into disabling mixed precision altogether when running on CPU, which would probably be nicer than the current assertion.
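
If editing the file by hand is tedious, the change can be scripted. The following is only a minimal sketch, not part of spaCy or spacy-experimental: `disable_mixed_precision` is a hypothetical helper that treats the config as plain text and flips every `mixed_precision = true` line, and the `example` string is an excerpt of the config posted above.

```python
import re

# Matches a whole line such as "mixed_precision = true".
# Only spaces/tabs are allowed around the tokens so that blank lines
# before or after the setting are never swallowed by the match.
_MIXED_PRECISION = re.compile(
    r"^([ \t]*mixed_precision[ \t]*=[ \t]*)true[ \t]*$", re.MULTILINE
)


def disable_mixed_precision(config_text: str) -> str:
    """Hypothetical helper: set every `mixed_precision = true` to false."""
    return _MIXED_PRECISION.sub(r"\g<1>false", config_text)


# Excerpt of the two model blocks from the config in this issue.
example = """\
[components.experimental_arc_labeler.model]
@architectures = "spacy-experimental.Bilinear.v1"
hidden_width = 128
mixed_precision = true

[components.experimental_arc_predicter.model]
@architectures = "spacy-experimental.PairwiseBilinear.v1"
hidden_width = 256
nO = 1
mixed_precision = true
"""

patched = disable_mixed_precision(example)
print(patched)
```

To patch a real file, read it with `pathlib.Path(...).read_text()`, pass the text through the helper, and write the result to a separate CPU config. `spacy train` also accepts config overrides as dotted command-line options, so something like `--components.experimental_arc_predicter.model.mixed_precision false` may work without touching the file; I have not verified that for these two components, so treat it as an assumption.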

I have to look into the second error in more detail, though it seems that the trace is not completely pasted?

Fair warning: the accuracy of the biaffine parser is not great yet with a convolutional tok2vec layer. I am also working on a set of changes that improves accuracy quite a bit when training a transformer model.

svlandeg commented 2 years ago

Closing as the first issue was addressed with https://github.com/explosion/thinc/pull/624. If you still run into issues with the second error, feel free to open a new issue with the full stack trace!
