How to reproduce the problem

I am trying to use the spaCy BERT model with spacy pretrain. I downloaded the model with:
pip install spacy-pytorch-transformers
spacy download en_pytt_bertbaseuncased_lg
I then used the following command:
spacy pretrain data.jsonl en_pytt_bertbaseuncased_lg -o temp
I got the following trace:
✔ Saved settings to config.json
✔ Loaded input texts
✔ Loaded model 'en_pytt_bertbaseuncased_lg'
============== Pre-training tok2vec layer - starting at epoch 0 ==============
#      # Words    Total Loss    Loss    w/s
/home/haroon/miniconda3/envs/ssl/lib/python3.6/site-packages/spacy/cli/pretrain.py:317: RuntimeWarning: invalid value encountered in true_divide
cosine = (yh * y).sum(axis=1, keepdims=True) / mul_norms
Traceback (most recent call last):
File "/home/haroon/miniconda3/envs/ssl/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/haroon/miniconda3/envs/ssl/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/haroon/miniconda3/envs/ssl/lib/python3.6/site-packages/spacy/__main__.py", line 35, in <module>
plac.call(commands[command], sys.argv[1:])
File "/home/haroon/miniconda3/envs/ssl/lib/python3.6/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/home/haroon/miniconda3/envs/ssl/lib/python3.6/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/home/haroon/miniconda3/envs/ssl/lib/python3.6/site-packages/spacy/cli/pretrain.py", line 218, in pretrain
model, docs, optimizer, objective=loss_func, drop=dropout
File "/home/haroon/miniconda3/envs/ssl/lib/python3.6/site-packages/spacy/cli/pretrain.py", line 247, in make_update
backprop(gradients, sgd=optimizer)
File "/home/haroon/miniconda3/envs/ssl/lib/python3.6/site-packages/spacy/_ml.py", line 759, in mlm_backward
return backprop(d_output, sgd=sgd)
File "/home/haroon/miniconda3/envs/ssl/lib/python3.6/site-packages/thinc/neural/_classes/feed_forward.py", line 53, in continue_update
gradient = callback(gradient, sgd)
File "/home/haroon/miniconda3/envs/ssl/lib/python3.6/site-packages/thinc/neural/_classes/affine.py", line 67, in finish_update
self.ops.gemm(grad__BO, input__BI, trans1=True, out=self.d_W)
File "ops.pyx", line 422, in thinc.neural.ops.NumpyOps.gemm
ValueError: Buffer and memoryview are not contiguous in the same dimension.
Help would be much appreciated.
Your Environment

Operating System: Ubuntu
Python Version Used: 3.6.7
spaCy Version Used: 2.1.8
Environment Information: conda
I'm not sure what exactly you're trying to do or achieve, but the problem here is that spacy-pytorch-transformers and spacy pretrain are two very different things.

spacy-pytorch-transformers lets you use pre-trained embeddings like the various BERT models in spaCy, either to train downstream models (we currently have a custom implementation for text classification) or for similarity comparisons.
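For example, here's a minimal sketch of that intended usage (the sentences are placeholders; the extension attribute name follows the spacy-pytorch-transformers README):

```python
# Requires: pip install spacy-pytorch-transformers
#           spacy download en_pytt_bertbaseuncased_lg
import spacy

nlp = spacy.load("en_pytt_bertbaseuncased_lg")
doc1 = nlp("The food was delicious.")
doc2 = nlp("The meal tasted great.")

# The pytt models hook similarity up to the transformer's output,
# so this compares BERT representations rather than word vectors.
print(doc1.similarity(doc2))

# The raw activations are also exposed, e.g. the last hidden state
# per wordpiece token as a (n_wordpieces, hidden_width) array.
print(doc1._.pytt_last_hidden_state.shape)
```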
spacy pretrain lets you create similar pre-trained embeddings using word vectors and raw text. The mechanism is similar to the BERT/ELMo/ULMFiT approach, but instead of predicting the next word etc., it predicts the word's vector. At the end of it, you get pretrained weights that you can use to initialise the model you then train.
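By contrast, here's a sketch of the workflow spacy pretrain is designed for, assuming a model with real word vectors such as en_core_web_lg and the data.jsonl file from the question; the argument order follows the spaCy 2.1 CLI (texts_loc, vectors_model, output_dir), so the shell equivalent would be spacy pretrain data.jsonl en_core_web_lg pretrained_weights:

```python
# The spaCy CLI commands are plain functions, so pretraining can
# also be driven from Python.
from spacy.cli import pretrain

# texts_loc: JSONL file with one {"text": "..."} object per line.
# vectors_model: a package whose word vectors the tok2vec layer
#                learns to predict (not a transformer package).
# output_dir: receives one modelN.bin weights file per epoch.
pretrain("data.jsonl", "en_core_web_lg", "pretrained_weights")
```

The resulting weights are then passed to spacy train via --init-tok2vec, e.g. --init-tok2vec pretrained_weights/model999.bin.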
So it currently doesn't make sense to use the BERT embeddings for pretraining. In theory, it could be possible to use those embeddings instead of the word vectors, but that's an open research question.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.