Additional information: I'm using torch 1.10.2, but the same thing happens with torch 1.10.1. I installed these by running:
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio==0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
Configuration:
[paths]
train = null
dev = null
vectors = null
init_tok2vec = null
[system]
gpu_allocator = "pytorch"
seed = 0
[nlp]
lang = "en"
pipeline = ["transformer", "ner"]
batch_size = 32
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
[components]
[components.transformer]
source = "en_core_web_trf"
component = "transformer"
[components.transformer.model]
mixed_precision = true
[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100
[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null
[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"@layers":"reduce_mean.v1"}
upstream = "*"
[corpora]
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
[corpora.train.augmenter]
@augmenters = "spacy.lower_case.v1"
level = 0.15
[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.20
patience = 4000
max_epochs = 0
max_steps = 4000
eval_frequency = 100
frozen_components = []
annotating_components = []
before_to_disk = null
[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 1000
buffer = 1
[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005
[training.score_weights]
ents_f = 1
ents_p = 0
ents_r = 0
[pretraining]
[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null
[initialize.components]
[initialize.tokenizer]
Here is what I see if I train the model with the same config:
My first guess is that it's related to the gpu_allocator setting, which is set by spacy train but isn't currently configurable through spacy evaluate. Here's a related thread with some background: #8984
Can you see if it makes a difference if you add set_gpu_allocator("pytorch") right after setup_gpu in spacy/cli/evaluate.py?
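For illustration, a minimal, self-contained sketch of what that one-line addition amounts to, using thinc's public require_gpu and set_gpu_allocator helpers; the function name and layout here are hypothetical and are not the actual contents of spacy/cli/evaluate.py:

from thinc.api import require_gpu, set_gpu_allocator

def setup_gpu_for_eval(gpu_id: int = 0) -> None:
    """Roughly mirror what spacy train does: pick the GPU, then set the allocator."""
    if gpu_id >= 0:
        require_gpu(gpu_id)           # comparable to setup_gpu in spacy/cli/_util.py
        set_gpu_allocator("pytorch")  # the suggested one-line addition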
That does indeed fix it. Thank you!
No idea about the scope of the changes, and I actually haven't read the whole of #8984, but is this trivially fixable? Put differently, a workaround would be useful; patching spacy isn't something I'd like to do or maintain myself :smile:
It's on our medium-priority to-do list, but it's a bit more involved to only set it when a model that uses torch is loaded vs. setting it when the user has torch installed in their venv. (I implemented the rejected second option in #9539.)
I think the plan was only to warn rather than to automatically set it for users, but you can do this for spacy evaluate by putting it in custom code that's loaded with --code, no hacking required, which is what I should have suggested in the first place.
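A minimal sketch of such a file (the file name is hypothetical; the only requirement is that the call runs at import time):

# functions.py -- passed to the CLI via --code
from thinc.api import set_gpu_allocator

# Runs when spaCy imports this file, i.e. before the pipeline is loaded,
# so GPU memory is routed through PyTorch's allocator during evaluation.
set_gpu_allocator("pytorch")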
Ah, I'm already using the --code option. Is there a specific hook point, or is it fine as long as it runs as part of loading the file?
It should be fine as long as it runs when the code is imported.
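For example (the paths and file name below are hypothetical), the same file can be passed straight to the evaluate command:

python -m spacy evaluate ./training/model-best ./corpus/dev.spacy --code ./scripts/functions.py --gpu-id 0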
Confirmed, it also works. Much better solution, thank you.
Could this idiosyncrasy be documented? I lost a few hours trying to work out where the problem was, and it's quite subtle.
It's documented here, although I can see that the details might not jump out at you when you're searching the docs: https://spacy.io/usage/embeddings-transformers#transformers-runtime
We should add a pointer to our FAQ discussion post: #8226
How to reproduce the behaviour
I hit an inexplicable OOM when running spacy project run evaluate. The strange thing for me here is that 23.7GB of GPU RAM is available, but it seems that when it runs out of memory, it only knows about 10.85GB. I've tried tweaking a number of parameters, but as far as I can tell none of them makes any significant difference.
The dev dataset is tiny: it only has 1000 samples, each around 500 words on average, I would guess.
Your Environment