Additional information: I'm using torch 1.10.2, but the same thing happens with torch 1.10.1. I installed these by running:
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio==0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
Configuration:
[paths]
train = null
dev = null
vectors = null
init_tok2vec = null
[system]
gpu_allocator = "pytorch"
seed = 0
[nlp]
lang = "en"
pipeline = ["transformer", "ner"]
batch_size = 32
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
[components]
[components.transformer]
source = "en_core_web_trf"
component = "transformer"
[components.transformer.model]
mixed_precision = true
[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100
[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null
[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"@layers":"reduce_mean.v1"}
upstream = "*"
[corpora]
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
[corpora.train.augmenter]
@augmenters = "spacy.lower_case.v1"
level = 0.15
[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.20
patience = 4000
max_epochs = 0
max_steps = 4000
eval_frequency = 100
frozen_components = []
annotating_components = []
before_to_disk = null
[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 1000
buffer = 1
[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005
[training.score_weights]
ents_f = 1
ents_p = 0
ents_r = 0
[pretraining]
[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null
[initialize.components]
[initialize.tokenizer]
Here is what I see if I train the model with the same config:
My first guess is that it's related to the gpu_allocator setting, which is set by spacy train but isn't currently configurable through spacy evaluate. Here's a related thread with some background: #8984
Can you see if it makes a difference if you add set_gpu_allocator("pytorch") right after setup_gpu in spacy/cli/evaluate.py?
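For illustration, a minimal, self-contained sketch of what that one-line addition amounts to, using thinc's public require_gpu and set_gpu_allocator helpers; the function name and layout here are hypothetical and are not the actual contents of spacy/cli/evaluate.py:

from thinc.api import require_gpu, set_gpu_allocator

def setup_gpu_for_eval(gpu_id: int = 0) -> None:
    """Roughly mirror what spacy train does: pick the GPU, then set the allocator."""
    if gpu_id >= 0:
        require_gpu(gpu_id)           # comparable to setup_gpu in spacy/cli/_util.py
        set_gpu_allocator("pytorch")  # the suggested one-line addition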
That does indeed fix it. Thank you!
No idea about the scope of the changes, and I actually haven't read the whole of #8984, but is this trivially fixable? Put differently, a workaround would be useful; patching spacy isn't something I'd like to do or maintain myself :smile:
It's on our medium-priority to-do list, but it's a bit more involved to only set it when a model that uses torch is loaded vs. setting it when the user has torch installed in their venv. (I implemented the rejected second option in #9539.)
I think the plan was only to warn rather than to automatically set it for users, but you can do this for spacy evaluate by putting it in custom code that's loaded with --code, no hacking required, which is what I should have suggested in the first place.
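A minimal sketch of such a file (the file name is hypothetical; the only requirement is that the call runs at import time):

# functions.py -- passed to the CLI via --code
from thinc.api import set_gpu_allocator

# Runs when spaCy imports this file, i.e. before the pipeline is loaded,
# so GPU memory is routed through PyTorch's allocator during evaluation.
set_gpu_allocator("pytorch")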
Ah, I'm already using the --code option. Is there a specific hook point, or is it fine as long as it runs as part of loading the file?
It should be fine as long as it runs when the code is imported.
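For example (the paths and file name below are hypothetical), the same file can be passed straight to the evaluate command:

python -m spacy evaluate ./training/model-best ./corpus/dev.spacy --code ./scripts/functions.py --gpu-id 0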
Confirmed, it also works. Much better solution, thank you.
Could this idiosyncrasy be documented? I lost a few hours trying to work out where the problem was, and it's quite subtle.
It's documented here, although I can see that the details might not jump out at you when you're searching the docs: https://spacy.io/usage/embeddings-transformers#transformers-runtime
We should add a pointer to our FAQ discussion post: #8226
How to reproduce the behaviour
I hit an inexplicable OOM when running spacy project run evaluate. The strange thing for me here is that 23.7GB of GPU RAM is available, but it seems that when it runs out of memory, it only knows about 10.85GB. I've tried tweaking a number of parameters, but as far as I can tell none of them makes any significant difference.
The dev dataset is tiny: it only has 1000 samples, each around 500 words on average, I would guess.
Your Environment