jakwisn opened this issue 3 years ago
Thanks for the report, sorry you're having trouble with this. I don't think we've seen this particular error before.
You already mention lowering the batch size, which seems to be the general PyTorch advice for dealing with this error. Based on this issue, it looks like running out of CPU RAM can also be a cause; could that potentially be what's happening in your case?
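For reference, the settings that control batching in the transformer GPU quickstart config look roughly like this (a sketch from memory, so treat the exact values as assumptions; lowering size and max_batch_items is usually what brings GPU memory down):

```ini
[components.transformer]
factory = "transformer"
# maximum number of padded items per transformer batch; lowering this
# is one of the first knobs to turn for CUDA out-of-memory errors
max_batch_items = 4096

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
# padded batch size; try halving this if memory is tight
size = 2000
buffer = 256
```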
Separately this jumped out at me:
"My trainset has 15k sentences but if I lower this to 12k it works properly"
Do you maybe have an unusually long sentence somewhere in the 3k you omitted?
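If you want to check, a quick script along these lines should show the longest examples in your training corpus (just a sketch; the .spacy path and the "en" language code are placeholders for your own setup):

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")  # vocab only; no pipeline needed to inspect lengths
# placeholder path - point this at your converted training data
doc_bin = DocBin().from_disk("corpus/train.spacy")
lengths = sorted(len(doc) for doc in doc_bin.get_docs(nlp.vocab))
print("docs:", len(lengths), "longest:", lengths[-5:])
```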
I think the issue is:
gpu_allocator = "tensorflow"
We only support transformers on PyTorch currently, so you'll need to change this to pytorch.
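Concretely, in the [training] block of your config that would be:

```ini
[training]
gpu_allocator = "pytorch"
```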
Hi, thanks for the advice! Changing gpu_allocator made a difference, but it did not fix the problem. Meanwhile I updated PyTorch to 1.10.0 and now my error looks like this:
RuntimeError: CUDA out of memory. Tried to allocate 112.00 MiB (GPU 0; 6.00 GiB total capacity; 3.95 GiB already allocated; 0 bytes free; 4.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(The memory amounts differ here because I ran a lot of experiments and this is just one of them.)
Maybe fragmentation could be the issue (source: the "Memory Management" section)? Can I somehow fix it in spaCy?
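The only thing I can think of for the max_split_size_mb hint is setting PyTorch's allocator environment variable before launching training, roughly like this (the 128 MiB value and the paths are just examples to experiment with, not anything from the spaCy docs):

```bash
# example only - the split size is an arbitrary value to try
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python -m spacy train config.cfg --output ./output --gpu-id 0
```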
Just a note - based on discussion at the linked PyTorch issue, it looks like it's a problem with PyTorch rather than something in spaCy directly. We'll leave this issue open for now, feel free to comment here if you have trouble with this specifically in spaCy, though do check the linked issue first.
For my errors, the problem turned out to be CPU RAM.
Any news on this? I'm experiencing the same issue on different machines, now with 24 GB VRAM, CUDA 11.6, and PyTorch 1.12.1.
I have the same issue.
The problem
I am training a sentence classification model using a transformer and a pipeline based on the default config, on a custom dataset. When I start training I get:
The weird things are:
Can I specifically ask spaCy/torch to reserve more memory? There must be something wrong with the memory allocation, or something is draining the memory.
How to reproduce the behavior
I am running with the DEFT definition dataset and this is in my base config:
Your Environment