CambridgeMolecularEngineering / chemdataextractor2

ChemDataExtractor Version 2.0

CDE 2.1.2: CUDA error: out of memory #14

Open Spadet opened 2 years ago

Spadet commented 2 years ago

Hi,

I tested version 2.1.2 and encountered an issue while analyzing some articles. I activated CUDA and PyTorch detects my GPU (a Tesla V100 16GB). As the migration guide states, this does make computation faster, but I hit a "RuntimeError: CUDA out of memory" error. This scenario is described in the migration guide, and the following solution is provided:

```python
from chemdataextractor.nlp.new_cem import BertFinetunedCRFCemTagger, CemTagger

ner_tagger = BertFinetunedCRFCemTagger(max_batch_size=100)
CemTagger.taggers[2] = ner_tagger
```

I tried it with no success, and changing the values of max_batch_size did not help either. Is the solution to use a GPU with more memory?

Thank you in advance!

ti250 commented 2 years ago

Interesting; I would have thought that with 16GB VRAM it would be fine... Could you try this with max_batch_size of 1?

Spadet commented 2 years ago

Thank you for your answer, I did try with a max_batch_size as low as 1 and the result was identical.

ti250 commented 2 years ago

Okay, I think you may also need to set min_batch_size (I'll make a patch soon so that min_batch_size can never exceed max_batch_size). Can you set min_batch_size = 1 as well? (It defaults to 100.)
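Combining this with the snippet from the migration guide, the full workaround would look something like the following (a sketch; the min_batch_size default of 100 is per the comment above):

```python
from chemdataextractor.nlp.new_cem import BertFinetunedCRFCemTagger, CemTagger

# Force both batch-size bounds down; min_batch_size defaults to 100,
# so setting only max_batch_size below 100 leaves min > max.
ner_tagger = BertFinetunedCRFCemTagger(min_batch_size=1, max_batch_size=1)
CemTagger.taggers[2] = ner_tagger
```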

Spadet commented 2 years ago

I am pretty sure this could work, because the error message didn't change across my tests and the required memory was always identical. Unfortunately, I no longer have access to the machine, so I cannot test this solution... Thank you very much for the answer. If I am able to use it in the future, I will keep you posted!

ti250 commented 2 years ago

Ah, sorry I was so late replying; I'll keep this open until I make the change for min_batch_size!

Spadet commented 2 years ago

I was able to try the solution you provided today, and it seems it does not work either. I applied min_batch_size = 1 along with max_batch_size = 1, as in the documentation (https://cambridgemolecularengineering-chemdataextractor-development.readthedocs-hosted.com/en/latest/migration_guides/migration_guide2.1.html). It did not change anything; I had the same results with a size of 1, 100, or even 1000.

I also tried setting the max limit just above the min (2 and 1, respectively). Any idea why this might not work?

Thank you in advance :)

ti250 commented 2 years ago

Interesting, I have no idea why this wouldn't work. Let me investigate further...

ghost commented 1 year ago

Hi!

The problem still exists. Based on my experience, a BERT-base model doing NER can run with a batch size of about 64-128 on 16GB of VRAM. Thanks!

ghost commented 1 year ago

Hi again!

So in my case I was using a Tesla V100 16GB with a 19k-line raw text file.

The working batch size was 32.

As a non-permanent solution, you can modify the hardcoded value here and here in your local package.

Thanks!

ghost commented 1 year ago

Also, a batch size of 100 works with rather small (1k-line) text files, but it does not with longer files.

ti250 commented 1 year ago

It's interesting to me that reducing the batch size worked for you but didn't work for @Spadet earlier... With regard to the batch size issue, I wonder if the easiest way to make this work would be to have ChemDataExtractor read environment variables (e.g. CDE_NER_MAX_BATCH_SIZE, CDE_NER_MIN_BATCH_SIZE). Does this sound like an okay solution to you?
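A minimal sketch of how such environment-variable overrides could be read (the CDE_NER_* variable names and the helper below are the proposal above, not an implemented feature of ChemDataExtractor):

```python
import os

def ner_batch_sizes(default_min=100, default_max=100):
    """Read proposed CDE_NER_MIN/MAX_BATCH_SIZE overrides, clamping min <= max."""
    max_size = int(os.environ.get("CDE_NER_MAX_BATCH_SIZE", default_max))
    min_size = int(os.environ.get("CDE_NER_MIN_BATCH_SIZE", default_min))
    # Guard against min > max, the pitfall discussed earlier in this thread.
    return min(min_size, max_size), max_size

# Example: only the max is set, so the default min (100) is clamped down to it.
os.environ["CDE_NER_MAX_BATCH_SIZE"] = "32"
print(ner_batch_sizes())  # -> (32, 32)
```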

ghost commented 1 year ago

Hi!

Spadet most likely wanted to use the CEMS API, or maybe the Document API without injecting the CEMs back.

I think a more robust solution would be to expose the batch size through the Document API, so this parameter would be easily accessible and adjustable according to available VRAM.

Thanks!

ghost commented 1 year ago

I suggest also trying a large raw text file, around 20k lines; I suspect it wouldn't work with such a high batch size on 16GB of VRAM. In my experience, a higher batch size often works on smaller inputs but turns out to be too high on much larger ones (like 20k lines here).

ghost commented 1 year ago

Or another solution might be to inject the CEMS API results back into the Document! :)