awslabs / mlm-scoring

Python library & examples for Masked Language Model Scoring (ACL 2020)
https://www.aclweb.org/anthology/2020.acl-main.240/
Apache License 2.0

Hardcoded GPU 0? #9

Open mfelice opened 3 years ago

mfelice commented 3 years ago

Hi there,

I'm facing an issue with your PyTorch implementation and some input sentences. E.g.

s = 'RT @HISPANlCPROBS : When u walk straight into the kitchen to eat & ur mom hits u with the " ya saludaste " #ThanksgivingWithHispanics https://…'
print(scorer.score_sentences([s]))

gives the following error:

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.91 GiB total capacity; 451.65 MiB already allocated; 12.12 MiB free; 40.35 MiB cached)

I'm working on a server with three GPUs and have tried setting ctxs = [mx.gpu(0)], ctxs = [mx.gpu(1)], ctxs = [mx.gpu(2)], and ctxs = [mx.cpu()], but I always get the same error about GPU 0. Is this hardcoded somewhere in your code? Changing the ctxs variable seems to have no effect.

Thanks.

DarrenAbramson commented 3 years ago

From the fourth line of the readme:

ctxs = [mx.cpu()] # or, e.g., [mx.gpu(0), mx.gpu(1)]

Did you happen to try ctxs = [mx.gpu(0), mx.gpu(1), mx.gpu(2)]?

As for finding things that are hard-coded, are you aware that you can search the repository?

mfelice commented 3 years ago

Thanks! The values in ctxs seem to be ignored. However, I've been able to work around the issue by setting CUDA_VISIBLE_DEVICES. I believe the culprit is the hardcoded cuda:0 and/or device_ids=[0] in the following block:

https://github.com/awslabs/mlm-scoring/blob/672729747432810f9bcb37149104124dd3cc4165/src/mlm/scorers.py#L561-L568

Maybe that should be set to whatever is specified by ctxs?
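To illustrate the idea, here is a minimal, hypothetical sketch (not the library's actual code) of the two pieces discussed in this thread: deriving PyTorch device strings from an MXNet-style ctxs list rather than hardcoding cuda:0, and the CUDA_VISIBLE_DEVICES workaround. The parsing assumes MXNet contexts print as gpu(0), cpu(0), etc.

```python
# Hypothetical sketch, assuming mx.gpu(i) / mx.cpu() stringify as "gpu(i)" / "cpu(0)".
# Not the library's implementation; just the mapping ctxs -> torch devices.
import os
import re


def torch_devices_from_ctxs(ctxs):
    """Map MXNet-style context names to PyTorch device strings,
    e.g. 'gpu(1)' -> 'cuda:1', 'cpu(0)' -> 'cpu'."""
    devices = []
    for ctx in ctxs:
        m = re.match(r"gpu\((\d+)\)", str(ctx))
        devices.append(f"cuda:{m.group(1)}" if m else "cpu")
    return devices


def restrict_visible_gpus(gpu_ids):
    """The workaround from this thread: expose only the listed GPUs to CUDA.
    They are then renumbered, so the first one becomes cuda:0 to PyTorch."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
```

For example, torch_devices_from_ctxs(["gpu(2)"]) yields ["cuda:2"], which could feed both the model's .to(...) call and DataParallel's device_ids, while restrict_visible_gpus([2]) achieves the same effect externally by making GPU 2 appear as cuda:0.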