OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

GPT-J on Tesla T4: the target device or backend do not support efficient float16 computation #1200

Closed · juliensalinas closed this issue 1 year ago

juliensalinas commented 1 year ago

I followed the GPT-J tutorial: https://opennmt.net/CTranslate2/guides/transformers.html#gpt-j

First I converted the model with this command:

ct2-transformers-converter --model EleutherAI/gpt-j-6B --revision float16 --quantization float16 --output_dir gptj_ct2
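
For reference, the same conversion can also be run from Python. A minimal sketch using ctranslate2.converters.TransformersConverter, mirroring the command above:

import ctranslate2

# Convert the float16 branch of the Hugging Face checkpoint to the
# CTranslate2 format, keeping float16 weights (same as the CLI above).
converter = ctranslate2.converters.TransformersConverter(
    "EleutherAI/gpt-j-6B", revision="float16"
)
converter.convert("gptj_ct2", quantization="float16")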

Then I loaded the model with:

generator = ctranslate2.Generator("gptj_ct2")

But it prints the following warning:

[ctranslate2] [thread 144] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.

Then inference runs on CPU only.
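
The warning means CTranslate2 fell back to float32 because the backend the model was loaded on (here, the CPU) has no efficient float16 kernels. One way to see what each device supports, sketched with the library's get_supported_compute_types helper:

import ctranslate2

# Compute types each backend can run efficiently on this machine;
# "float16" should be listed for "cuda" on a Tesla T4.
print(ctranslate2.get_supported_compute_types("cpu"))
print(ctranslate2.get_supported_compute_types("cuda"))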

I did not have this problem with NLLB or Whisper.

The GPU is properly detected by PyTorch (torch.cuda.is_available() = True).
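
CTranslate2 detects CUDA independently of PyTorch, so a quick cross-check is possible; a small sketch using get_cuda_device_count from the CTranslate2 Python API:

import torch
import ctranslate2

# Both libraries should see the GPU; note that even then CTranslate2
# loads models on the CPU unless device="cuda" is requested explicitly.
print(torch.cuda.is_available())            # True on this machine
print(ctranslate2.get_cuda_device_count())  # expected >= 1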

Thank you in advance, and congrats again on the great work! The fact that text generation now supports end sequences and other new parameters is awesome!

guillaumekln commented 1 year ago

Models are loaded on the CPU by default. You should configure the device:

generator = ctranslate2.Generator("gptj_ct2", device="cuda")
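
For completeness, a minimal end-to-end sketch in the spirit of the GPT-J guide linked above (the prompt and sampling parameters are illustrative; transformers is used only for tokenization):

import ctranslate2
import transformers

# Load the converted model on the GPU and keep the float16 weights.
generator = ctranslate2.Generator("gptj_ct2", device="cuda", compute_type="float16")
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

# CTranslate2 generators take token strings, not token ids.
prompt = "In a shocking finding, scientists discovered"
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

results = generator.generate_batch([tokens], max_length=64, sampling_topk=10)
print(tokenizer.decode(results[0].sequences_ids[0]))
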
juliensalinas commented 1 year ago

Thank you @guillaumekln, it worked 👍🏻 Have a nice weekend.