OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

GPT-J on Tesla T4: the target device or backend do not support efficient float16 computation #1200

Closed · juliensalinas closed this issue 1 year ago

juliensalinas commented 1 year ago

I followed the GPT-J tutorial: https://opennmt.net/CTranslate2/guides/transformers.html#gpt-j

First I converted the model with this command:

ct2-transformers-converter --model EleutherAI/gpt-j-6B --revision float16 --quantization float16 --output_dir gptj_ct2
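
For reference, the same conversion can also be run from Python. A minimal sketch using ctranslate2.converters.TransformersConverter, mirroring the command above:

import ctranslate2

# Convert the float16 branch of the Hugging Face checkpoint to the
# CTranslate2 format, keeping float16 weights (same as the CLI above).
converter = ctranslate2.converters.TransformersConverter(
    "EleutherAI/gpt-j-6B", revision="float16"
)
converter.convert("gptj_ct2", quantization="float16")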

Then I loaded the model with:

generator = ctranslate2.Generator("gptj_ct2")

But it prints the following warning:

[ctranslate2] [thread 144] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.

Then inference runs on CPU only.
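
The warning means CTranslate2 fell back to float32 because the backend the model was loaded on (here, the CPU) has no efficient float16 kernels. One way to see what each device supports, sketched with the library's get_supported_compute_types helper:

import ctranslate2

# Compute types each backend can run efficiently on this machine;
# "float16" should be listed for "cuda" on a Tesla T4.
print(ctranslate2.get_supported_compute_types("cpu"))
print(ctranslate2.get_supported_compute_types("cuda"))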

I did not have this problem with NLLB or Whisper.

The GPU is properly detected by PyTorch (torch.cuda.is_available() = True).
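
CTranslate2 detects CUDA independently of PyTorch, so a quick cross-check is possible; a small sketch using get_cuda_device_count from the CTranslate2 Python API:

import torch
import ctranslate2

# Both libraries should see the GPU; note that even then CTranslate2
# loads models on the CPU unless device="cuda" is requested explicitly.
print(torch.cuda.is_available())            # True on this machine
print(ctranslate2.get_cuda_device_count())  # expected >= 1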

Thank you in advance, and congrats again on the great work! The fact that text generation now supports end sequences and other new parameters is awesome!

guillaumekln commented 1 year ago

Models are loaded on the CPU by default. You should configure the device:

generator = ctranslate2.Generator("gptj_ct2", device="cuda")
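
For completeness, a minimal end-to-end sketch in the spirit of the GPT-J guide linked above (the prompt and sampling parameters are illustrative; transformers is used only for tokenization):

import ctranslate2
import transformers

# Load the converted model on the GPU and keep the float16 weights.
generator = ctranslate2.Generator("gptj_ct2", device="cuda", compute_type="float16")
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

# CTranslate2 generators take token strings, not token ids.
prompt = "In a shocking finding, scientists discovered"
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

results = generator.generate_batch([tokens], max_length=64, sampling_topk=10)
print(tokenizer.decode(results[0].sequences_ids[0]))
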
juliensalinas commented 1 year ago

Thank you @guillaumekln, it worked 👍🏻 Have a nice weekend.