I followed the GPT-J tutorial: https://opennmt.net/CTranslate2/guides/transformers.html#gpt-j
First I converted the model with this command:
ct2-transformers-converter --model EleutherAI/gpt-j-6B --revision float16 --quantization float16 --output_dir gptj_ct2
Then I loaded the converted model with:
generator = ctranslate2.Generator("gptj_ct2")
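The rest of my inference code follows the tutorial; roughly this (the prompt and sampling settings are just placeholders):

import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

# Placeholder prompt, tokenized into the token strings CTranslate2 expects.
prompt = "Hello, my name is"
start_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

results = generator.generate_batch([start_tokens], max_length=30, sampling_topk=10)
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(results[0].sequences[0])))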
But loading the model prints the following warning:
[ctranslate2] [thread 144] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
Then inference runs on CPU only.
I did not have this problem with NLLB or Whisper.
The GPU is properly detected by PyTorch (torch.cuda.is_available() returns True).
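In case it is useful, I can also check how many CUDA devices CTranslate2 itself sees (as far as I know, get_cuda_device_count is part of the ctranslate2 Python API):

import ctranslate2

# Number of CUDA devices visible to CTranslate2 (independently of PyTorch).
print(ctranslate2.get_cuda_device_count())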
Thank you in advance, and congrats again for the great work! The fact that text generation now supports end sequences and other new parameters is awesome!
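By "end sequences" I mean, if I understand the release notes correctly, the new end_token option of generate_batch; something like:

results = generator.generate_batch(
    [start_tokens],
    max_length=100,
    end_token=".",  # placeholder: stop as soon as a "." token is generated
)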