OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Exception when exporting bloomz model #1324

Open · jordimas opened 1 year ago

jordimas commented 1 year ago

Operating system: Ubuntu 22.04.2, Python 3.10.6, CTranslate2 3.16.

When exporting the bigscience/bloomz model using:

```
ct2-transformers-converter --force --model bigscience/bloomz --output_dir bloomz --quantization float16
```

The conversion process works well for other bloom models like bloomz-7b1.

Error:

```
Traceback (most recent call last):
  File "/home/ubuntu/bloom-ctranslate2/generate/python3-env/bin/ct2-transformers-converter", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/bloom-ctranslate2/generate/python3-env/lib/python3.10/site-packages/ctranslate2/converters/transformers.py", line 1577, in main
    converter.convert_from_args(args)
  File "/home/ubuntu/bloom-ctranslate2/generate/python3-env/lib/python3.10/site-packages/ctranslate2/converters/converter.py", line 50, in convert_from_args
    return self.convert(
  File "/home/ubuntu/bloom-ctranslate2/generate/python3-env/lib/python3.10/site-packages/ctranslate2/converters/converter.py", line 104, in convert
    model_spec.save(output_dir)
  File "/home/ubuntu/bloom-ctranslate2/generate/python3-env/lib/python3.10/site-packages/ctranslate2/specs/model_spec.py", line 571, in save
    super().save(output_dir)
  File "/home/ubuntu/bloom-ctranslate2/generate/python3-env/lib/python3.10/site-packages/ctranslate2/specs/model_spec.py", line 324, in save
    self._serialize(os.path.join(output_dir, "model.bin"))
  File "/home/ubuntu/bloom-ctranslate2/generate/python3-env/lib/python3.10/site-packages/ctranslate2/specs/model_spec.py", line 363, in _serialize
    model.write(struct.pack("I", value.nbytes))
struct.error: 'I' format requires 0 <= number <= 4294967295
```
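For anyone reproducing this, the same conversion can also be run through the converter's Python API. This is a sketch of what should be the equivalent call (the exact `TransformersConverter` and `convert()` arguments should be checked against the installed CTranslate2 version):

```python
import ctranslate2

# Sketch: equivalent of the ct2-transformers-converter command above.
# Loads bigscience/bloomz from the Hugging Face Hub and writes a
# float16 CTranslate2 model into the "bloomz" directory.
converter = ctranslate2.converters.TransformersConverter("bigscience/bloomz")
converter.convert("bloomz", quantization="float16", force=True)
```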

guillaumekln commented 1 year ago

Bloomz has 176B parameters, and CTranslate2 does not really support models this large at this time.

The error means the number of bytes used by a weight cannot be represented by a 32-bit integer.

This is not the only issue with these models. Some functional changes are also required, for example splitting the model across multiple GPUs (see #1052).
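To make the 32-bit limit concrete: here is a minimal sketch, assuming the published BLOOM-176B dimensions (vocab_size 250880, hidden_size 14336, taken from the public bigscience/bloom config). The word-embedding weight alone, stored in float16, overflows the unsigned 32-bit size field that the serializer writes with `struct.pack("I", ...)`:

```python
import struct

# Assumed BLOOM-176B dimensions (from the public bigscience/bloom config).
vocab_size = 250880
hidden_size = 14336

# Size in bytes of the word-embedding matrix in float16 (2 bytes/element).
nbytes = vocab_size * hidden_size * 2
print(nbytes)  # 7193231360 > 4294967295 (the maximum unsigned 32-bit value)

# Same failure mode as the converter's serializer:
struct.pack("I", nbytes)
# struct.error: 'I' format requires 0 <= number <= 4294967295
```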

aflah02 commented 8 months ago

@guillaumekln Is there any update on this? Does CTranslate2 now support such large models, even without quantization?