OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Add converter for xformers #1049

Open · erip opened this issue 1 year ago

erip commented 1 year ago

xFormers is an optimized toolkit for highly configurable transformer-based encoder-decoder models. Adding support for inference through CTranslate2 would be highly useful for deploying xFormers models.

guillaumekln commented 1 year ago

Thanks for the request.

How are you using xFormers currently? Are you building full models using xFormer.from_config(config), as shown in the documentation?
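For reference, here is a minimal sketch of that factory-based construction. The exact config keys vary between xFormers releases (for example, "residual_norm_style" was previously called "layer_norm_style"), so treat the values below as illustrative:

```python
from xformers.factory.model_factory import xFormer, xFormerConfig

# One dict per stack; a decoder block would use "block_type": "decoder" with
# "multi_head_config_masked" and "multi_head_config_cross" instead.
encoder_config = {
    "block_type": "encoder",
    "num_layers": 6,
    "dim_model": 512,
    "residual_norm_style": "pre",  # "layer_norm_style" in older releases
    "position_encoding_config": {
        "name": "vocab",  # word + position embeddings in one module
        "seq_len": 1024,
        "vocab_size": 32000,
    },
    "multi_head_config": {
        "num_heads": 8,
        "residual_dropout": 0.1,
        "attention": {"name": "scaled_dot_product", "dropout": 0.1, "causal": False},
    },
    "feedforward_config": {
        "name": "MLP",
        "dropout": 0.1,
        "activation": "relu",
        "hidden_layer_multiplier": 4,
    },
}

model = xFormer.from_config(xFormerConfig([encoder_config]))
```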

erip commented 1 year ago

Yes, though I think the configuration could be inferred (relatively simply) from the serialized checkpoints as well. Some bits might not be directly convertible (specialized attention schemes, for instance), but otherwise I think it could be a pretty light lift!

guillaumekln commented 1 year ago

Usually the checkpoint alone is not enough to fully resolve the model architecture. We also need to know the activation functions, the norm style (pre-norm vs. post-norm), etc., and this information is usually not saved in the checkpoint.
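To illustrate: a PyTorch checkpoint is just a mapping from parameter names to weight tensors, so hyperparameters like the activation or norm style are not recoverable from it (the file name below is a placeholder):

```python
import torch

# A state dict only maps parameter names to weight tensors; nothing in it
# records the activation function or the pre/post-norm residual style.
state_dict = torch.load("checkpoint.pt", map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```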

A more general issue is that xFormers does not implement the input and output layers of the model. This means we can't provide a ready-to-use converter, since the conversion also depends on user code we don't know about. However, we can still provide a template or helper functions to process the xFormers model itself and let the user register the remaining modules.
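A hypothetical example of that user code: the xFormers stacks get wrapped in a module that adds the final projection, and only the user knows how this wrapper is laid out:

```python
import torch

class Seq2Seq(torch.nn.Module):
    """Hypothetical user model: xFormers provides the encoder/decoder stacks,
    while the output projection lives in user code."""

    def __init__(self, xformer, dim_model, vocab_size):
        super().__init__()
        self.xformer = xformer  # built with xFormer.from_config(...)
        self.output_proj = torch.nn.Linear(dim_model, vocab_size, bias=False)

    def forward(self, src, tgt):
        decoder_output = self.xformer(src, tgt)
        return self.output_proj(decoder_output)
```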

erip commented 1 year ago

I think the only layer xFormers doesn't implement is the output layer. I'd need to double-check whether activations, etc. are in the checkpoint (I think they might be, but fused via Triton). Either way, a template would be super useful (one where a user can map checkpoint keys to the output layer weights, etc.)!
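Something as simple as a user-supplied mapping could cover those extra modules (the names below are made up):

```python
# Hypothetical template input: the user points the converter at the checkpoint
# entries for the modules xFormers does not own.
extra_weights = {
    "output_layer.weight": "output_proj.weight",  # spec slot <- checkpoint key
}
```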

guillaumekln commented 1 year ago

> I think the only layer xFormers doesn't implement is the output layer.

Right, and the word embedding layer is implemented as part of the "vocab" position embedding.

I also found that they don't implement the final layer norm in the encoder/decoder when using the pre-norm residual style. CTranslate2 actually supports that, but it is a difference from all other pre-norm implementations.
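For illustration, a sketch of a pre-norm stack where the trailing LayerNorm is optional, which is exactly the detail a converter has to know about:

```python
import torch

class PreNormStack(torch.nn.Module):
    """Sketch: most pre-norm Transformers apply one final LayerNorm after the
    last block; xFormers stacks omit it, and CTranslate2 supports both."""

    def __init__(self, layers, dim_model, final_norm=True):
        super().__init__()
        self.layers = torch.nn.ModuleList(layers)
        self.final_norm = torch.nn.LayerNorm(dim_model) if final_norm else None

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)  # each layer normalizes its own inputs (pre-norm)
        return self.final_norm(x) if self.final_norm is not None else x
```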


Anyway, here's a possible implementation of an xFormers converter: https://gist.github.com/guillaumekln/4761f65df1ce3e80f5969fd0f0a2c7f5

It gives an idea of how the conversion works and which encoder-decoder configurations are supported. Please have a look at the TODOs and see if you can make it work for your model.
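For what it's worth, once a conversion succeeds the result is a regular CTranslate2 model directory, so inference goes through the usual API (the path and tokens below are placeholders):

```python
import ctranslate2

translator = ctranslate2.Translator("converted_model_dir", device="cpu")
results = translator.translate_batch([["▁Hello", "▁world"]])
print(results[0].hypotheses[0])
```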

guillaumekln commented 1 year ago

Hi @erip, did you have a chance to try a conversion?