OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Intel Advanced Matrix Extensions (AMX) support #1632

Open ahmetcanik opened 8 months ago

ahmetcanik commented 8 months ago

Hello CTranslate2 developers,

I am a user of your library and I appreciate your work on providing a fast and accurate inference engine. I am wondering whether you have any plans to support Intel Advanced Matrix Extensions (AMX) for CPU inference. According to Intel, AMX can speed up inference by several factors for certain models and data types.

I have tried to compile CTranslate2 from source with the -mamx-tile -mamx-int8 -mamx-bf16 flags, but it seems that additional steps are required to enable AMX (perhaps adding a new kernel such as vec_amx.h and modifying vec_avx512.h to enable AMX tile operations).

I would appreciate it if you could share your thoughts on this topic and let me know if AMX support is feasible and desirable for CTranslate2.

Thank you for your time and attention.

minhthuc2502 commented 8 months ago

Hello, thank you for your suggestion. We have no plans to do this right now; some work would be needed to implement AMX for certain operations. It would be nice to have, though, so I will look at it in more detail.