OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Intel Advanced Matrix Extensions (AMX) support #1632

Open ahmetcanik opened 8 months ago

ahmetcanik commented 8 months ago

Hello CTranslate2 developers,

I am a user of your library and I appreciate your work on providing a fast and accurate inference engine. I am wondering whether you have any plans to support Intel Advanced Matrix Extensions (AMX) for CPU inference. According to Intel, AMX can speed up inference by several factors for certain models and data types.

I have tried to compile CTranslate2 from source with the -mamx-tile -mamx-int8 -mamx-bf16 flags, but it seems that additional steps are required to enable AMX (perhaps adding a new kernel such as vec_amx.h and modifying vec_avx512.h to enable AMX tile operations).

I would appreciate it if you could share your thoughts on this topic and let me know if AMX support is feasible and desirable for CTranslate2.

Thank you for your time and attention.

minhthuc2502 commented 8 months ago

Hello, thank you for your suggestion. We have no plans to do this right now; some work would be needed to implement AMX for certain operations. It would be nice to have, though, so I will look at it in more detail.