bilelomrani1 opened 11 months ago
Good question. Support for encoder-decoder architectures is definitely planned. The reason we don't have them yet is that we first focused on encoder-only models to cover the standard spaCy pipelines, and then on decoder-only models for common LLMs, but encoder-decoder support is something we want to add.
That's understandable, thank you for the clarification.
I regularly follow the developments on this project, and I must say that I am very interested and pleased with the direction curated-transformers is taking. The code is very understandable and of high quality; it's a pleasure to work with, congratulations!

This is perhaps already in your plans, but just to mention it here: I think a very nice addition to the project would be at least one reference implementation of an encoder-decoder-style Transformer, such as the T5 architecture. T5 models are very popular for some tasks, especially in the < 1B parameter range, which is still very relevant nowadays. We currently have reference implementations for decoder-style and encoder-style models, but we are missing at least one reference implementation of an encoder-decoder-style architecture, perhaps with a reusable cross-attention block (see the sketch below).