bilelomrani1 opened 11 months ago
Good question. Support for encoder-decoder architectures is definitely planned. The reason we don't have them yet is that we first focused on encoder-only models to cover the standard spaCy pipelines, and then on decoder-only models for common LLMs, but encoder-decoder support is something we want to add.
That's understandable, thank you for the clarification.
I regularly follow the developments on this project, and I must say that I am very interested and pleased with the direction curated-transformers is taking. The code is very understandable and of high quality; it's a pleasure to work with, congratulations!

This is perhaps already in your plans, but just to mention it here: I think a very nice addition to the project would be at least one reference implementation of an encoder-decoder-style Transformer, such as the T5 architecture. T5 models are very popular for some tasks, especially in the < 1B parameter range, which is still very relevant nowadays. We currently have reference implementations for decoder-style and encoder-style models, but we are missing at least one reference implementation of an encoder-decoder-style architecture, perhaps with a reusable cross-attention block (see the sketch below).