HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU training
BSD 2-Clause "Simplified" License

Encoder-Decoder Architecture #44

Open · ClashLuke opened this issue 2 years ago

ClashLuke commented 2 years ago

Currently, our model can act as either an encoder or a decoder; combining the two, as in T5, is not possible. The best approximation available right now is to expand the context of our decoder, but a decoder-only model does not perform as well. Ideally, we could run full "attention" over one part of the sequence and sample autoregressively over the other.

This issue discusses ideas for implementing such a scheme and benchmarking it against the baseline fully-autoregressive model.
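As a concrete starting point, one common way to approximate "full attention for one part, autoregressive for the other" inside a single model is a prefix-LM attention mask: bidirectional attention within a prefix (the "encoder" part) and causal attention for the remainder (the "decoder" part). Below is a minimal JAX sketch under that assumption; `prefix_lm_mask` and its arguments are illustrative names, not existing code in this repo:

```python
import jax.numpy as jnp

def prefix_lm_mask(seq_len: int, prefix_len: int) -> jnp.ndarray:
    """Build a [seq_len, seq_len] boolean mask where True = query may attend to key.
    The first `prefix_len` tokens attend bidirectionally to each other;
    all remaining tokens attend causally (to themselves and earlier positions)."""
    pos = jnp.arange(seq_len)
    causal = pos[:, None] >= pos[None, :]            # lower triangle incl. diagonal
    in_prefix = pos < prefix_len
    full = in_prefix[:, None] & in_prefix[None, :]   # query and key both in the prefix
    return causal | full

# Example: 8 tokens, where the first 3 form the bidirectional "encoder" prefix.
mask = prefix_lm_mask(8, 3)
```

Such a mask would be applied to the attention logits before the softmax (e.g. `jnp.where(mask, logits, -1e9)`), and benchmarking it against the fully-autoregressive baseline is exactly the comparison proposed above.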