Based on the README, Nougat has two main models:
The base model uses a Swin Transformer encoder and an mBART decoder. The decoder has 10 layers, and the whole architecture has 350M parameters.
The small model has a slightly shorter sequence length and only 4 decoder layers, for a total of 250M parameters.
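To keep the numbers straight, here is a minimal sketch of how I would write the two variants down. `NougatVariant`, `VARIANTS`, and `describe` are hypothetical names I made up for illustration, not part of the Nougat codebase, and the values are only the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NougatVariant:
    """Hypothetical record of the figures quoted above (not a Nougat class)."""
    name: str
    decoder_layers: int          # number of mBART decoder layers
    total_params_millions: int   # approximate size of the whole model


# Values as stated in the text above; the exact sequence lengths are not
# given there, so they are deliberately left out.
VARIANTS = {
    "base": NougatVariant("base", decoder_layers=10, total_params_millions=350),
    "small": NougatVariant("small", decoder_layers=4, total_params_millions=250),
}


def describe(variant: NougatVariant) -> str:
    """Format one variant as a single human-readable line."""
    return (
        f"{variant.name}: {variant.decoder_layers} decoder layers, "
        f"~{variant.total_params_millions}M parameters"
    )


for variant in VARIANTS.values():
    print(describe(variant))
```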
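So in summary: the base model has a 10-layer mBART decoder and 350M parameters, while the small model has 4 decoder layers, a slightly shorter sequence length, and 250M parameters.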
Am I right?