clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License
5.75k stars 466 forks

Where is the Swin output fused with the text: (1) at the start of the BART encoder, (2) in cross-attention (K, V from Swin, Q from the text attention) in the second attention block of the BART encoder, or (3) directly in the BART decoder? #232

Open shubham953 opened 1 year ago

shubham953 commented 1 year ago

I have found an auto-generative architecture. Does Donut follow the same design?

![Screenshot 2023-08-02 003843](https://github.com/clovaai/donut/assets/68180710/e0482adb-547f-498a-8c79-47bbcd10cf71)

WhatsApp Image 2023-08-02 at 00 00 57

https://github.com/clovaai/donut/assets/68180710/86a0fd05-abe7-4ad0-b6de-f4aa1ef2d3e7

shubham953 commented 1 year ago

Screenshot 2023-08-02 003843

Is the same process shown in the image applied for both training and testing?
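
To make the cross-attention option from the title concrete, here is a minimal sketch of how text states can attend to image features. It is not taken from the Donut code base. It assumes a single attention head and no learned projections, and the names `cross_attention`, `image_features`, and `text_states` are hypothetical. As far as the paper describes it, Donut pairs a Swin visual encoder with a BART-style text decoder, and the decoder reads the encoder output through cross-attention of this general shape.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_queries, image_keys, image_values):
    """Scaled dot-product cross-attention (single head, no projections).

    text_queries: one vector per text token (stands in for decoder states, Q)
    image_keys / image_values: one vector per image patch
        (stands in for the visual encoder output, K and V)
    Returns the fused vectors and the attention weights per text token.
    """
    d = len(image_keys[0])
    fused, all_weights = [], []
    for q in text_queries:
        # Similarity of this text token to every image patch.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in image_keys
        ]
        weights = softmax(scores)
        # Weighted sum of the image values: the "fusion" step.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, image_values))
            for j in range(len(image_values[0]))
        ]
        fused.append(mixed)
        all_weights.append(weights)
    return fused, all_weights


# Toy data (assumption): 3 image patches and 2 text tokens, dimension 2.
image_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_states = [[1.0, 0.0], [0.0, 1.0]]

fused, weights = cross_attention(text_states, image_features, image_features)
print(len(fused), len(fused[0]))
```

In this sketch each text token yields one fused vector, so the output length follows the text sequence while the content comes from the image patches. On the training versus testing question, the usual pattern for this kind of decoder is teacher forcing during training and token-by-token generation at test time, with the same cross-attention step in both; please check the paper and the repository code to confirm how Donut does it.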

shubham953 commented 1 year ago

#77 #207 #106 #90

#179