gaopengcuhk / Stable-Pix2Seq

A full-fledged version of Pix2Seq
Apache License 2.0
235 stars 20 forks

Extract token embedding #16

Open elituan opened 1 year ago

elituan commented 1 year ago

I want to extract the token embeddings as shown in Figure 11 of the paper.

However, looking at the code, I see that the tokens are predicted by feeding the output feature map to an MLP whose last layer has dimension 2003 (perhaps the number of tokens). Hence, the model does not seem to learn a separate token embedding, so there is no learned token embedding to extract.
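To make the question concrete, here is a minimal sketch of the prediction head as I understand it. It is not the repository's code: `HIDDEN_DIM`, `W`, `b`, `token_logits`, and `output_embedding` are hypothetical names, and only the output size 2003 comes from what I saw. It uses plain Python so it runs without any dependencies. If the last layer is a linear map, its weight matrix has one row per token, and those rows might be the closest thing to a per-token vector, though I cannot tell whether that is what Figure 11 visualizes.

```python
import random

VOCAB_SIZE = 2003  # output dimension of the last layer, as observed in the code
HIDDEN_DIM = 8     # hypothetical feature size, kept small for illustration

random.seed(0)

# Hypothetical final linear layer: logits = W @ h + b,
# where W has shape (VOCAB_SIZE, HIDDEN_DIM) and b has shape (VOCAB_SIZE,).
W = [[random.gauss(0.0, 0.02) for _ in range(HIDDEN_DIM)]
     for _ in range(VOCAB_SIZE)]
b = [0.0] * VOCAB_SIZE


def token_logits(h):
    """Score every token for a single output feature vector h."""
    return [sum(w_i * h_i for w_i, h_i in zip(row, h)) + bias
            for row, bias in zip(W, b)]


def output_embedding(token_id):
    """Return the row of W that scores token_id (an assumed per-token vector)."""
    return W[token_id]


# A stand-in for one decoder output feature vector.
h = [random.gauss(0.0, 1.0) for _ in range(HIDDEN_DIM)]
logits = token_logits(h)
```

Under this assumption, `output_embedding(k)` is a `HIDDEN_DIM`-sized vector for token `k`, and `logits[k]` is its dot product with the feature vector plus a bias.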

Am I missing something?