VikParuchuri / marker

Convert PDF to markdown quickly with high accuracy
https://www.datalab.to
GNU General Public License v3.0

Token indices sequence length is longer than the specified maximum sequence length for this model (395 > 384) #78

Closed: mrticker closed this issue 2 months ago

mrticker commented 5 months ago

Token indices sequence length is longer than the specified maximum sequence length for this model (395 > 384). Running this sequence through the model will result in indexing errors

I get this warning fairly often when converting arXiv papers. Examples: https://arxiv.org/abs/2001.04451, https://arxiv.org/abs/2301.10226, https://arxiv.org/abs/2002.05770

yasyf commented 4 months ago

+1! Same on various bioRxiv papers

moxi000 commented 4 months ago

+1, same problem

VikParuchuri commented 2 months ago

This should be fine: texify can work above this length. It's a warning, not an error.
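If the message is only log noise, one option is to filter it out. This was not suggested in the thread, so treat it as a sketch. It assumes the warning is the standard one logged by a Hugging Face transformers tokenizer, because the wording matches. It also assumes that transformers logs it through the standard-library logger named `transformers.tokenization_utils_base`. The filter class and helper function are hypothetical names. The sketch uses only the `logging` module and drops this one message, so every other warning still gets through.

```python
import logging

# Assumption: transformers emits this warning via
# logging.getLogger("transformers.tokenization_utils_base").
TOKENIZER_LOGGER = "transformers.tokenization_utils_base"


class DropSequenceLengthWarning(logging.Filter):
    """Hypothetical filter that discards only the sequence-length warning."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False drops the record; all other messages pass through.
        return "Token indices sequence length is longer" not in record.getMessage()


def silence_sequence_length_warning() -> logging.Filter:
    """Attach the filter to the (assumed) tokenizer logger and return it."""
    f = DropSequenceLengthWarning()
    logging.getLogger(TOKENIZER_LOGGER).addFilter(f)
    return f
```

Call `silence_sequence_length_warning()` once before running the conversion. Filtering one specific message is narrower than lowering the whole library's log level, so other diagnostics stay visible.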