NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

Donut model size beyond 768*2 max_length #283

Open nik13 opened 1 year ago

nik13 commented 1 year ago

Is there any way to go beyond the max_length of 768*2? I tried training the model with 768*4 as the max_length on a GPU with sufficient memory, but it gives an internal CUDA error (not related to memory usage).

Is there any way to achieve a greater max_length, or is it just a model limitation?

lusid commented 1 year ago

I am also looking for the answer to this.

sjtu-cz commented 5 months ago

Any conclusions?

NielsRogge commented 5 months ago

I think you would need to interpolate the position embeddings of the pre-trained text decoder for the model to go beyond 768 tokens.

As seen here: https://github.com/clovaai/donut/blob/4cfcf972560e1a0f26eb3e294c8fc88a0d336626/donut/model.py#L188-L195
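To illustrate the idea, here is a minimal sketch of interpolating a position-embedding matrix along the sequence axis. This is not the Donut implementation (which uses `torch.nn.functional.interpolate` on the decoder's `embed_positions` weight, as in the linked code); the function name and the usage lines referencing the model attributes are my own assumptions, shown with numpy for self-containedness:

```python
import numpy as np

def interpolate_position_embeddings(weight: np.ndarray, new_max_len: int) -> np.ndarray:
    """Linearly interpolate a [old_len, dim] position-embedding matrix
    along the sequence axis to [new_max_len, dim]."""
    old_len, dim = weight.shape
    old_pos = np.linspace(0.0, 1.0, old_len)
    new_pos = np.linspace(0.0, 1.0, new_max_len)
    # Interpolate each embedding dimension independently along the sequence axis.
    return np.stack(
        [np.interp(new_pos, old_pos, weight[:, d]) for d in range(dim)],
        axis=1,
    )

# Hypothetical usage with a Donut (VisionEncoderDecoder) checkpoint — attribute
# paths are illustrative, check your model's actual module names:
# emb = model.decoder.model.decoder.embed_positions.weight.detach().cpu().numpy()
# new_emb = interpolate_position_embeddings(emb, 768 * 4)
```

After interpolating, you would copy the resulting matrix back into a resized embedding layer and update the decoder config's max position embeddings accordingly.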