ep0p opened this issue 5 days ago
If the model predictions are a little off, perhaps because your PDFs (format, content and so on) deviate to some degree from the training material, this is nothing to worry about and not uncommon. It could be that the fonts or spacing are different and therefore harder for the model to parse correctly. In that case I'd suggest post-processing the predictions yourself, using an NLP package to detect word boundaries (an idea from here) and removing the faulty spacing within those boundaries. Or you could fine-tune the model on your data, which should resolve it quickly if it's just a spacing issue. I also notice the text is French and looks like a legislative text or conference protocol, which might contribute to the problem as well...
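Something along these lines could work as a starting point; the regex heuristic and wordninja are my own choices (wordninja ships an English frequency list, so French text would need a custom word list via wordninja.LanguageModel):

```python
# Rough post-processing sketch for the "s p a c e d  o u t" artefact:
# collapse runs of single letters and re-segment them into words.
import re
import wordninja  # pip install wordninja

def fix_spaced_runs(text: str, min_run: int = 4) -> str:
    # Match runs of at least `min_run` single word-characters separated by
    # single spaces, e.g. "c o n f e r e n c e".
    pattern = re.compile(r'(?:\b\w\b ){%d,}\b\w\b' % (min_run - 1))

    def _repair(match: re.Match) -> str:
        squeezed = match.group(0).replace(' ', '')
        return ' '.join(wordninja.split(squeezed))

    return pattern.sub(_repair, text)

print(fix_spaced_runs("the output read t h i s i s a t e s t here"))
# -> "the output read this is a test here"
```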
Hi @paulgekeler,
Indeed, when using the fine-tuned version this issue no longer exists. It is, however, replaced with pages being ignored entirely, with not a single word on them recognised.
Do you have any idea whether GOT can handle images that might be skewed? Another guess is that it's the noise and I need to fine-tune with noisy images; I'll look into that.
PS: all my documents are French legal documents, sometimes with complicated layouts.
@ep0p yes, I've experienced the same thing. When I try to run multi-page inference, I barely get any output, maybe the first couple of lines of text. My suspicion is that the compression of the visual information is too aggressive for dense text spread over multiple pages. I think their multi-page training consisted of multiple pages of sparse text.
Hi, it would help to use a for-loop for multi-page inference. The multi-page mode is only for training; more details can be found in the paper.
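For reference, a minimal page-by-page loop could look roughly like this, assuming the Hugging Face ucaslcl/GOT-OCR2_0 weights and the model.chat() call from the README; pdf2image (which needs poppler installed) is just one option for splitting the PDF:

```python
# Split the PDF into page images and run GOT on each page separately.
import os
from pdf2image import convert_from_path
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True)
model = AutoModel.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True,
                                  low_cpu_mem_usage=True, device_map='cuda',
                                  use_safetensors=True,
                                  pad_token_id=tokenizer.eos_token_id)
model = model.eval().cuda()

pages = convert_from_path('document.pdf', dpi=300)    # one PIL image per page
results = []
for i, page in enumerate(pages):
    page_path = f'/tmp/page_{i:03d}.png'
    page.save(page_path)                               # model.chat expects an image path
    results.append(model.chat(tokenizer, page_path, ocr_type='ocr'))
    os.remove(page_path)

full_text = '\n\n'.join(results)
```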
@Ucas-HaoranWei thanks, I read the paper. I will try to fine-tune further on multi-page data.
@paulgekeler and @Ucas-HaoranWei In my case, I split the PDF into images and performed inference in a loop, page by page. Some pages were ignored, even though they had the same format as the others. However, it seemed to me that they were slightly tilted. I deskewed them, and this apparently helped because they were properly recognized afterward.
Would fine-tuning with skewed images help in this case?
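The deskewing I did was roughly along these lines; this is just a sketch using OpenCV's minAreaRect over the ink pixels (the Otsu thresholding is my choice, and the angle convention of minAreaRect changed around OpenCV 4.5, so sanity-check the sign on your install):

```python
# Estimate the page's skew angle from the text pixels and rotate it back.
import cv2
import numpy as np

def deskew(image_path: str, output_path: str) -> float:
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Invert + Otsu so that text pixels become the foreground.
    thresh = cv2.threshold(gray, 0, 255,
                           cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    # Map the rectangle angle to a small correction (the convention varies
    # between OpenCV versions, so flip the sign if pages come out worse).
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
    rotated = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC,
                             borderMode=cv2.BORDER_REPLICATE)
    cv2.imwrite(output_path, rotated)
    return angle
```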
@ep0p I'm pretty sure it would. Nougat and Donut, for example, also distort some page images before training to increase robustness.
@paulgekeler thanks a lot. I will add a skewed subset to my dataset as well and attempt a fine-tune.
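A quick way to generate such a skewed subset might be something like the snippet below (torchvision transforms on the rendered page images; the rotation and perspective ranges are arbitrary guesses, not the settings Nougat or Donut actually use, and each augmented image keeps the same ground-truth text as its clean original):

```python
# Create a distorted copy of every clean page image for the fine-tuning set.
from pathlib import Path
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=5, expand=True, fill=255),          # slight skew
    transforms.RandomPerspective(distortion_scale=0.05, p=0.5, fill=255),
])

src, dst = Path('pages_clean'), Path('pages_skewed')
dst.mkdir(exist_ok=True)
for img_path in src.glob('*.png'):
    img = Image.open(img_path).convert('RGB')
    augment(img).save(dst / img_path.name)
```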
@ep0p Did you manage to fine-tune on your dataset? If you did so successfully, would you mind sharing the format of your data and your training settings?
I'm encountering an issue when using GOT to run inference on plain text. The output is not consistent: sometimes it detects the text correctly, but other times it introduces spaces between letters, creating nonsense words.
This inconsistency becomes particularly problematic when processing PDFs with multiple pages: even if most pages are inferred correctly, a couple of pages can have this spacing issue, which disrupts the results.
I can't figure out why this happens, or how to enforce a consistent format so that only the "good" output is produced.