[question] ViTSTR experiment

baudm / parseq

Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)

https://huggingface.co/spaces/baudm/PARSeq-OCR

Apache License 2.0


[question] ViTSTR experiment #37

Status: Closed (felixdittrich92 closed this issue 2 years ago)

felixdittrich92 commented 2 years ago

Hi @baudm 👋 ,

Thanks a lot for this great repository 👍 I saw that you ran some experiments with ViTSTR using 32x128 inputs.

Could you share the results? :) And did you train the model from scratch?

I plan to integrate both ViTSTR and PARSeq into https://github.com/mindee/doctr in the near future 👍

Best regards

baudm commented 2 years ago

Hello @felixdittrich92. Thanks for your interest and initiative for integration.

The ViTSTR results for 128x32 images are in the paper; the arXiv link is in the README. :)

Yes, all models were trained from scratch using exactly the same datasets, training pipeline, and strategy (in contrast to the original ViTSTR, which used DeiT weights for initialization).
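
Note on the two size notations used in this thread: 32x128 read as height x width and 128x32 read as width x height describe the same input. A minimal sketch of how such an input would map to a ViT-style token sequence follows. The 4x8 patch size and the helper name `num_patch_tokens` are assumptions chosen for illustration and are not stated anywhere in this thread.

```python
def num_patch_tokens(img_hw, patch_hw):
    """Count the non-overlapping patches a ViT-style encoder would see."""
    img_h, img_w = img_hw
    patch_h, patch_w = patch_hw
    # A plain patch embedding needs the image to tile exactly into patches.
    if img_h % patch_h or img_w % patch_w:
        raise ValueError("image size must be divisible by patch size")
    return (img_h // patch_h) * (img_w // patch_w)


# 32x128 as (height, width); the same image is 128x32 as (width, height).
IMG_HW = (32, 128)
# Assumed patch size (height, width), for illustration only.
PATCH_HW = (4, 8)

if __name__ == "__main__":
    print(num_patch_tokens(IMG_HW, PATCH_HW))  # 8 rows * 16 columns = 128
```

Under these assumptions the encoder would process 128 patch tokens per image; a different patch size changes that count, so check the actual model configuration before relying on it.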

felixdittrich92 commented 2 years ago

@baudm Thanks 👍