csguoh / LEMMA

[IJCAI2023] An official implementation of the paper "Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement"
Apache License 2.0

Checkpoints details for testing the model. #20

Closed Bhavik-Ardeshna closed 2 months ago

Bhavik-Ardeshna commented 3 months ago

Hi @csguoh,

Thank you for open-sourcing such great work!

I want to test and train the model on a custom dataset. To start the testing phase, I set up your code on a GPU. In the config file, I added the path of LEMMA-release.pth in TEST and specified the correct path for MORAN.

Now, I am encountering an issue with the PositionAware section when testing the model using TextZoom test data. Which checkpoint files for vision and language should I use?

Also, do I need to convert my data to .mdb format to train and test the model?

I would greatly appreciate your guidance on this.

Thank you.

csguoh commented 3 months ago

Hi,

Thanks for your interest in this work. In LEMMA, the PositionAware module uses the attention map extracted from the OCR model ABINet. If you simply want to test LEMMA, these parameters are already contained in the released LEMMA-release.pth. If you want to train on your own dataset, you can use the pre-trained weights released in the ABINet repo via this link.

If you want to train from scratch, it is recommended to first convert your dataset to mdb (LMDB) for fast IO. If you just want to test on JPEG images, you may need to write some code to allow JPEG input.
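In case it helps, here is a minimal sketch of such a conversion using the lmdb package. The key layout (image_hr-%09d / image_lr-%09d / label-%09d plus num-samples) follows the common TextZoom-style convention, so please check it against the dataloader in this repo before relying on it:

```python
import io
import lmdb
from PIL import Image

def write_lmdb(pairs, out_path):
    """pairs: list of (hr_path, lr_path, text_label) tuples."""
    env = lmdb.open(out_path, map_size=1 << 32)  # ~4 GB map size; raise it for larger sets
    with env.begin(write=True) as txn:
        for idx, (hr_path, lr_path, label) in enumerate(pairs, start=1):
            # Store the encoded image bytes for the HR and LR crops.
            for key_fmt, path in (("image_hr-%09d", hr_path),
                                  ("image_lr-%09d", lr_path)):
                buf = io.BytesIO()
                Image.open(path).convert("RGB").save(buf, format="PNG")
                txn.put((key_fmt % idx).encode(), buf.getvalue())
            txn.put(("label-%09d" % idx).encode(), label.encode())
        txn.put(b"num-samples", str(len(pairs)).encode())
    env.close()
```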

Let me know if you have any other questions.

Bhavik-Ardeshna commented 3 months ago

@csguoh 

Thank you for your prompt reply.

I was able to run the test script. I have one question: after training the model on my dataset, do I need all the model files that were used for testing in order to deploy the model for inference?

What would be the estimated size of the model that I would deploy for inference?

Thank you.

Bhavik-Ardeshna commented 3 months ago

@csguoh I tried using the LEMMA-release.pth file for PositionAware, but it was giving some errors, so I used the ABINet vision and language weights instead.

I am wondering: If I want to write an inference script, I only have to use the LEMMA-release.pth file, right?

csguoh commented 3 months ago

Exactly! The released LEMMA-release.pth file contains all the module weights you need, including the fine-tuned ABINet weights.
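If it is useful, a quick way to sanity-check the checkpoint before writing your inference script is to load it and look at its top-level keys (the exact key names depend on how it was saved, so inspect rather than assume; the model itself should still be built the same way the test script builds it):

```python
import torch

# Peek at what LEMMA-release.pth actually contains; the top-level keys tell you
# whether it is a bare state_dict or a wrapper dict with extra fields.
ckpt = torch.load("LEMMA-release.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])
else:
    print(type(ckpt))
```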