Closed by davidbernat 1 year ago
Several updates:
If you can remedy these, please do. Thanks.
Hi @davidbernat,
Thank you for your interest in our work!
It's a known problem that scene text images are very different from document text, and models trained on one don't transfer very well to the other; unfortunately, the public datasets we are using are mostly scene text. If your application needs to deal with documents, it would be better to train a separate model on document datasets.
For the issue of not returning any words, it's usually because the config doesn't match the model itself, which messes up the character mapping. Could you double-check whether the config in README.md actually matches the weights you are using?
That surprises me. The OCR worked impressively well for scene text of varying size and scene noisiness. A photo of a book, when tightly cropped and contiguous, worked well at times and not at all at others. The irregularity of the performance across repeated runs was surprising. The minimal transfer learning also surprises me, as block text with handwritten notes seems to remove all functionality at times. These do not seem to be behaviors implicit in the architecture, so their appearance in practice is quite a surprise to me. You all are much smarter than I am and are doing something I could never do. Still, the difference from my intuition is surprising. I promise I will double-check my execution of the code to see whether any other confounding variables could be present.
Regarding the configuration files: they are the same as in the GitHub repository. Can you double-check that the ones in the GitHub repository work as you expect, and post several examples similar to what I am describing: books photographed with phones, with occasional handwritten annotations?
Please let me know when you have examples of this same behavior. Thank you.
@davidbernat Have you modified the yaml file so that CHAR_MAP.DIR is pointing to the directory containing the character map jsons? I just uploaded an example notebook for your reference: https://github.com/facebookresearch/MultiplexedOCR/blob/main/notebook/inference/demo.ipynb
Please do not close this ticket. None of the code from your repository was modified in my runs, as I stated in my email. Thanks. :-)
You should at least modify the yaml file so that CHAR_MAP.DIR is pointing to the directory containing the character map jsons (see the readme file in the repo), otherwise it won't work. Let me know if you are able to reproduce the example notebook above :-)
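For concreteness, here is a minimal sketch of the kind of edit being described, assuming a yacs-style YAML config with a top-level CHAR_MAP.DIR key; the config path and character-map directory below are placeholders, not paths taken from the repo:

```python
# Sketch: point CHAR_MAP.DIR at the directory that contains the character map JSONs.
# The config path and directory below are placeholders; use the paths given in the
# MultiplexedOCR README for your own setup.
import yaml  # PyYAML

CONFIG_PATH = "configs/your_model_config.yaml"   # placeholder: yaml config that ships with the weights
CHAR_MAP_DIR = "/path/to/char_map_jsons"         # placeholder: directory holding the character map JSONs

with open(CONFIG_PATH) as f:
    cfg = yaml.safe_load(f)

# Create the CHAR_MAP section if it is missing, then set DIR.
cfg.setdefault("CHAR_MAP", {})["DIR"] = CHAR_MAP_DIR

with open(CONFIG_PATH, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)

print("CHAR_MAP.DIR ->", cfg["CHAR_MAP"]["DIR"])
```

The same change can of course be made by editing the yaml file by hand; the point is simply that CHAR_MAP.DIR must resolve to the directory holding the character map JSONs, otherwise inference will not return any words.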
You have no idea how much I appreciate the contributions FAIR continues to make to the next generation of Open Source (OS). I am running MultiplexedOCR out of the box on a high-resolution photograph of a high-DPI published book set in an easily identifiable font. Why is the performance so bad? Why is the performance so much worse than Apple iOS, which gets this correct instantly?
Furthermore: I noticed that MultiplexedOCR performed very well on non-book published text. Why the divergence? That cannot possibly be built into the model, can it? And it would seem unlikely to be a reflection of the choice of training data.
This application I am working on is very important and could serve the FAIR and Facebook community tremendously. We are days away from execution, and this step seems to be the only one holding us back. I do hope you will give us your attention on this.
After all: as they say, 'attention is all you need.' 😉
Also: why does running on the image not return the recognized text, only the text segmentation?