facebookresearch / nougat

Implementation of Nougat: Neural Optical Understanding for Academic Documents
https://facebookresearch.github.io/nougat/
MIT License

Please advise an approach to increase Nougat OCR quality for small superscript/subscript characters #187

Open grigbpi opened 9 months ago

grigbpi commented 9 months ago

Attached is the page as a regular PDF: pdfwithtext.pdf. In the equation below, a small subscript letter (font size 5.8 pt) is transcribed as `x_{n}^{(n)}` instead of `x_{u}^{(n)}`.

\[\begin{bmatrix}x_{n}^{(n)}\\ y_{n}^{(n)}\\ z_{n}^{(n)}\end{bmatrix}=\begin{bmatrix}r_{n}^{(n)}s_{\theta^{(d)}}c_{\phi^{(d)}}\\ r_{n}^{(n)}s_{\theta^{(d)}}s_{\phi^{(d)}}\\ r_{n}^{(n)}c_{\phi^{(d)}}\end{bmatrix},\,n\in\{1,\cdots,N\}. \tag{3}\]

The PDF (pdfwithtext.pdf) is a two-column academic paper with a significant number of superscripts and subscripts, some as small as 4.2 pt. OCR quality for normal-size characters is good. Please advise an approach to increase OCR quality for these small superscript and subscript characters. Thank you.
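A rough back-of-the-envelope estimate may help show why only the small characters suffer. The sketch below assumes that the whole page is rendered and then uniformly scaled to fit a fixed model input of 896 x 672 pixels (the input resolution reported for Nougat; treat it as an assumption here), that the page is US Letter, and that no margin cropping happens. `glyph_height_px` is a hypothetical helper for illustration, not part of Nougat.

```python
POINTS_PER_INCH = 72.0


def glyph_height_px(font_pt, dpi=96.0, page_w_in=8.5, page_h_in=11.0,
                    input_h=896, input_w=672):
    """Approximate nominal glyph height (in pixels) inside the model input.

    Hypothetical model, not taken from Nougat's code: the whole page is
    rendered at `dpi`, then uniformly scaled (aspect ratio preserved) to fit
    an input of input_h x input_w pixels. Margin cropping is ignored.
    """
    # 1. Point size -> pixels in the rendered page image.
    rendered_px = font_pt / POINTS_PER_INCH * dpi
    # 2. Size of the rendered page image in pixels.
    page_h_px, page_w_px = page_h_in * dpi, page_w_in * dpi
    # 3. Uniform scale factor that fits the page into the model input.
    scale = min(input_h / page_h_px, input_w / page_w_px)
    return rendered_px * scale


# Compare body text with the small scripts from this issue at two render DPIs.
for pt in (10.0, 5.8, 4.2):
    print(f"{pt:4.1f} pt -> {glyph_height_px(pt):5.2f} px (96 dpi), "
          f"{glyph_height_px(pt, dpi=300):5.2f} px (300 dpi)")
```

Under these assumptions a 4.2 pt script spans only about 4.6 px of nominal height (its x-height is smaller still) and a 5.8 pt script about 6.4 px, versus about 11 px for 10 pt body text. The rendering DPI cancels out because the page is rescaled to the same input size either way. If Nougat crops page margins before resizing, the real numbers will be somewhat larger.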