Closed DamonsJ closed 2 years ago
Thank you for reporting this. It can be resolved by reconfiguring the model's hyperparameters; one example is: https://github.com/allenai/VILA/blob/96cafe591ae6ee8a70f941a52dd37bbe0a60b243/datasets/s2-vl-utils/vision_model_loader.py#L140
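For readers who want to try this, here is a minimal sketch of passing overridden thresholds through the `extra_config` argument of `lp.Detectron2LayoutModel`. The threshold values and the helper names (`publaynet_model_kwargs`, `load_publaynet_model`) are illustrative assumptions, not the settings used in the linked VILA file; the config path and label map are the PubLayNet entries as I recall them from the layoutparser model catalog, so please check them against the catalog before relying on them.

```python
def publaynet_model_kwargs(score_thresh=0.5, nms_thresh=0.4):
    """Build keyword arguments for lp.Detectron2LayoutModel.

    The default thresholds are placeholders; tune them on your own pages.
    """
    return {
        "config_path": "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
        # extra_config is a flat [key, value, key, value, ...] list that is
        # merged into the underlying Detectron2 config.
        "extra_config": [
            "MODEL.ROI_HEADS.SCORE_THRESH_TEST", score_thresh,
            "MODEL.ROI_HEADS.NMS_THRESH_TEST", nms_thresh,
        ],
        "label_map": {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    }


def load_publaynet_model(**overrides):
    # Imported lazily so the helper above can be inspected without
    # layoutparser and detectron2 installed.
    import layoutparser as lp

    return lp.Detectron2LayoutModel(**publaynet_model_kwargs(**overrides))
```

Raising `score_thresh` should drop low-confidence boxes, while lowering `nms_thresh` should suppress more overlapping boxes; which direction helps depends on the document.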
Hi, thanks very much for replying. I just want to recognize text, figures, and tables in published documents. How should I adjust the parameters? When I use the extra config in https://github.com/allenai/VILA/blob/96cafe591ae6ee8a70f941a52dd37bbe0a60b243/datasets/s2-vl-utils/vision_model_loader.py#L140
I can recognize text and figures, but math equations are not recognized.
Thanks!
There's a separate model (https://github.com/Layout-Parser/platform/issues/20) that can be used for detecting equation regions. Also see the code here: https://github.com/allenai/VILA/blob/96cafe591ae6ee8a70f941a52dd37bbe0a60b243/datasets/s2-vl-utils/vision_model_loader.py#L150
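A rough sketch of running an equation detector alongside a general layout model might look like the following. The `lp://MFD/...` config path and the `{1: "Equation"}` label map are my recollection of the layoutparser model catalog and should be treated as assumptions; `equation_model_kwargs` and `detect_with_equations` are hypothetical helper names, and the threshold is a placeholder.

```python
def equation_model_kwargs(score_thresh=0.5):
    """Keyword arguments for an equation-region detector (assumed catalog entry)."""
    return {
        "config_path": "lp://MFD/faster_rcnn_R_50_FPN_3x/config",
        "extra_config": ["MODEL.ROI_HEADS.SCORE_THRESH_TEST", score_thresh],
        "label_map": {1: "Equation"},
    }


def detect_with_equations(image, layout_model, score_thresh=0.5):
    """Run an existing layout model plus the equation model on one image."""
    # Imported lazily so the kwargs helper works without layoutparser installed.
    import layoutparser as lp

    equation_model = lp.Detectron2LayoutModel(**equation_model_kwargs(score_thresh))
    # Each detect() call returns a Layout; combine the blocks from both models.
    blocks = list(layout_model.detect(image)) + list(equation_model.detect(image))
    return lp.Layout(blocks)
```

The combined layout may contain equation boxes that overlap text boxes, so some filtering by overlap could still be needed afterwards.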
I got bad results using layout-parser. Here is the image I used:
Here is the code I ran in Python:
Here is the result:
By the way, some warnings were generated:
/usr/local/lib/python3.9/site-packages/detectron2/structures/image_list.py:99: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  max_size = (max_size + (stride - 1)) // stride * stride
/usr/local/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
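To illustrate what the first warning is about, here is a small pure-Python sketch of the difference between truncating and flooring division, applied to the same round-up-to-stride expression shown in the warning. The function names are hypothetical, and plain integers stand in for tensors. Because image sizes and strides are non-negative, the two rounding modes should agree in that expression, so these warnings are unlikely to be the cause of poor detections.

```python
import math


def trunc_div(a, b):
    # Rounds toward zero, like torch.div(a, b, rounding_mode='trunc').
    return math.trunc(a / b)


def floor_div(a, b):
    # Rounds toward negative infinity, like rounding_mode='floor'.
    return a // b


def round_up_to_stride(size, stride, div=floor_div):
    # Same arithmetic as the line flagged in image_list.py.
    return div(size + (stride - 1), stride) * stride


# The two modes differ only for negative operands.
print(trunc_div(-7, 2), floor_div(-7, 2))  # -3 -4
print(trunc_div(7, 2), floor_div(7, 2))    # 3 3

# For a non-negative size both modes give the same padded size.
print(round_up_to_stride(100, 32, trunc_div))  # 128
print(round_up_to_stride(100, 32, floor_div))  # 128
```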