-
While parsing a document, we hit a Java OOM:
`2024-08-27 16:25:07,893 - /dedoc_root/dedoc/dedoc_manager.py - INFO - Get file tmpggtlpvfp.pdf with parameters {'document_type': 'diploma', 'structure_…
-
A workflow could have binary files, such as compiled code, images, etc.
TRS does not have a way to serve up binaries; the [FileWrapper](https://github.com/ga4gh/tool-registry-service-schemas/blob/dev…
-
-
Since `AlternativeImage` has been introduced on every level of the structural hierarchy, these image files can be used to represent results from image preprocessing (normalization, denoising, binariza…
-
Sometimes when you scan something, the pictures are not square, and that causes a lot of issues with content detection.
I suggest that we have a tool similar to GIMP's perspective tool to correct…
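
For reference, a four-corner perspective correction of this kind is usually expressed as a projective transform (homography) between the dragged corners and a target rectangle. The following is only a minimal sketch in plain Python, not how any particular tool implements it; the function names `solve_linear`, `homography`, and `apply_homography` and the corner coordinates are hypothetical and chosen for illustration.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without touching the inputs.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return a 3x3 matrix (bottom-right entry fixed to 1) that maps
    the four src points onto the four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map one (x, y) point through the homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


# Hypothetical corners of a skewed page, and the rectangle they should become.
skewed = [(12, 8), (205, 20), (190, 290), (5, 270)]
target = [(0, 0), (200, 0), (200, 280), (0, 280)]
H = homography(skewed, target)
```

To straighten an actual image one would typically compute the inverse mapping (`homography(target, skewed)`) and sample the source image for every output pixel; that resampling step is left out here.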
-
### Environment
* **Tesseract Version**: tesseract 4.1.1-rc2-21-gf4ef
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0…
-
https://github.com/tmbdev/teaching-dca
Course taught by Thomas_Breuel
1. Convert to PDF
2. Convert the PDF to HTML
3. Translate
-
I have looked at the contours. I wanted to overlay the contours by shape classes but actually failed to do that since I could not flip them all the right way. But I have scaled them roughly withi…
-
Opening a new issue as requested.
Here are some samples: https://mega.nz/folder/BRhChKob#xo-HHaJrD9VYN6YV3ur9WA
128.tif & 188.tif - original cleaned up 600dpi scans
*-scantailor.tif - 600dpi mi…
-
We should have heuristics to check for
- polygon containment (overlapping regions, word outside line etc.)
- artifacts from annotation like point or line-like regions
- lines with (way) too much …
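
The first two checks in the list above could be sketched roughly as follows. This is only an illustration in plain Python under stated assumptions: polygons are lists of `(x, y)` tuples, the helper names and the `min_area` threshold are hypothetical, and testing vertices alone is a necessary but not sufficient condition for containment when the parent polygon is non-convex.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(poly: List[Point]) -> float:
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def is_degenerate(poly: List[Point], min_area: float = 1.0) -> bool:
    """Flag point- or line-like regions: too few points or near-zero area.
    The default threshold is an arbitrary assumption."""
    return len(poly) < 3 or polygon_area(poly) < min_area


def point_in_polygon(pt: Point, poly: List[Point]) -> bool:
    """Ray-casting test; points exactly on the boundary may go either way."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def vertices_outside(child: List[Point], parent: List[Point]) -> List[Point]:
    """Return the child vertices that fall outside the parent polygon,
    e.g. word corners that stick out of their text line."""
    return [p for p in child if not point_in_polygon(p, parent)]


# Hypothetical example: a text line and a word that sticks out of it.
line = [(0, 0), (100, 0), (100, 50), (0, 50)]
word = [(90, 10), (120, 10), (120, 20), (90, 20)]
offending = vertices_outside(word, line)
```

A real validator would more likely rely on a geometry library for robust containment and overlap tests; the point here is only that each heuristic reduces to a small, independent predicate that can be reported per region.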