hocr Search Results - Githubissues

1000+ results
for hocr

pdfminer/pdfminer.six #832

New hOCR renderer renders duplicate HTML IDs

**Bug report** Thanks for finding the bug! To help us fix it, please make sure that you include the following information: - A description of the bug - Steps to reproduce the bug. Try to mini…

slbayer updated 1 year ago
1
kba/hocr-spec #66

Logical Tags/classes

I don't understand how the logical tags in hOCR should be used. Moreover, I see potential conflicts with other nested tags from the layout. AFAIK ocropus itself does not use any logical tags and tesse…

zuphilip updated 7 years ago
7
ocropus/hocr-tools #182

Link to hOCR spec doesn't work

No 404, just doesn't open

Zireael07 updated 1 year ago
1
ocropus/hocr-tools #170

decodestring() deprecated in hocr-pdf; use decodebytes()

``` /home/muneeb/.local/bin/hocr-pdf:134: DeprecationWarning: decodestring() is a deprecated alias since Python 3.1, use decodebytes() uncompressed = bytearray(zlib.decompress(base64.decodestring(…

UBISOFT-1 updated 1 year ago
4
internetarchive/archive-pdf-tools #63

HOCR rendering compares unfavorably with tesseract PDF text …

Using recode_pdf (internetarchivepdf 1.5.2) and tesseract (5.3.0). I have three single-page examples, where I: 1. have tesseract make a full PDF from OCR, via e.g. `tesseract identifier.tiff i…

jrochkind updated 1 year ago
11
dbmdz/mirador-textoverlay #4

Use baseline information for improved text rendering

ALTO [supports a `@BASELINE` attribute](https://github.com/altoxml/schema/issues/32) that can define a polyline on which the text rests. [hOCR also includes support](http://kba.cloud/hocr-spec/1.2/#ba…

jbaiter updated 3 years ago
3
dbmdz/solr-ocrhighlighting #174

Can't index hOCR documents on Windows

Some hOCR files can't be parsed (version 0.6.0) because they use diacritic characters in their content. For example, chars: "**ůá**", words: **aráme, ků**. Example hOCR file: ``` …

petr-fleischmann updated 3 years ago
4
HazyResearch/pdftotree #109

Loss of information oftentimes in the last line of a table

**Describe the bug** I've tried the plain `pdftotree` command line utility on a few pdf files with tables, and found wherever there is a table structure, the last line is usually not captured in the …

linM24 updated 3 years ago
7
kermitt2/pdfalto #44

ALTO version with latest release

Previously, we used pdfalto to generate an ALTO XML from the pdf and https://github.com/filak/hOCR-to-ALTO to convert the ALTO XML to hOCR file after that. With the newest release of pdfalto this does…

ghost updated 3 years ago
6
ocrmypdf/OCRmyPDF #1250

[Feature]: Integrations with other backends via hOcr (naive …

### Describe the proposed feature Hi, I see there are a few issues on the board proposing integrations of new backends. I wondered how difficult this would be to do naively: it turns out that'…

coffepowered updated 7 months ago
4
