UB-Mannheim/ocr-fileformat
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
https://digi.bib.uni-mannheim.de/ocr-fileformat/
MIT License
176 stars
23 forks
issues
Challenges processing textract
#187
joewiz
opened
3 weeks ago
6
update textract2page (v 0.2 - full LAYOUT etc.)
#186
bertsky
closed
2 months ago
4
ocr-transform hocr text:  is an invalid XML character
#185
jbarth-ubhd
closed
2 months ago
2
ocr-transform alto hocr: HTML, but xmlns=xhtml
#184
jbarth-ubhd
opened
4 months ago
2
page to hocr: cr_carea vs ocr_carea
#183
jbarth-ubhd
closed
4 months ago
12
Add citation file (fixes #179)
#182
stweil
opened
4 months ago
0
[feature request] Support TSV format
#181
stweil
opened
6 months ago
0
update textract2page (for valid @conf ranges)
#180
bertsky
closed
6 months ago
1
Missing CITATION.cff file for repository
#179
mhucka
opened
6 months ago
1
Broken badge on repo
#178
mhucka
closed
6 months ago
2
update textract2page
#177
bertsky
closed
6 months ago
0
`make all` wants to write to `PREFIX`
#176
stweil
opened
7 months ago
0
Installation with sudo writes local files with root ownership
#175
stweil
opened
7 months ago
0
Update Dockerfile, fix #173
#174
kba
closed
7 months ago
0
Docker installation
#173
yuvaler1
closed
7 months ago
1
update vendor/page-to-alto v1.2.0 -> v1.3.0
#172
kba
closed
8 months ago
0
update textract2page to include slub/textract2page#13
#171
kba
closed
9 months ago
0
Add transformation from hOCR to TEI and update transformation matrix
#170
stweil
closed
9 months ago
0
Use first bash from PATH (allows running on macOS)
#169
stweil
closed
10 months ago
0
Replace broken Travis CI by GitHub action
#168
stweil
closed
1 year ago
4
Fix broken conversions from hOCR to ALTO
#167
stweil
closed
1 year ago
1
update textract2page, hOCR-to-ALTO and alto-schema
#166
kba
closed
1 year ago
3
Update Makefile to support macOS
#165
stweil
closed
1 year ago
0
Table extraction
#164
kba
opened
1 year ago
0
add PRImA converter for GCV→ALTO
#163
bertsky
closed
1 year ago
0
ensure venv for Python tools
#162
bertsky
closed
1 year ago
1
Fix two issues reported by CodeQL CI
#161
stweil
closed
1 year ago
1
Add textract2page
#160
bertsky
closed
1 year ago
6
Add example files
#159
nichtich
opened
1 year ago
0
make install: use newline in sed c cmd
#158
bertsky
closed
1 year ago
1
Feature request: Page concatenation during conversion
#157
jsbien
opened
1 year ago
0
gcv__page: use -source-json instead of -source-xml
#156
bertsky
closed
1 year ago
2
Add CodeQL workflow for GitHub code scanning
#155
lgtm-com[bot]
closed
1 year ago
0
vendor/Makefile: page-to-alto is phony
#154
bertsky
closed
1 year ago
2
regression: page-to-alto is missing
#153
bertsky
closed
1 year ago
6
update page-to-alto
#152
bertsky
closed
1 year ago
1
page to text: rewrite
#151
bertsky
closed
1 year ago
1
[feature request] Support MacOS
#150
stweil
closed
1 year ago
13
Update SaxonHE to version 11.2
#149
stweil
closed
1 year ago
1
Use git submodules
#148
stweil
closed
2 years ago
2
Conversion from ABBYY to ALTO
#147
kba
closed
2 years ago
2
when converting to PAGE, always use latest schema
#146
bertsky
closed
2 years ago
1
page page2019: does not work
#145
bertsky
closed
2 years ago
0
Update Saxon-HE
#144
stweil
closed
2 years ago
2
page__alto transformation mixes XML with logging in the output
#143
bertsky
closed
2 years ago
2
page__alto: process all arguments
#142
bertsky
closed
2 years ago
3
[doc][fix] clear README cli links
#141
M3ssman
closed
2 years ago
1
Add ImageWare MyBib to ALTO conversion by karkraeg, fix #139
#140
kba
closed
2 years ago
3
Transformation for ImageWare MyBib
#139
karkraeg
closed
2 years ago
2
page__text.xsl is not honoring the reading order
#138
mikegerber
closed
1 month ago
8