OCR-D / ocrd_ocropy

OCRD CLI to ocropy
Apache License 2.0

ocrd-ocropy-segment throws an exception for a lot of workspaces #4

Open mikegerber opened 5 years ago

mikegerber commented 5 years ago

ocrd-ocropy-segment throws an exception for many different inputs. Here is a run over 5 of the files from the OCR-D GT repo:

# all zips are from https://ocr-d-repo.scc.kit.edu/api/v1/metastore/bagit
for z in benner_herrnhuterey04_1748.ocrd.zip buerger_gedichte_1778.ocrd.zip estor_rechtsgelehrsamkeit02_1758.ocrd.zip lohenstein_agrippina_1665.ocrd.zip siles
  echo "== $z"
  cd "$(mktemp -d)"
  cp "/srv/data/OCR-D/$z" .
  dtrx "$z"
  cd "${z//.zip}/data"

  ocrd-ocropy-segment -l DEBUG -m mets.xml -I OCR-D-IMG -O OCR-D-SEG-LINE 2>&1 | tail -n 1
done

yields:

== benner_herrnhuterey04_1748.ocrd.zip
15:13:48.505 INFO ocrd.workspace - Saving mets '/tmp/tmp.NffpG878nI/benner_herrnhuterey04_1748.ocrd/data/mets.xml'
== buerger_gedichte_1778.ocrd.zip
ValueError: cannot convert float NaN to integer
== estor_rechtsgelehrsamkeit02_1758.ocrd.zip
ValueError: cannot convert float NaN to integer
== lohenstein_agrippina_1665.ocrd.zip
ValueError: cannot convert float NaN to integer
== silesius_seelenlust01_1657.ocrd.zip
15:14:01.768 INFO ocrd.workspace - Saving mets '/tmp/tmp.26Cn1zFHby/silesius_seelenlust01_1657.ocrd/data/mets.xml'

Installed version:

% pip list | grep ocrd-ocropy
ocrd-ocropy                0.0.3
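
For reference, the message itself is what Python raises when `int()` is handed a NaN. Because the loop above only keeps the last line of output (`tail -n 1`), the traceback is not shown, so where the NaN comes from inside the segmenter is unknown. The snippet below is only a minimal sketch of the error class, not the actual code path in ocrd-ocropy; `to_int` is a hypothetical helper, not part of any library.

```python
import math


def to_int(value):
    """Hypothetical helper: convert a float to int, naming NaN explicitly."""
    if math.isnan(value):
        raise ValueError(f"got NaN where a finite number was expected: {value!r}")
    return int(value)


# Plain int() on NaN produces the same message as in the output above.
try:
    int(float("nan"))
except ValueError as exc:
    message = str(exc)
    print(message)

# The guarded version fails earlier, with a message that names the problem.
try:
    to_int(float("nan"))
except ValueError as exc:
    guarded_message = str(exc)
    print(guarded_message)
```

Running the segmenter without `| tail -n 1` should show the full traceback and with it the line where the conversion happens.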