crocs-muni / sec-certs

Tool for analysis of security certificates and their security targets (Common Criteria, NIST FIPS140-2...).
https://sec-certs.org
MIT License

OCR processing fails #298

Closed dmacko232 closed 1 year ago

dmacko232 commented 1 year ago

When OCR is used to convert a PDF to text, it fails with the error message: Error during OCR of, using garbage: invalid literal for int() with base 10: 'mpzvoklker/image-06'

The issue is most likely caused by incorrect slicing of the file name string (fname[6:-4] instead of fname[-6:-4]): https://github.com/crocs-muni/sec-certs/blob/9d1d44d04532609524fd862697179e179a6ea92c/src/sec_certs/utils/pdf.py#L64
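For illustration, a minimal reproduction of the slicing problem. The path below is an assumption: it is reconstructed from the error message so that fname[6:-4] yields exactly the reported string, and the ".png" extension is a guess (any three-letter extension behaves the same).

```python
# Hypothetical file name, reconstructed from the error message above.
# The temporary directory prefix and the extension are assumptions.
fname = "/tmp/tmpzvoklker/image-06.png"

# Slicing from index 6 keeps most of the directory path.
bad_slice = fname[6:-4]
# Slicing from index -6 keeps only the two digits before the extension.
good_slice = fname[-6:-4]

try:
    int(bad_slice)
    error_message = None
except ValueError as exc:
    # Same message as reported in this issue.
    error_message = str(exc)

page = int(good_slice)
```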

J08nY commented 1 year ago

I do not have time to fix this. fname[-6:-4] is likely wrong as well, since triple-digit page counts are reasonable. Better to split on the "-" and the ".".
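A minimal sketch of the split-based approach suggested here, assuming file names of the form image-<page>.<ext>. The helper name page_number is hypothetical and this is not the code from the repository or from the eventual fix. It also shows why a fixed two-character slice breaks once a page number reaches three digits.

```python
from pathlib import Path


def page_number(fname: str) -> int:
    """Hypothetical helper: parse the page number from '<dir>/image-<N>.<ext>'."""
    # Drop the directory first, so a "-" or "." in the temp dir name cannot interfere.
    base = Path(fname).name
    # Text before the first "." is the stem, e.g. "image-123".
    stem = base.split(".")[0]
    # Text after the last "-" is the page number, whatever its width.
    return int(stem.split("-")[-1])


two_digit = page_number("/tmp/tmpzvoklker/image-06.png")
three_digit = page_number("/tmp/tmp-abc.d/image-123.png")

# For comparison: the fixed-width slice silently drops the leading digit.
sliced = int("/tmp/tmp-abc.d/image-123.png"[-6:-4])
```

With the assumed names above, the fixed-width slice parses page 123 as 23 without raising, which would be harder to notice than the original ValueError.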

adamjanovsky commented 1 year ago

Thanks for noticing, I'll try to fix that. When called via the convert_certification_report() function (and others), it should be wrapped in a try-except block, so I don't see it as critical. But I'll look into it.

adamjanovsky commented 1 year ago

I drafted a fix:

https://github.com/crocs-muni/sec-certs/pull/299/files#diff-868e915618fb558aec444f32bf8f70056ffe499f0cb2f71c6d9fa75a4cacc005

works on my machine :)