amazonlinux / amazon-linux-2023

Amazon Linux 2023
https://aws.amazon.com/linux/amazon-linux-2023/

[Package Request] - tesseract #534

Open ahoward-ch opened 8 months ago

ahoward-ch commented 8 months ago

What package is missing from Amazon Linux 2023? Please describe and include package name.
package: tesseract
description: OCR package containing many language models

Is this an update to existing package or new package request?
New package request

Is this package available in Amazon Linux 2? If it is available via external sources such as EPEL, please specify.
tesseract is available in Amazon Linux 2 via amazon-linux-extras

Any additional information you'd like to include. (use-cases, etc)
Vital for creating containers to handle small-scale OCR in batches.

stewartsmith commented 8 months ago

Specifically, this is available via EPEL (https://src.fedoraproject.org/rpms/tesseract shows 3.04 in EPEL)

ahoward-ch commented 8 months ago

I wasn't able to get EPEL working on Amazon Linux 2023, but that might simply be because I don't know how.

Or are you simply adding context that tesseract 3 is available in EPEL as an extension of the 3rd answer in the original post?

For what it's worth, tesseract 3 is wildly out of date - tesseract 4 and above switched to far superior language models, and 4 really should be the minimum version for tesseract now. Tesseract 4 is what is available in Amazon Linux 2 extras, so a like-for-like replacement would need to be at least that version.

schembor commented 7 months ago

For what it's worth, tesseract 3 is wildly out of date

Yes - Tesseract 3 isn't even compatible with most packages that are used to interact with tesseract, like pytesseract. I was also unable to get EPEL working on AL2023.
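
For anyone building a container in the meantime, a guard like the one below can fail fast when the installed tesseract is older than 4. This is only a minimal sketch: the helper names and the `MIN_MAJOR` constant are made up for illustration, and it assumes the first line of `tesseract --version` looks like `tesseract 4.1.1` (some builds may prefix the number with `v`, and some releases print the banner to stderr rather than stdout).

```python
import re
import shutil
import subprocess

# Assumption taken from the discussion above: 4 is the practical minimum.
MIN_MAJOR = 4


def parse_major_version(banner: str) -> int:
    """Extract the major version from the output of `tesseract --version`."""
    text = banner.strip()
    if not text:
        raise ValueError("empty version banner")
    first_line = text.splitlines()[0]
    # Expected shape (assumed): "tesseract 4.1.1" or "tesseract v5.0.0-..."
    match = re.match(r"tesseract\s+v?(\d+)\.", first_line)
    if match is None:
        raise ValueError(f"unrecognised version banner: {first_line!r}")
    return int(match.group(1))


def installed_tesseract_is_recent(min_major: int = MIN_MAJOR) -> bool:
    """Return True if a `tesseract` binary on PATH reports at least `min_major`."""
    exe = shutil.which("tesseract")
    if exe is None:
        return False
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    # The banner may land on either stream depending on the release.
    banner = result.stdout or result.stderr
    try:
        return parse_major_version(banner) >= min_major
    except ValueError:
        return False


if __name__ == "__main__":
    print("tesseract >= 4 available:", installed_tesseract_is_recent())
```

Running this as the first step of a batch job makes a too-old or missing binary show up immediately instead of as a confusing failure inside pytesseract later.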

stewartsmith commented 7 months ago

The note about it being in EPEL was to differentiate this from a package in AL2 that isn't in AL2023.