Create language-based Docker images

LeoFCardoso / pdf2pdfocr

A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip!

Apache License 2.0


Create language-based Docker images #24

Closed Brice187 closed 3 years ago

Brice187 commented 3 years ago

Hey, could you please create Dockerfiles for different languages and upload the tagged images to Docker Hub?

Alternatively, you could add all Tesseract OCR language packages to the Dockerfile, but this would nearly triple the image size:

larsk@MacBook-Pro pdf2pdfocr % docker image ls
REPOSITORY                           TAG                                              IMAGE ID            CREATED             SIZE
pdf2pdfocr                           all-lang                                         a74b8d22d02b        6 seconds ago       1.1GB
pdf2pdfocr                           latest                                           09eccd997dd3        6 minutes ago       417MB

Should I add a PR for this issue?

LeoFCardoso commented 3 years ago

Hi there. Yes, you're right: adding languages increases the size of the resulting Docker image.

Maybe we could pass a parameter to the container that points to a directory with language files. In that case, though, the container would depend on the host operating system.
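For illustration, the "directory with languages" idea could look roughly like the sketch below. It only prints the docker run command rather than executing it. The container path /tessdata and the function name tessdata_run_cmd are hypothetical, and the sketch assumes the Tesseract inside the image honors the TESSDATA_PREFIX environment variable (depending on the Tesseract version, that variable may need to point at the parent of the tessdata directory instead). The image name pdf2pdfocr:latest is taken from the docker image ls output above.

```shell
#!/bin/sh
# Sketch only: print a `docker run` command that bind-mounts a host
# directory of *.traineddata files into the container, read-only.
# Assumptions (not from the project): the mount point /tessdata and the
# use of TESSDATA_PREFIX to make Tesseract look there.
tessdata_run_cmd() {
    host_dir=$1   # host directory holding the language files
    image=$2      # image to run, e.g. pdf2pdfocr:latest
    printf 'docker run --rm -v %s:/tessdata:ro -e TESSDATA_PREFIX=/tessdata %s\n' \
        "$host_dir" "$image"
}

# Show the command for a language directory in the user's home.
tessdata_run_cmd "$HOME/tessdata" pdf2pdfocr:latest
```

This keeps the image itself small, at the cost mentioned above: the container only works if the host provides a matching language directory.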

I think it's better to let users build their own images by just editing the Dockerfile.

What's your idea for a PR?

Brice187 commented 3 years ago

These PR ideas came to mind:

Just submit a Dockerfile.lang-deu ;)
With some sed magic, I could add Dockerfiles for every language.
Change the base image from Ubuntu to Alpine to get a smaller image (maybe even with all languages).
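
The "sed magic" could be sketched as follows. This is not the project's actual Dockerfile: the sample base file, the function name gen_lang_dockerfiles, and the assumption that the base Dockerfile installs a package named tesseract-ocr-eng are all made up for the illustration. The Dockerfile.lang-deu naming follows the suggestion above, and tesseract-ocr-deu and tesseract-ocr-fra are the Ubuntu package names for German and French.

```shell
#!/bin/sh
# Sketch only: derive one Dockerfile per language from a base Dockerfile.
# Assumption: the base file installs "tesseract-ocr-eng"; sed appends the
# extra language package right after it ("&" is the matched text).
gen_lang_dockerfiles() {
    base=$1   # path to the base Dockerfile
    shift     # remaining arguments are Tesseract language codes
    for lang in "$@"; do
        sed "s/tesseract-ocr-eng/& tesseract-ocr-${lang}/" "$base" \
            > "${base}.lang-${lang}"
    done
}

# Demo with a hypothetical base Dockerfile in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/Dockerfile" <<'EOF'
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y tesseract-ocr tesseract-ocr-eng
EOF

gen_lang_dockerfiles "$workdir/Dockerfile" deu fra
# Show the modified install line of the German variant.
grep 'tesseract-ocr-deu' "$workdir/Dockerfile.lang-deu"
```

Each generated file could then be built and tagged separately, for example with docker build -f Dockerfile.lang-deu -t pdf2pdfocr:deu . so that every image carries only the languages it needs.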

But for now, my use case is fulfilled. Thank you for your work!

LeoFCardoso commented 3 years ago

Good idea! Thank you for posting the issue.