OCR-D / ocrd_tesserocr

Run tesseract with the tesserocr bindings with @OCR-D's interfaces
MIT License

image_from_page / image_from_segment: Need for workspace? #65

Closed kba closed 5 years ago

kba commented 5 years ago

https://github.com/OCR-D/ocrd_tesserocr/blob/04b6fbc1a94d59dcbe6bd6962c7c31d236b9352c/ocrd_tesserocr/common.py#L170

Can we change the signatures of these methods to avoid relying on the workspace?

AFAICS the workspace is only required to access the resolver, which loads the images as PIL Images. Does the convenience of not having to worry about retrieving remote URLs and caching images outweigh the benefits of having all these utility methods in ocrd_utils?

Not sure about the consequences but before I investigate further, do you think it would be worth it to have these functions in ocrd_utils rather than as methods of Workspace?

kba commented 5 years ago

Or we could of course keep their signatures as-is and use them as functions rather than methods.

bertsky commented 5 years ago

> Or we could of course keep their signatures as-is and use them as functions rather than methods.

Exactly. That would be the only alternative I can see. (Or some variant like passing the workspace's resolver instead of the workspace.)
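To illustrate the resolver variant, here is a minimal sketch, not the actual ocrd_tesserocr or OCR-D/core code. The names `crop_region`, `FakeImage` and `fake_loader` are hypothetical and exist only for this example; the one real API assumed is that the loaded object offers a PIL-style `crop(box)` method. The point is that the function depends on an image-loading callable rather than on the whole workspace.

```python
from typing import Any, Callable, Tuple

# (left, upper, right, lower), the box layout that PIL's Image.crop expects
Box = Tuple[int, int, int, int]


def crop_region(load_image: Callable[[str], Any], image_url: str, box: Box) -> Any:
    """Hypothetical helper: needs only an image loader, not a workspace."""
    # In practice the loader could be a bound resolver method returning a PIL Image.
    image = load_image(image_url)
    return image.crop(box)


class FakeImage:
    """Stand-in for a PIL Image so the sketch runs without Pillow."""

    def __init__(self, size: Tuple[int, int]) -> None:
        self.size = size

    def crop(self, box: Box) -> "FakeImage":
        left, upper, right, lower = box
        return FakeImage((right - left, lower - upper))


def fake_loader(url: str) -> FakeImage:
    # A real loader would fetch (and possibly cache) the file behind `url`.
    return FakeImage((2000, 3000))


region = crop_region(fake_loader, "file:///page.tif", (100, 200, 600, 500))
print(region.size)  # (500, 300)
```

The caller still needs something that knows how to fetch and cache images, so this only moves the dependency from the workspace to its resolver.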

> Does the convenience of not having to worry about retrieving remote URLs and caching images outweigh the benefits of having all these utility methods in ocrd_utils?

Yes, I would think so. These functions are, after all, far more complex and Processor-centric than anything in ocrd_utils. Plus this gives the chance of deprecating the use of workspace.resolve_image_as_pil.

bertsky commented 5 years ago

Also goes for save_image_file BTW.

bertsky commented 5 years ago

Closing as OCR-D/core#268 already chose to keep the Workspace method signature.