OCR-D / core

Collection of OCR-related python tools and wrappers from @OCR-D
https://ocr-d.de/core/
Apache License 2.0

Utilize processing server proxy to mets servers #1220

Closed MehmedGIT closed 4 months ago

MehmedGIT commented 5 months ago

Allow the Processing Server to accept general METS Server requests over TCP and translate them into UDS (Unix domain socket) requests. This is useful when a worker runs on a remote host and needs to communicate with a METS Server that listens on a UDS. One main benefit of this approach is that no separate port has to be allocated for each METS Server.

~This PR is still a draft and the implementation is still on a conceptual level~. Please feel free to suggest any ideas.
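The TCP-to-UDS translation described above can be sketched with the Python standard library alone. This is only a minimal illustration, not the code in this PR: `UnixHTTPConnection` and `forward_to_uds` are hypothetical names, and the sketch assumes the METS Server speaks plain HTTP on its Unix domain socket.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a Unix domain socket instead of a TCP port.

    Hypothetical helper for illustration; not part of ocrd core.
    """

    def __init__(self, uds_path, timeout=10.0):
        # "localhost" only ends up in the Host header; no TCP connection is opened.
        super().__init__("localhost", timeout=timeout)
        self.uds_path = uds_path

    def connect(self):
        # Replace the default TCP connect with an AF_UNIX stream socket.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.uds_path)
        self.sock = sock


def forward_to_uds(uds_path, method, path, body=None, headers=None):
    """Replay one HTTP request against the server behind uds_path.

    Returns a (status_code, response_body_bytes) tuple.
    """
    conn = UnixHTTPConnection(uds_path)
    try:
        conn.request(method, path, body=body, headers=headers or {})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

A proxy endpoint on the Processing Server would receive the request over TCP, pick the socket path of the target METS Server, and call something like `forward_to_uds(socket_path, "GET", "/")` before relaying the status and body back to the worker.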

MehmedGIT commented 4 months ago

> I still don't fully understand: if the METS Server user (here: the tcp_mets caller) is remote (relative to the workspace), then how is this useful? All the files that the METS updates will relate to will also be remote, and we decided against transferring files earlier (https://github.com/OCR-D/core/pull/966#pullrequestreview-1261544355).

The idea is that the processing workers could send requests to any Mets server through the Processing Server without allocating separate ports per Mets server on the host where the Processing Server is running.
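To picture how a single Processing Server port can serve many METS Servers, here is a rough sketch of a lookup table from workspace path to socket path. `MetsServerRegistry` and the example paths are assumptions made for illustration; the PR may track its METS Servers differently.

```python
import os


class MetsServerRegistry:
    """Maps a workspace directory to the UDS path of its METS Server.

    Hypothetical structure for illustration; not the data model used in ocrd core.
    """

    def __init__(self):
        self._sockets = {}

    @staticmethod
    def _key(workspace_path):
        # Normalize so that "/data/ws1/" and "/data/ws1" address the same server.
        return os.path.normpath(workspace_path)

    def register(self, workspace_path, uds_path):
        """Record which socket belongs to the METS Server of this workspace."""
        self._sockets[self._key(workspace_path)] = uds_path

    def resolve(self, workspace_path):
        """Return the socket path for a workspace, or raise KeyError if unknown."""
        return self._sockets[self._key(workspace_path)]
```

With such a table, every worker request goes to the same TCP endpoint and only carries the workspace path; the Processing Server resolves it to the matching socket, so no additional port is opened per METS Server.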

You are right that it will not work when the Processing Server (Mets Servers) and the Processing Workers are on different hosts with the current setup. You are also right that it has been decided to not transfer files over to the Mets Server as discussed in https://github.com/OCR-D/core/pull/966.

The forwarding through the Processing Server as a proxy is supposed to be used when:

@joschrew, is there anything more to add that I have missed?

joschrew commented 4 months ago

I imagine a setup where the workers (and the Processing Server) are on different VMs. The workspace is shared through NFS so that every processor has access to the same files. Currently this would not work with the METS Server, because Unix domain sockets cannot be shared through NFS. With this PR it should be possible for workers on different VMs to make requests to the METS Server.

bertsky commented 4 months ago

Thanks @MehmedGIT @joschrew for the explanation. Absolutely makes sense now – fantastic idea!

joschrew commented 4 months ago

I did some commits. It is now working for me with processors in docker-containers. I still want to change little things and do additional tests. After that I would change the pr's status for reviews.

MehmedGIT commented 4 months ago

> I did some commits. It is now working for me with processors in docker-containers. I still want to change little things and do additional tests. After that I would change the pr's status for reviews.

Thanks.

MehmedGIT commented 4 months ago

It is still unclear to me why one of the processors sent the request to /tcp_mets/workspace_path instead of /tcp_mets, leading to 404 errors. However, the error I was getting was related to an outdated ocrd_all local installation. All redirections work as expected with the latest version of ocrd_all and core.