OCR-D / core

Collection of OCR-related python tools and wrappers from @OCR-D
https://ocr-d.de/core/
Apache License 2.0

How to get `ocrd-tool.json` if processors not installed in processing server? #1034

Open · kba opened 1 year ago

kba commented 1 year ago

[screenshot of the discussion, transcribed below]

@tdoan2010:

This implementation requires that all supported processors be installed on the same machine as the Processing Server, which might not be the case. Maybe after integrating #884, we can send requests to each processor to ask for its information instead.

@bertsky:

I concur – see earlier discussion above.

@MehmedGIT:

Maybe after integrating #884, we can send requests to each processor to ask for its information instead.

The Processing Worker is no longer a server, so we cannot send requests to it. I still have no clear idea how to achieve that. The best idea I have found so far is to store the `ocrd-tool.json` documents in the DB, so that the Processing Server can retrieve the information from there.
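For illustration, a minimal sketch of such a DB-backed tool cache, assuming MongoDB (which the OCR-D network stack uses); the database and collection names here are hypothetical, not from this thread:

```python
# Sketch only: cache ocrd-tool.json documents in MongoDB so the Processing
# Server can serve them without the processors being installed locally.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
tools = client["ocrd"]["ocrd_tools"]  # hypothetical database/collection names

def store_ocrd_tool(executable: str, ocrd_tool: dict) -> None:
    """Upsert the ocrd-tool.json of one processor, keyed by executable name."""
    tools.replace_one({"_id": executable},
                      {"_id": executable, "ocrd_tool": ocrd_tool},
                      upsert=True)

def get_ocrd_tool(executable: str):
    """Return the cached ocrd-tool.json, or None if the processor is unknown."""
    doc = tools.find_one({"_id": executable})
    return doc["ocrd_tool"] if doc else None
```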

bertsky commented 1 year ago

Discussion continued as follows:

So we seem to agree that all workers ( / processor queues) should be registered ( / created) centrally on the Processing Server (via an endpoint or from configuration at startup), and that new Processing Workers should output their `ocrd-tool.json` immediately, so that it can be used by the registration to store all JSONs in a tool cache dynamically.
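Such a registration flow could build on the `--dump-json` flag that every OCR-D processor CLI already provides. A minimal sketch of the worker side; the `/processor` registration endpoint and the server address are assumptions, not an existing API:

```python
# Sketch only: at startup, a worker dumps its own tool description and
# registers it with the Processing Server's (hypothetical) endpoint.
import json
import subprocess
import requests

PROCESSING_SERVER = "http://localhost:8080"  # assumed address

def register_worker(executable: str) -> None:
    # `<executable> --dump-json` prints the processor's ocrd-tool.json entry
    ocrd_tool = json.loads(
        subprocess.run([executable, "--dump-json"],
                       capture_output=True, check=True).stdout)
    # POST to a hypothetical registration endpoint on the Processing Server
    resp = requests.post(f"{PROCESSING_SERVER}/processor",
                         json={"executable": executable, "ocrd_tool": ocrd_tool},
                         timeout=10)
    resp.raise_for_status()

register_worker("ocrd-tesserocr-recognize")
```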

bertsky commented 1 year ago

BTW I believe for the full Web API including /discovery, we would need central worker registration anyway.

MehmedGIT commented 1 year ago

So we seem to agree that all workers ( / processor queues) should be registered ( / created) centrally on the Processing Server (via endpoint or from configuration at startup)

This will come after #1030, which will already be big enough without handling that change inside it as well. What I currently have in mind for the near future is:

Any other suggestions/modifications?

MehmedGIT commented 1 year ago

Ideas for a bit later in time (not even sure for when):

Disclaimer: this will potentially be too time-consuming to implement and prone to errors without good automatic testing mechanisms for the entire network and its agents working together.

MehmedGIT commented 1 year ago

BTW I believe for the full Web API including /discovery, we would need central worker registration anyway.

True. We still need to think about how exactly this should happen, i.e., which network agent takes responsibility for the central registration. Currently, that is the Processing Server.

bertsky commented 1 year ago

BTW I believe for the full Web API including /discovery, we would need central worker registration anyway.

True. We still need to think about how exactly this should happen, i.e., which network agent takes responsibility for the central registration. Currently, that is the Processing Server.

Yes, it makes most sense there, because the Processing Server is the one that needs to know whom to talk to anyway. So via registration it has the ultimate source of truth on `processor_list` etc. and could provide its own /discovery, to which the Workflow Server's /discovery can delegate.

Deployments should also be backed by the database BTW, in case the PS crashes...
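A minimal sketch of both points, under assumed names throughout (endpoint paths, database and collections are not from this thread, and the real Processing Server's API may differ):

```python
# Sketch only: serve /discovery from the registration cache, and reload
# deployment records from the DB after a Processing Server restart.
from fastapi import FastAPI, HTTPException
from pymongo import MongoClient

app = FastAPI()
db = MongoClient("mongodb://localhost:27017")["ocrd"]  # hypothetical names

@app.get("/discovery/processors")
def list_processors() -> list:
    """processor_list built from the registered ocrd-tool.json documents."""
    return [doc["_id"] for doc in db["ocrd_tools"].find({}, {"_id": 1})]

@app.get("/processor/{executable}")
def get_processor(executable: str) -> dict:
    """Serve a processor's ocrd-tool.json without it being installed here."""
    doc = db["ocrd_tools"].find_one({"_id": executable})
    if not doc:
        raise HTTPException(404, f"Unknown processor: {executable}")
    return doc["ocrd_tool"]

@app.on_event("startup")
def restore_deployments() -> None:
    """Reload deployment records persisted to the DB, in case the
    Processing Server crashed and is coming back up."""
    for dep in db["deployments"].find():
        print("restoring connection to", dep["host"], dep["executable"])
```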

MehmedGIT commented 1 year ago

Yes, it makes most sense there, because the Processing Server is the one that needs to know whom to talk to anyway.

For processing, yes. What if the /discovery needs to be extended? Say the client wants to discover available Workspace/Workflow servers. Then the Deployer has the central knowledge of where things were deployed.

Deployments should also be backed by the database BTW, in case the PS crashes...

Agree.

MehmedGIT commented 1 year ago

the DeployerConfig will potentially be extended to be able to deploy Workflow Server and Workspace Servers (in the reference WebAPI impl) as well

the Deployer agent will deploy RabbitMQ Server, MongoDB ...

These are no longer valid... The RabbitMQ Server, MongoDB, Workflow Server, and Workspace Server will be deployed with docker-compose.

@tdoan2010

tdoan2010 commented 1 year ago

I don't know how the discussion drifted to this topic, which is not relevant to the title of this issue at all. But yes, the Processing Server will only be responsible for Processor Servers. The rest must be managed in some other way, outside the Processing Server.

The final goal is to have a docker-compose file that can be used to start up all necessary components.
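A hypothetical sketch of such a compose file; only the `rabbitmq` and `mongo` images are stock, while the OCR-D service image, command, and ports are assumptions for illustration:

```yaml
# Sketch only: one docker-compose file starting all necessary components.
version: "3.8"
services:
  rabbitmq:
    image: rabbitmq:3-management
    ports: ["5672:5672", "15672:15672"]
  mongodb:
    image: mongo:6
    ports: ["27017:27017"]
    volumes: ["mongodb-data:/data/db"]
  processing-server:
    image: ocrd/core  # assumed image
    # assumed command and config path
    command: ocrd network processing-server /config.yml
    ports: ["8080:8080"]
    depends_on: [rabbitmq, mongodb]
volumes:
  mongodb-data:
```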