
OCR-D / zenhub

Repo for developing zenhub integration

Apache License 2.0


Web-API start with processor part #118

Closed: joschrew closed this issue 2 years ago

joschrew commented 2 years ago

In this sprint I want to get started with the processor part of the web-API. To that end, I want to:

Read and understand Triet's Processing-Server implementation in ocrd-core (currently a PR: https://github.com/OCR-D/core/pull/884)
Think about, and experiment with, how to use it for the web-API implementation

joschrew commented 2 years ago

Status: I have gone through the code of the mentioned pull request, and I think I now understand what it is about.

As I understand it, it would be best to run every processor in its own Docker container (later perhaps in Kubernetes or something else). The web-API could then either use a reverse proxy such as traefik to forward requests on the processor part of the API to these containers, or forward the requests with FastAPI itself (perhaps with something like what is mentioned here: https://github.com/tiangolo/fastapi/issues/1788). The first option (forwarding with a proxy like traefik) seems to me the better solution. It would also be possible (but not intended, and not as useful) to create FastAPI endpoints that use ProcessorAPI.process() from ocrd/server/main.py.
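A proxy that forwards requests on the processor part of the API essentially matches a path prefix, strips it, and passes the rest of the path to the container of that processor (in traefik, roughly a router with a PathPrefix rule plus a StripPrefix middleware). The following Python sketch only illustrates that routing decision; the /processor/ prefix, the container hostnames and the ports are hypothetical and not taken from the PR.

```python
# Hypothetical mapping from processor name to the base URL of its Docker container.
UPSTREAMS = {
    "ocrd-dummy": "http://ocrd-dummy:8000",
    "ocrd-tesserocr-recognize": "http://ocrd-tesserocr:8000",
}

# Hypothetical path prefix under which the web-API exposes the processors.
PREFIX = "/processor/"


def resolve_upstream(path: str, upstreams: dict[str, str]) -> str:
    """Map an incoming web-API path to the URL of the processor container.

    Mimics a proxy rule that matches a path prefix and strips it, e.g.
    /processor/ocrd-dummy/run -> http://ocrd-dummy:8000/run
    """
    if not path.startswith(PREFIX):
        raise ValueError(f"not a processor path: {path}")
    rest = path[len(PREFIX):]
    # The first path segment after the prefix selects the processor.
    name, _, remainder = rest.partition("/")
    if name not in upstreams:
        raise KeyError(f"unknown processor: {name}")
    return upstreams[name].rstrip("/") + "/" + remainder
```

With a real proxy this mapping lives in the proxy configuration instead of in web-API code, which is one reason the proxy option looks attractive: the web-API does not have to relay request and response bodies itself.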

But for now it is more important to improve the existing API and to integrate and improve the workflow part of the web-API that Mehmed provided, so I have not started implementing anything for the processor part yet.

joschrew commented 2 years ago

We talked about the processing part of the web-API on Monday (26.9.2022). The web-API has endpoints for the processing server: processing requests can be accepted, and users can query status information (running, stopped) about the processors they started, receive logs, etc. In ocrd there is already a pull request for a processing server (https://github.com/OCR-D/core/pull/884), and the web-API should use it. A separate server will be started for every processor that is offered. What I want to implement is the functionality in the web-API that manages and talks to these processing servers. For example, the web-API should forward a request to start a processor to the server that executes that processor. It should also be able to provide logs of the running processes and forward their status to the user. We called this part of the web-API the processing broker. Getting closer to this implementation is my goal for this sprint.
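The bookkeeping the processing broker needs (accept a request, remember which processor it went to, report its status and collect its logs) could be sketched as below. This is a minimal in-memory sketch under assumptions: the class and method names are hypothetical, only the two states mentioned above (running, stopped) are modelled, and nothing here talks to the actual processing servers from the PR.

```python
import itertools
from dataclasses import dataclass, field
from enum import Enum


class JobStatus(Enum):
    # Only the two states named in the discussion; a real broker likely needs more.
    RUNNING = "running"
    STOPPED = "stopped"


@dataclass
class Job:
    job_id: int
    processor: str
    parameters: dict
    status: JobStatus = JobStatus.RUNNING
    logs: list[str] = field(default_factory=list)


class ProcessingBroker:
    """Hypothetical in-memory record of jobs forwarded to processing servers."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)  # simple increasing job ids
        self._jobs: dict[int, Job] = {}

    def start(self, processor: str, parameters: dict) -> Job:
        # In a real broker, the request would be forwarded to the server here.
        job = Job(next(self._ids), processor, parameters)
        self._jobs[job.job_id] = job
        return job

    def append_log(self, job_id: int, line: str) -> None:
        self._jobs[job_id].logs.append(line)

    def stop(self, job_id: int) -> None:
        self._jobs[job_id].status = JobStatus.STOPPED

    def status(self, job_id: int) -> str:
        return self._jobs[job_id].status.value

    def logs(self, job_id: int) -> list[str]:
        # Return a copy so callers cannot modify the stored log.
        return list(self._jobs[job_id].logs)
```

Keeping this state in memory is only enough for a sketch; once the processing servers run in separate containers, the broker would have to persist jobs or query the servers for their status.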

joschrew commented 2 years ago

In the last sprint I started on the processing broker with the endpoint run_processor. This endpoint accepts a processor name and the parameters for its invocation, and delegates the request to the processing server. Which processors are available, and where to reach them (URL), is specified in a config file. The processing servers are based on ocrd-core, currently on the code from the processing-server pull request. The next step is to implement the other processing-related endpoints.
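A minimal sketch of how run_processor might look up a processor in the config file and delegate the call. The config format (JSON with a "processors" mapping), the /run path on the processing server and the request body shape are all assumptions for illustration; the actual config file and the processing-server API from the PR may differ. Only the standard library is used, and build_run_request is kept separate so the request can be inspected without a running server.

```python
import json
import urllib.request

# Hypothetical config format: processor name -> base URL of its processing server.
EXAMPLE_CONFIG = """
{
  "processors": {
    "ocrd-dummy": "http://localhost:8001",
    "ocrd-cis-ocropy-binarize": "http://localhost:8002"
  }
}
"""


def load_processors(config_text: str) -> dict[str, str]:
    """Parse the (assumed) JSON config into a name -> URL mapping."""
    return json.loads(config_text)["processors"]


def build_run_request(processors: dict[str, str], name: str,
                      parameters: dict) -> urllib.request.Request:
    """Build the POST request that delegates a run to the processing server."""
    if name not in processors:
        raise KeyError(f"processor {name!r} is not configured")
    url = processors[name].rstrip("/") + "/run"  # hypothetical endpoint path
    body = json.dumps({"parameters": parameters}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def run_processor(processors: dict[str, str], name: str,
                  parameters: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the server's JSON answer (needs a live server)."""
    req = build_run_request(processors, name, parameters)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Inside a FastAPI application, an async HTTP client would be a more natural choice than the blocking urllib call; the lookup against the config file stays the same either way.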

joschrew commented 2 years ago

Closing comment: a start has been made. Implementation of the processor part will continue here: https://app.zenhub.com/workspaces/ocr-d-board-61b071467917f10021c2582a/issues/ocr-d/zenhub/133 https://app.zenhub.com/workspaces/ocr-d-board-61b071467917f10021c2582a/issues/ocr-d/zenhub/134