Open kba opened 1 year ago
Thank you @kba! I have decided against even the --type=worker/server option because it shadows the type builtin in Python... Instead, there are --agent_type=worker/server and --agent_address (when the type is server) options. Check the #1030 implementation to see the currently used CLI syntax. I would suggest waiting on this topic until #1030 is ready for review; then the CLI syntax will be revisited/refactored.
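To illustrate the naming concern, here is a minimal sketch using the standard-library argparse. It is not the #1030 implementation; the program name ocrd-example and the helper names are placeholders, and the rule that a server needs an address is only my reading of "(when type is server)".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "ocrd-example" is a placeholder program name, not a real processor.
    parser = argparse.ArgumentParser(prog="ocrd-example")
    # An option called --type would be parsed into an attribute (or, with
    # decorator-based CLI libraries, a function parameter) named "type",
    # which hides Python's builtin type(). --agent_type avoids that.
    parser.add_argument("--agent_type", choices=["worker", "server"])
    parser.add_argument("--agent_address",
                        help="host:port, only meaningful for --agent_type=server")
    return parser


def parse_agent_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: a server agent cannot start without an address to bind to.
    if args.agent_type == "server" and not args.agent_address:
        parser.error("--agent_address is required when --agent_type=server")
    return args
```

For example, parse_agent_args(["--agent_type", "server", "--agent_address", "localhost:8080"]) returns a namespace with both values set, while omitting the address for a server exits with a usage error.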
Some updates to this topic that are based on the current state of #1030.
Network modules/agents based on the WebAPI spec:
ocrd_network
yet)
How are the network modules started? Steps with example usage:
Processing Server:
ocrd network processing-server /path/to/processing-server-config --address <server-address>
Processing Workers:
ocrd-* --type worker --database <address> --queue <address>
Processor Servers:
ocrd-* --type server --database <address> --address <server-address>
Of course, the other variants are supported as well, i.e.:
ocrd network processing-worker <processor-name> --database <address> --queue <address>
ocrd network processor-server <processor-name> --database <address> --address <server-address>
How is the client supposed to be used? There is no clear agreement yet, so consider everything that follows as my initial naive ideas! Before continuing with the Client CLI implementation, we need to decide what to support and what arguments/options to provide.
Currently, as part of #1030, only the processing/processor endpoint is (partially) implemented:
ocrd network client processing processor <processor-name> --address <processing-server-address> ... arguments/options ...
where the arguments and options are the well-known ones for the specific ocrd-processor. There are also some extensions (such as --result-queue-name and --callback-url) to allow more flexibility (when the client is another machine).
Then the processing status of a job can be checked with:
ocrd network client processing status <job-id>
The job-id here is returned as a response to the previous request.
Similar to the example above, the client can support:
ocrd network client <workspace/workflow/discovery> ... arguments/options ...
To keep the client CLI usage simple, the addresses of the servers can be configured through environment variables.
My current suggestion is the OCRD_NETWORK_SERVER_ADDR_* pattern, where * can be one of PROCESSING/WORKSPACE/WORKFLOW. Clearly, we need more thoughts here for the standalone Processor Servers.
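A rough sketch of how a client could resolve a server address under this pattern. The function name, the precedence of an explicit --address value over the environment, and the error handling are all assumptions for illustration; nothing here is implemented in #1030.

```python
import os

# Variable prefix and suffixes as suggested above.
ENV_PREFIX = "OCRD_NETWORK_SERVER_ADDR_"
KNOWN_SERVICES = ("PROCESSING", "WORKSPACE", "WORKFLOW")


def resolve_server_address(service, cli_address=None, environ=None):
    """Return the address for a service (hypothetical helper).

    An address given on the command line wins (assumed precedence);
    otherwise OCRD_NETWORK_SERVER_ADDR_<SERVICE> is consulted.
    """
    environ = os.environ if environ is None else environ
    key = service.upper()
    if key not in KNOWN_SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    if cli_address:
        return cli_address
    address = environ.get(ENV_PREFIX + key)
    if not address:
        raise LookupError(f"{ENV_PREFIX + key} is not set and no address was given")
    return address
```

With OCRD_NETWORK_SERVER_ADDR_PROCESSING set, the client call could then drop --address entirely. Standalone Processor Servers do not fit this scheme yet, since there would be one address per processor.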
That's all from my side for now.
Regarding the command line client, IMO it should be as consistent with the existing CLIs as possible. And I would prefer names of operations instead of HTTP mnemonics (POST/GET/PUT), for example:
ocrd network client discovery installation|processors|resources|... (or whatever will be on /discovery)
ocrd network client workspace list|download|upload|remove
ocrd network client workflow list|download|upload|run, where the latter should look like ocrd process (i.e. block)
ocrd network client processing process <executable> ..., looking like ocrd-<executable> ... (i.e. block)
So no status and no asynchronous / polling. (And that's independent of whether we make the Processor Server API itself blocking or not... the client could still implement a polling loop or callback or whatever is necessary.)
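A blocking client on top of a non-blocking server API could look roughly like the following polling loop. This is only a sketch: fetch_status stands in for whatever request returns the job state, and the state names in FINAL_STATES are assumptions, not taken from the WebAPI spec.

```python
import time
from typing import Callable

# Assumed terminal job states; the real names may differ.
FINAL_STATES = {"SUCCESS", "FAILED"}


def wait_for_job(job_id: str,
                 fetch_status: Callable[[str], str],
                 interval: float = 1.0,
                 timeout: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Block until the job reaches a final state, polling every `interval` seconds.

    fetch_status is a hypothetical callable that asks the server for the
    current state of job_id and returns it as a string.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_status(job_id)
        if state in FINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {state!r} after {timeout}s")
        sleep(interval)
```

From the user's point of view the command then simply blocks, like ocrd-<executable> does, and the job-id never has to be exposed on the command line.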
> it should be as consistent with the existing CLIs as possible. And I would prefer names of operations instead of HTTP mnemonics (POST/GET/PUT)
Agree. I am also not a big fan of the HTTP mnemonics for the CLI.
Worth noting that the list option can get big really fast if the listing is not done just on user level when there are many users.
> Worth noting that the list option can get big really fast if the listing is not done just on user level when there are many users.
Yes, but adding user management like fastapi-users should not be difficult, IMO this is out of scope for core.
@MehmedGIT we currently have the ocrd network client processing processor NAME --address ADDR --agent-type=worker (via publish_to_queue asynchronously) and --agent-type=server (via push_to_processor_server synchronously). The latter also uses the Processing Server – but shouldn't the client in that case try to connect to the Processor Server directly? Or would that be another independent client command (say ocrd network client processor ADDR)?
(Background: @joschrew in https://github.com/OCR-D/ocrd_all/pull/386 started implementing the ocrd_all-based deployment with docker compose for servers and the client CLIs – currently for the Processing Server model only. For the Processor Server model it would not make sense to use the Processing Server at all, but we have no client CLI yet.)
> Or would that be another independent client command (say ocrd network client processor ADDR)?
It would be rather that to allow both ways.
> For the Processor Server model it would not make sense to use the Processing Server at all, but we have no client CLI yet.
Right, the client CLI should be extended to support that.
From https://github.com/OCR-D/core/pull/974#issuecomment-1483588724
@MehmedGIT: