Messaging model between API server and packaging tool

richardrodgers commented 7 years ago

Proposal for connecting the API server with the back-end worker that creates docset packages/archives. One goal is to avoid state that is maintained by multiple systems and has to be kept synchronized. The software components will hereafter be called 'server' (API server) and 'packager'.

All workflow communication between server and packager will occur over message queues (the packager can and will call the server via its standard API for other information); no new API endpoints will be defined. There will be a mix of permanent and transient queues to manage the communication. The server will write to permanent queues and read from transient queues; the packager will do the reverse.
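A minimal sketch of that queue layout, assuming the queue names used in the sample flow in this proposal. The helper names (`status_queue`, `can_write`) are hypothetical and only illustrate which side writes where.

```python
# Queue names are taken from the sample flow; the helper names are hypothetical.
PACKAGE_REQUESTS = "/queue/docsets/package"  # permanent: server writes, packager reads


def status_queue(docset_id):
    """Transient per-docset queue: packager writes, server reads."""
    return f"/queue/package/{docset_id}"


def can_write(role, destination):
    """Return True if `role` ('server' or 'packager') may SEND to `destination`."""
    if role not in ("server", "packager"):
        raise ValueError(f"unknown role: {role}")
    # Assumption: permanent queues all live under /queue/docsets/.
    permanent = destination.startswith("/queue/docsets/")
    return permanent if role == "server" else not permanent
```

With this convention the server may send to `PACKAGE_REQUESTS` but not to `status_queue(5)`, and the packager the other way around.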

Here is a sample flow (using STOMPish messages):

Server ->

[Omitting CONNECT/CONNECTED handshake]

SEND 
destination: /queue/docsets/package
content-type: text/plain

http://server/api/docsets/5?fmt=meta,txt
^@

Packager ->

SEND
destination: /queue/package/5
content-type: text/plain

Accepted. 45 seconds
^@

Server -> (at a later time, when the API client polls the dump URL)

SUBSCRIBE
id:0
destination: /queue/package/5
ack:client

^@

Message Queue ->

[MESSAGE/ACK from server] for each message

Server ->

UNSUBSCRIBE
id:0

^@

Packager -> (when the archive is ready)

SEND
destination: /queue/package/5
content-type: text/plain

Complete: https://foobar.s3.aws.com/package/5
Size: 345345
^@

The /queue/package/5 queue is transient and may be removed by queue configuration (auto-deleted after a time) or by other means. Another possible permanent queue would be /queue/docsets/delete, where the server could initiate deletion (for storage conservation) of packages known to have been delivered, but this is TBD (we may want to allow multiple downloads).
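For reference, the frames in the sample flow could be assembled and read back roughly as follows. This is a sketch of the general STOMP frame layout (command line, header lines, blank line, body, NUL terminator) and not a full client: header escaping and content-length handling are omitted, and `build_frame` / `parse_frame` are hypothetical helper names.

```python
NULL = "\x00"  # frame terminator, written as ^@ in the sample flow


def build_frame(command, headers, body=""):
    """Serialize one STOMP-style frame as text."""
    lines = [command] + [f"{key}:{value}" for key, value in headers.items()]
    return "\n".join(lines) + "\n\n" + body + NULL


def parse_frame(raw):
    """Split a frame back into (command, headers, body)."""
    head, _, body = raw.rstrip(NULL).partition("\n\n")
    command, *header_lines = head.split("\n")
    headers = dict(line.split(":", 1) for line in header_lines)
    return command, headers, body


# The server's initial request from the sample flow.
request = build_frame(
    "SEND",
    {"destination": "/queue/docsets/package", "content-type": "text/plain"},
    "http://server/api/docsets/5?fmt=meta,txt",
)
```

In practice an existing STOMP client library would likely do this work; the sketch only makes the message shape explicit.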

gravesm commented 7 years ago

I'm not sure I understand the server part in the middle. Is it continually subscribing/unsubscribing from the transient queue until it reads the package completed message from the packager?

richardrodgers commented 7 years ago

Not continually. It will only subscribe to and read the queue when it receives a request for the package from the original client of the API (i.e., the miner who requested the dump). If the client never checks back in, the transient queue is never read by the server, and it eventually gets 'gc'd'. I'm trying to avoid a lot of background tasks, polling, etc.
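To make the on-demand behavior concrete, here is a rough sketch that uses an in-memory dictionary of deques as a stand-in for the broker. The real thing would SUBSCRIBE, ACK each MESSAGE, and UNSUBSCRIBE as in the sample flow; `packager_report` and `poll_package` are hypothetical names.

```python
from collections import defaultdict, deque

# In-memory stand-in for the message broker (assumption, for illustration only).
broker = defaultdict(deque)


def packager_report(docset_id, text):
    """Packager side: append a status message to the transient queue."""
    broker[f"/queue/package/{docset_id}"].append(text)


def poll_package(docset_id):
    """Server side: runs only when an API client asks for the dump.

    Drains whatever messages are waiting and returns them in order.
    """
    queue = broker[f"/queue/package/{docset_id}"]
    messages = []
    while queue:
        messages.append(queue.popleft())
    return messages


packager_report(5, "Accepted. 45 seconds")
packager_report(5, "Complete: https://foobar.s3.aws.com/package/5")
pending = poll_package(5)  # both messages, read in one pass
```

Nothing runs between client requests: if `poll_package` is never called, the messages simply sit in the queue until it is cleaned up.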

richardrodgers commented 7 years ago

By the way, the example includes data, like packager time estimates, that I am not expecting we will provide initially (not sure how we'd calculate them). It just suggests the kind of info that could be passed back to the client (other examples would be % complete, etc.).

gravesm commented 7 years ago

Okay, yeah, makes sense. We could probably make a reasonable estimate just based on the number of items in the docset.
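One way such an estimate might look, as a sketch only: a fixed overhead plus a per-item cost. The function name and both constants are made up for illustration and would need to be measured against real packaging runs.

```python
import math


def estimate_seconds(item_count, seconds_per_item=0.5, overhead_seconds=5):
    """Linear estimate of packaging time from the docset's item count.

    Both default constants are placeholders, not measured values.
    """
    return math.ceil(overhead_seconds + item_count * seconds_per_item)
```

The packager could then put the result into its "Accepted" message for the server to relay.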

MITLibraries / tdm-adit_api

Messaging model between API server and packaging tool #16