frmscoe / General-Issues

This repo exists to track current work and any issues within the FRMS CoE

Align processing of the network map with the NATS implementation #295

Closed: Justus-at-Tazama closed this issue 3 months ago

Justus-at-Tazama commented 9 months ago

Story statement

As a Tazama system operator, I want to use the network map to dynamically change the routing of transactions for evaluation, so that evaluation routing can be configured during operation and not just at deployment.

Acceptance criteria

  1. Define both NATS publishing and subscription subjects in the network map for each processor
  2. Determine the routing of the transaction based on the publishing and subscription values in the network map
  3. Account for the processor version in the routing by embedding the version in the NATS subject
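A minimal Python sketch of the three criteria, assuming a hypothetical network map layout in which every processor entry carries its own subscription and publishing subjects. The `ProcessorEntry` class, the `versioned_subject` and `next_hops` helpers, and the `sub.`/`pub.` subject naming are illustrative inventions, not Tazama's actual schema or subject names. Routing is derived purely by matching one processor's publishing subject to another processor's subscription subject, and the version is embedded in the subject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorEntry:
    """A processor as it might appear in a hypothetical network map."""
    name: str
    version: str
    subscribe: str  # subject the processor takes its input from (criterion 1)
    publish: str    # subject the processor sends its output to (criterion 1)


def versioned_subject(direction: str, name: str, version: str) -> str:
    """Build a subject with the processor version embedded (criterion 3).

    NATS subjects are dot-separated tokens; replacing the dots in the version
    keeps the whole version in a single token (a design choice, not a rule).
    """
    return f"{direction}.{name}.v{version.replace('.', '_')}"


def next_hops(network_map: list[ProcessorEntry], current: ProcessorEntry) -> list[ProcessorEntry]:
    """Route by matching the current processor's publish subject against the
    subscribe subjects of every other processor in the map (criterion 2)."""
    return [p for p in network_map if p is not current and p.subscribe == current.publish]


# Example map (hypothetical): CRSP -> rule-001 v1.0.0 -> typology processor.
rule_in = versioned_subject("sub", "rule-001", "1.0.0")
rule_out = versioned_subject("pub", "rule-001", "1.0.0")
crsp = ProcessorEntry("crsp", "1.0.0", subscribe="sub.crsp", publish=rule_in)
rule_001 = ProcessorEntry("rule-001", "1.0.0", subscribe=rule_in, publish=rule_out)
typology = ProcessorEntry("typology-processor", "1.0.0", subscribe=rule_out, publish="pub.typology")
network_map = [crsp, rule_001, typology]

print([p.name for p in next_hops(network_map, crsp)])      # ['rule-001']
print([p.name for p in next_hops(network_map, rule_001)])  # ['typology-processor']
```

Deploying rule-001 v2.0.0 alongside v1.0.0 would then only require a second entry with its own versioned subjects, and the CRSP entry's `publish` value would decide which version receives the transaction.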

Background

Before we implemented NATS, the network map contained a URL for each of the processors in the chain, with the exception of the CRSP: it was essentially a core processor, and routing for the CRSP was embedded in the platform deployment. I'm guessing the CRSP's address was read from an environment variable. Anyway, when we implemented NATS, processors were no longer addressed through a URL, but rather via the NATS subject from which they received their input. The idea was that the network map would determine the destination of a particular evaluation. The reasoning was that you could, in theory, have multiple parallel processors available that were tailored to a specific transaction type, channel, typology or rule. In other words, pacs.008 transactions could be routed to TADProc A, which had specialised processing for pacs.008, and pacs.002 transactions could be routed to TADProc B, which had specialised processing for pacs.002, etc.
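To illustrate the kind of per-transaction-type routing described above, here is a small Python sketch. The `TADPROC_ROUTES` table, the subject names and the fallback subject are hypothetical stand-ins for whatever the network map would actually hold.

```python
# Hypothetical network-map fragment: which TADProc instance (identified by
# the NATS subject it subscribes to) handles which transaction type.
TADPROC_ROUTES = {
    "pacs.008": "sub.tadproc-a",  # TADProc A: specialised for pacs.008
    "pacs.002": "sub.tadproc-b",  # TADProc B: specialised for pacs.002
}
DEFAULT_TADPROC = "sub.tadproc"  # assumed fallback for any other type


def tadproc_subject(tx_type: str, routes: dict[str, str] = TADPROC_ROUTES) -> str:
    """Pick the destination subject for a transaction from the routing table."""
    return routes.get(tx_type, DEFAULT_TADPROC)


print(tadproc_subject("pacs.008"))  # sub.tadproc-a
print(tadproc_subject("pain.001"))  # sub.tadproc
```

Because the lookup happens for every transaction, changing the table changes routing without a redeployment. Only the publishing side is dynamic, though: each TADProc instance must already be subscribed to its subject.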

The platform isn't actually deployed like that at the moment, and I think that, because routing to the "core" processors is so straightforward, we forgot about the additional flexibility in the design.

Mind you, a similar problem exists in the rule processors. In theory, we should be able to deploy and maintain multiple versions of the same rule processor and we should be able to route transactions to any version of a rule processor by specifying the version of the rule processor in the network map.

So, here's the rub: the network map has effectively "hard-coded" each processor's input subject as its destination. In some cases, especially for the core processors, the network map has no influence on the routing at all: the subject is specified somewhere else (I'm not entirely sure where, but I'm assuming it's in an environment variable). Somewhat relatedly, the 'exit' point of a processor is not specified at all and is left entirely up to the processor's configuration.

I'd like to chat about how we can reinstate the configurability of the routing from the network map. I think there are now some new and different challenges because we use NATS: for example, routing is now established when the processors are initialised, rather than dynamically whenever a network map is received. And, in our current process, when a processor is initialised it has no access to the network map, nor does it know to check it.

And to be fair, the original design was also flawed: it didn't make use of the processor versioning either, and the versioning was only "implied" in the URL. Basically, OpenFaaS used to deploy every version of every processor to the same URL. Some of the processors (the typology processor and CADProc, for example) didn't even bother checking the network map for the next processor's URL; it was hard-coded in the processor. The CRSP and the rule processors did check the network map, which is how we were able to intercept their output by changing the URL in the network map to a dummy endpoint.

With the implementation of NATS, we may have given up some of the system's operational dynamism but left the mechanisms for that dynamism in place, so we're somewhere in the middle between two solutions.

With reference to the third acceptance criterion, this will pose some challenges, as we are in something of a chicken-and-egg situation. If the knowledge a processor needs to subscribe to its predecessor's publishing subject is to be derived from the network map, how will the processor set up the subscription in the first place to receive the network map? This paradox implies that processors will need to interrogate the network map when they are deployed, not when they receive it. Worse, a processor won't be able to change its subscription based on new information in the network map, because the new network map will be sent to the processor on a new publishing subject that the processor isn't yet subscribed to.
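The paradox can be reproduced with a small in-memory simulation in Python. `InMemoryBus` is a hypothetical stand-in for core NATS, which delivers a message only to subscribers that exist when it is published (JetStream persistence is ignored here); `Processor` and the subject names are likewise illustrative. The processor reads its input subject from the network map once, at deployment, and a later map that moves it to a new subject is sent to a subject nobody is listening on.

```python
from collections import defaultdict


class InMemoryBus:
    """Stand-in for core NATS: a message published to a subject with no
    current subscriber is simply dropped (no persistence, no replay)."""

    def __init__(self) -> None:
        self.subscribers: set[str] = set()
        self.delivered: dict[str, list[dict]] = defaultdict(list)

    def subscribe(self, subject: str) -> None:
        self.subscribers.add(subject)

    def publish(self, subject: str, message: dict) -> bool:
        if subject not in self.subscribers:
            return False  # nobody listening: the message is lost
        self.delivered[subject].append(message)
        return True


class Processor:
    """Derives its input subject from the network map once, at deployment,
    because that is the only point at which it can read the map."""

    def __init__(self, name: str, bus: InMemoryBus, network_map: dict[str, str]) -> None:
        self.name = name
        self.subject = network_map[name]  # read at deploy time only
        bus.subscribe(self.subject)


bus = InMemoryBus()
map_v1 = {"rule-001": "sub.rule-001.v1"}
rule = Processor("rule-001", bus, map_v1)

# An operator edits the map so rule-001 should listen on v2, then sends the
# new map to the subject the new map itself names for rule-001.
map_v2 = {"rule-001": "sub.rule-001.v2"}
received = bus.publish(map_v2["rule-001"], map_v2)

print(received)      # False: the processor never sees the new map
print(rule.subject)  # sub.rule-001.v1: still on its deploy-time subject
```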

Justus-at-Tazama commented 3 months ago

Epic - implemented.