Open reisingerf opened 1 month ago
Yes, 'name' will be an idempotent key, both in the WES DB and specified on the ICAv2 API Call
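To illustrate the idea, here is a minimal sketch (hypothetical table and function names, sqlite used as a stand-in for the WES DB) of how the run `name` can act as an idempotent key: the name is a unique column, so a repeated submission with the same name becomes a no-op rather than a second execution.

```python
import sqlite3

# 'name' is the idempotent key: a unique column makes duplicate
# submissions with the same name a no-op instead of a second run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wes_runs (name TEXT PRIMARY KEY, payload TEXT)")

def submit_run(name: str, payload: str) -> bool:
    """Return True if this call created the run, False if it already existed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO wes_runs (name, payload) VALUES (?, ?)",
        (name, payload),
    )
    return cur.rowcount == 1  # 0 rows changed means the name already existed

print(submit_run("run-001", '{"sample": "A"}'))  # True  - first submission
print(submit_run("run-001", '{"sample": "A"}'))  # False - duplicate, ignored
```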
Yes, that's a good first start. It could handle the case where the same event/request is submitted multiple times (probably accidentally, by the same source).
However, it would not reliably prevent the same thing being run multiple times, e.g. avoiding execution when it would not result in new output.
Longer term I was thinking more about the data that affects the actual computation. For example, if two different versions of the same execution manager were to run in parallel (say, during a transition period) and they submitted the same payload/request, but with slightly different names. Or where two parties try to run the same thing (imagine a submit button in the UI triggered by two curators at more or less the same time).
Not urgent, as I don't think this will happen very often anyway. It would probably also need an "override" flag for those cases where the exact same thing is deliberately meant to run again (e.g. the first execution failed due to an intermittent/transient error). So I'd see that advanced dedup as a possible future improvement rather than a requirement. Just wanted to record it...
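A sketch of what that "advanced dedup" might look like (hypothetical function names; an in-memory dict standing in for a DB table): key on a hash of the canonicalised payload, i.e. the data that affects the computation, rather than on the submitted name, and allow an override flag for deliberate re-runs.

```python
import hashlib
import json

# payload fingerprint -> run name that first submitted it (stand-in for a DB table)
_seen: dict[str, str] = {}

def payload_fingerprint(payload: dict) -> str:
    """Hash a canonical JSON form so key order doesn't change the fingerprint."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def should_run(name: str, payload: dict, override: bool = False) -> bool:
    """Reject submissions whose payload was already seen, unless overridden."""
    fp = payload_fingerprint(payload)
    if fp in _seen and not override:
        return False  # identical computation already requested, under another name
    _seen[fp] = name
    return True

print(should_run("run-a", {"sample": "A"}))                 # True
print(should_run("run-b", {"sample": "A"}))                 # False - same payload
print(should_run("run-c", {"sample": "A"}, override=True))  # True  - deliberate re-run
```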
I'm not sure that this sits within the responsibility of the icav2 wes manager, but more so the workflow manager's.
In the UI trigger example above, I think it's the responsibility of the workflow manager that relayed two events with different portal run ids but identical input bodies to prevent that from happening. If two separate requests come in with identical payloads but different names (say, different portal run ids), I think deduplication here would cause a lot more complications than it resolves. Do you run it just once but copy the outputs to the two distinct portal run id locations once one of them finishes? Or do you fail the latter one? Unless it has the override flag?
I was considering making a similar API endpoint for our Nextflow Batch service. Again, if the responsibility of handling duplicate input parameters is put on the microservice, that duplicates a lot of code AND puts a lot of irrelevant business logic in the microservices themselves. If there is assurance that the workflow run name is idempotent, then it should be on the service that orchestrates the requests to ensure that those requests aren't duplicates.
Yeah, good point.
I don't think we can guarantee that deduplication can always happen upstream, but I can accept that the responsibility for it is not something this service needs to burden itself with. It should not happen often enough to cause any issues.
For the case where two versions of an execution service run in parallel, I think there would be other mitigation strategies.
In an event-driven architecture it's not unthinkable that the same event is sent multiple times. Ideally we'd want the "same" job requests handled transparently, while running identical jobs unnecessarily is avoided.
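The usual pattern for this is an idempotent consumer. A minimal sketch (hypothetical event shape and handler name; assumes events carry a stable id under at-least-once delivery): redeliveries of an already-processed event id are acknowledged without launching the job again.

```python
# Event ids already handled (in practice this would live in durable storage,
# since an in-memory set is lost on restart).
processed: set[str] = set()

def handle_event(event: dict) -> str:
    """Process each distinct event id exactly once; ack redeliveries as no-ops."""
    event_id = event["id"]
    if event_id in processed:
        return "duplicate-acked"  # same event redelivered; do nothing
    processed.add(event_id)
    # ... launch the job exactly once for this event ...
    return "started"

print(handle_event({"id": "evt-1", "name": "run-x"}))  # started
print(handle_event({"id": "evt-1", "name": "run-x"}))  # duplicate-acked
```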