LinuxForHealth / FHIR

The LinuxForHealth FHIR® Server and related projects
https://linuxforhealth.github.io/FHIR
Apache License 2.0
333 stars 156 forks source link

fhir-notification design can lead to false notifications for failed transactions #3071

Open lmsurpre opened 2 years ago

lmsurpre commented 2 years ago

Describe the bug The current fhir-notification engine is built on top of our FHIRPersistenceInterceptor mechanism. Specifically, we emit the events when we invoke the afterX methods.

The problem with this is that those afterX methods are called before the transaction is committed. Therefor, its still possible for the transaction to fail after the event is fired.

Environment main

To Reproduce Steps to reproduce the behavior:

  1. enable notifications (any impl)
  2. force a transaction to fail AFTER getInterceptorMgr().fireAfterDeleteEvent(event); is called

note that subscibers still receive the event even though the create/update/delete change wasn't committed

Expected behavior events are only fired for interactions that complete successfully

Additional context I think it would be better to fire the events AFTER the transaction is committed. Unfortunately, that means that (unless we change the design more significantly) this could lead to a greater chance of missing events (e.g. event submission fails after transaction commits), but missing events seems much better than erroneous ones to me.

This could take one of a few flavors: A. keep it based on FHIRPersistenceInterceptor but add new afterX flavors that fire after the transaction commits B. make it work more like audit events which are now fired after transaction completion

Note: Paul mentioned that the current implementation is already using "commit async" to fire the events on NATS/Kafka, and so we already have the potential for failed submission of events.

Alternatively, we could move the event notification completely out of the request processing flow and fire them based on polling the history table/endpoints (or something like that).

Potentially related, but different, from https://github.com/IBM/FHIR/issues/1957

thomas-rudlof-bl commented 2 years ago

Another Problem with firing the notification before the transaction completed: Potential receivers of the notification might want to load the corresponding resource. But this might fail because the transaction has not yet completed. I see this every now and then (using a PostgreSQL DB)