LinuxForHealth / FHIR

The LinuxForHealth FHIR® Server and related projects
https://linuxforhealth.github.io/FHIR
Apache License 2.0

Implement payload persistence offloading for minio / S3 #2899

Open punktilious opened 3 years ago

punktilious commented 3 years ago

Is your feature request related to a problem? Please describe.

Relational database systems are not well suited to storing large objects. This can lead to increased runtime costs as well as scalability limitations.

Describe the solution you'd like

Following the groundwork done in #1869, implement payload persistence offloading for minio / S3.

The solution must provide a mechanism for cleaning orphaned records which may remain after a failed transaction.

The solution should consider using asynchronous patterns for storing and fetching the payload data. This will be important when processing large bundles to maintain acceptable response times.
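As a rough illustration of the asynchronous pattern, the sketch below submits every payload write for a bundle to a thread pool and only returns once all writes have finished, so the caller can commit or roll back afterwards. It is written in Python for brevity (the server itself is Java). `InMemoryStore`, `store_bundle_payloads`, and the `Type/id/version` key layout are hypothetical stand-ins, not the project's API.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class InMemoryStore:
    """Hypothetical stand-in for a minio / S3 bucket."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put(self, key, data):
        with self._lock:
            self._objects[key] = data

    def get(self, key):
        with self._lock:
            return self._objects[key]


def store_bundle_payloads(store, entries, max_workers=8):
    """Write all payloads concurrently and return once every write is done.

    entries: iterable of (key, payload_bytes) pairs.
    A failed write is re-raised so the caller can roll back the
    database transaction.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(store.put, key, data) for key, data in entries]
        for future in futures:
            future.result()  # propagates the first exception, if any


# Assumed key layout, for illustration only: resourceType/id/version
store = InMemoryStore()
entries = [(f"Patient/{i}/1", f'{{"id":"{i}"}}'.encode()) for i in range(100)]
store_bundle_payloads(store, entries)
```

Waiting on every future before returning keeps the ordering simple: the relational commit happens only after all payloads are known to be stored.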

Describe alternatives you've considered

Accept the limitations and cost of keeping the payload within the RDBMS.

Acceptance Criteria

  1. GIVEN a server configured with payload persistence for minio enabled AND a new resource is ingested WHEN the transaction is successful THEN the resource can be read back AND the payload can be found in the COS bucket

  2. GIVEN a server configured with payload persistence for minio enabled AND a new resource is ingested WHEN the transaction is rolled back THEN the payload cannot be found in the COS bucket

  3. GIVEN a server configured with payload persistence for minio enabled AND a new resource is ingested WHEN the transaction fails and rollback is not possible THEN the payload is removed from the COS bucket after the cleanup process is run
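Criterion 3 implies some way to identify payloads that no committed record points to. One possible approach, sketched below in Python under stated assumptions, is to compare the keys in the bucket against the keys referenced by committed rows and delete the difference, skipping recent uploads whose transaction may still be in flight. The function names, the grace period, and the dict-based bucket are all hypothetical; a real implementation would list objects through the S3 client and query the RDBMS.

```python
def find_orphans(bucket_objects, committed_keys, now, grace_seconds=300):
    """Return keys in the bucket that no committed row references.

    bucket_objects: dict mapping object key -> upload time (epoch seconds).
    committed_keys: set of keys referenced by committed database rows.
    Objects younger than grace_seconds are skipped because their
    transaction may not have committed yet.
    """
    return sorted(
        key
        for key, uploaded_at in bucket_objects.items()
        if key not in committed_keys and now - uploaded_at >= grace_seconds
    )


def cleanup(bucket_objects, committed_keys, now, grace_seconds=300):
    """Delete orphaned objects from the bucket and return the removed keys."""
    orphans = find_orphans(bucket_objects, committed_keys, now, grace_seconds)
    for key in orphans:
        del bucket_objects[key]
    return orphans


# Illustrative data: one committed payload, one orphan, one recent upload.
bucket = {"Patient/1/1": 100, "Patient/2/1": 100, "Patient/3/1": 990}
committed = {"Patient/1/1"}
removed = cleanup(bucket, committed, now=1000)
```

Here only `Patient/2/1` is removed: `Patient/1/1` is committed, and `Patient/3/1` is still inside the grace period.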

Additional context

Design discussion to be had regarding how, where and when to run the transaction cleanup process.

The payload persistence mechanism is not tied to a Liberty Datasource so Liberty's transaction recovery mechanism doesn't apply. The discussion is really about whether the cleanup process should be a separate standalone job, or a background thread started by the FHIRServletContext.
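To make the background-thread option concrete, here is a minimal Python sketch of a periodic worker that runs a cleanup task until it is asked to stop. `PeriodicCleaner` is a hypothetical name and nothing here reflects how FHIRServletContext actually manages threads. A standalone job would simply call the same task from an external scheduler instead.

```python
import threading


class PeriodicCleaner:
    """Runs a cleanup task on a daemon thread at a fixed interval."""

    def __init__(self, task, interval_seconds):
        self._task = task
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        """Signal the worker to exit and wait for it to finish."""
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns True once stop() has been called, ending the loop.
        while not self._stop.wait(self._interval):
            try:
                self._task()
            except Exception:
                # A real server would log the failure; the loop keeps
                # running so one bad pass does not end cleanup for good.
                pass


# Usage: start on server startup, stop on shutdown.
ran = threading.Event()
cleaner = PeriodicCleaner(ran.set, interval_seconds=0.01)
cleaner.start()
ran.wait(timeout=2)
cleaner.stop()
```

The trade-off is operational: an in-process thread needs no extra deployment, while a standalone job keeps cleanup load and failures isolated from the server.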

Work for this feature should consider #2900 which implements the same offloading mechanism using Cassandra.

lmsurpre commented 3 years ago

from https://github.com/IBM/FHIR/issues/1869#issuecomment-944333369:

Be sure that the new config prop fhirServer/persistence/payload gets documented before closing this one

lmsurpre commented 2 years ago

this should get broken into subtasks. ideas for that:

  1. incorporate newer S3 client (move off COS which used v1 of the AWS S3 client) - 5
    • write a new property adapter to configure minio/s3 connection info
    • normalize on this newer version of the S3 client, instead of the IBM COS client
      • would need to deprecate support for IBM Cloud IAM auth
  2. ...
  3. add offloading test that uses docker compose and tests offloading with minio - 10