LinuxForHealth / FHIR

The LinuxForHealth FHIR® Server and related projects
https://linuxforhealth.github.io/FHIR
Apache License 2.0

enhancement for appending a URL to the Bulk FILE Export #2874

Open dabimahesh opened 2 years ago

dabimahesh commented 2 years ago

BDE-09: Bulk Data Server SHALL restrict bulk data file access with access token

Scenario 1 - When a token is provided, the server validates it and returns the response.

Request: GET 200 https://fhir.secureit.co.in:9443/fhir-server/api/v4/$bulkdata-status?job=O5sfGj3l4ZFOO_G0TdwHvQ

OAuth Token Provided: YES

Response:
{
  "transactionTime": "2021-10-08T06:58:34.668Z",
  "request": "https://fhir.secureit.co.in:9443/fhir-server/api/v4/Group/17c5ea3c53d-9ae67467-60d2-4f1d-85f6-6808d680589e/$export",
  "requiresAccessToken": true,
  "output": [
    {
      "type": "AllergyIntolerance",
      "url": "https://1secureit.s3.amazonaws.com/fhir/1secureit/iLUQQ4xsv0xbYoQ-auymO567ndIPa5EsRjR40AtZEz8/AllergyIntolerance_1.ndjson",
      "count": 1
    },
    {
      "type": "Group",
      "url": "https://1secureit.s3.amazonaws.com/fhir/1secureit/iLUQQ4xsv0xbYoQ-auymO567ndIPa5EsRjR40AtZEz8/Group_1.ndjson",
      "count": 1
    },
    {
      "type": "Patient",
      "url": "https://1secureit.s3.amazonaws.com/fhir/1secureit/iLUQQ4xsv0xbYoQ-auymO567ndIPa5EsRjR40AtZEz8/Patient_1.ndjson",
      "count": 1
    }
  ],
  "error": []
}

Scenario 2 - From the above response, Inferno then picks up individual URLs and does a GET as shown below. In one such scenario, it sends no token and it expects 400 or 401, whereas it receives a 200.

Because we are using an AWS S3 bucket, the response contains a direct S3 URL, and S3 has no way to validate the OAuth token.

Before using AWS, we tried the file storage provider, but it returned a partial path ("url" : "iLUQQ4xsv0xbYoQ-auymO567ndIPa5EsRjR40AtZEz8/AllergyIntolerance_1.ndjson") rather than a complete URL. If it could return a complete HTTP/HTTPS URL, that might solve the problem.

Could you please help us with this? Also, is there a way to route the S3 URL through our OAuth server?

Request: GET 200 https://1secureit.s3.amazonaws.com/fhir/1secureit/iLUQQ4xsv0xbYoQ-auymO567ndIPa5EsRjR40AtZEz8/AllergyIntolerance_1.ndjson

OAuth Token Provided: NO

Response:
{
  "resourceType": "AllergyIntolerance",
  "id": "17c5ea10448-672d34c4-5131-4d90-a488-fae9151fb8f8",
  "meta": {
    "versionId": "1",
    "lastUpdated": "2021-10-08T06:39:43.178Z",
    "profile": [
      "http://hl7.org/fhir/us/core/StructureDefinition/us-core-allergyintolerance"
    ]
  },
  "clinicalStatus": {
    "coding": [
      {
        "system": "http://terminology.hl7.org/CodeSystem/allergyintolerance-clinical",
        "code": "inactive"
      }
    ]
  },
  "code": {
    "coding": [
      {
        "system": "http://snomed.info/sct",
        "code": "300916003",
        "display": "Latex allergy"
      }
    ],
    "text": "Latex allergy"
  },
  "patient": {
    "reference": "Patient/17c5e5cabb3-19b86ee6-35be-4157-b4a0-fb9b6ad28341"
  }
}
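
The check described in Scenario 2 can be sketched roughly as follows. This is an illustration only, not Inferno's actual code: the function name `find_unprotected_files` and the `fetch_status` callable are hypothetical, and the accepted status codes (400 or 401) are taken from the description above.

```python
from typing import Callable, Dict, List

# Status codes that, per the description above, are acceptable answers
# to a request that carries no access token.
EXPECTED_REJECTIONS = {400, 401}


def find_unprotected_files(
    manifest: Dict, fetch_status: Callable[[str], int]
) -> List[str]:
    """Return the output URLs that answer a token-less GET with an
    unexpected status (anything other than 400 or 401).

    `fetch_status` is a hypothetical callable that performs a GET
    without an Authorization header and returns the HTTP status code.
    """
    if not manifest.get("requiresAccessToken", False):
        # The manifest does not claim the files are token-protected.
        return []
    unprotected = []
    for entry in manifest.get("output", []):
        if fetch_status(entry["url"]) not in EXPECTED_REJECTIONS:
            unprotected.append(entry["url"])
    return unprotected


# Abbreviated manifest shaped like the status response above.
example_manifest = {
    "requiresAccessToken": True,
    "output": [
        {
            "type": "AllergyIntolerance",
            "url": "https://1secureit.s3.amazonaws.com/fhir/1secureit/"
            "iLUQQ4xsv0xbYoQ-auymO567ndIPa5EsRjR40AtZEz8/"
            "AllergyIntolerance_1.ndjson",
            "count": 1,
        }
    ],
}

# Stand-in for a real HTTP client: always reports 200, which mirrors
# the S3 behaviour reported in this issue.
flagged = find_unprotected_files(example_manifest, lambda url: 200)
```

With a storage location that rejects token-less requests, the same call would return an empty list.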

You can refer to the discussion at https://chat.fhir.org/#narrow/stream/212434-ibm/topic/Bulk.20Export

prb112 commented 2 years ago

"Could you please help us with this? Also, is there a way where S3 URL could be routed through our OAuth server?"

This is beyond the scope of the enhancement request. We would prefer not to implement the requested routing through your OAuth server.

dabimahesh commented 2 years ago

One of the solutions we discussed was to use FILE as the storage provider. Below is an example response after the bulk export completes:
{
  "transactionTime": "2021-09-21T10:53:53.879Z",
  "request": "https://fhir.secureit.co.in:9443/fhir-server/api/v4/Group/17b5e38b010-74858f83-7570-45cb-979b-8d00b694a293/$export",
  "requiresAccessToken": true,
  "output": [
    {
      "type": "Provenance",
      "url": "KTwhDhHknT5k7lLtqB8eUQM2e0LuvIgX7NQSLqbH5_A/Provenance_1.ndjson",
      "count": 1
    }
  ],
  "error": []
}

As you can see, the URL here is a relative reference, not a complete URL. The exported files are located under fhirserver on Liberty. If you could provide a complete HTTP/HTTPS URL instead of a relative one, that URL would inherit the authentication configured for the FHIR server. In other words, if SMART is enabled, the server would validate the supplied token and proceed accordingly.

If you think this is a workable solution, it might involve fewer changes. The default behavior could stay as is, and returning a complete URL in the response could be controlled by a setting in fhirserverconfig. You are, of course, the right person to judge how much effort and how many changes this would involve. The URL could be exposed through Liberty configuration by publishing the folder. I do not know much about how to expose an additional folder in FHIR where these files would reside.

If we are able to achieve this then we may not need to disturb the current behavior in the S3 storage provider.

As you mentioned, "when requiresAccessToken==true, the security model for the exported files should be the same as the one for requesting the export (using SMART Backend Services)". Could this be one of the approaches to get there?

prb112 commented 2 years ago

Hi @dabimahesh

The only change we are able to accommodate is a new name-value pair in each storageProvider configuration, such that a URL is prepended to output.url. For example, KTwhDhHknT5k7lLtqB8eUQM2e0LuvIgX7NQSLqbH5_A/Provenance_1.ndjson would become https://download.url.co/KTwhDhHknT5k7lLtqB8eUQM2e0LuvIgX7NQSLqbH5_A/Provenance_1.ndjson

Servicing the download request itself is out of scope and is not supported directly by the IBM FHIR Server.
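
The prepending described above could look roughly like the sketch below. It is not the IBM FHIR Server implementation: the function name `prepend_download_base` and its `base_url` parameter are hypothetical stand-ins for the proposed per-storageProvider setting, and leaving already-absolute URLs untouched is an assumption.

```python
import copy
from typing import Dict


def prepend_download_base(manifest: Dict, base_url: str) -> Dict:
    """Return a copy of the export manifest in which every relative
    output.url is prefixed with `base_url`.

    Assumption: URLs that are already absolute (http/https) are left
    untouched, so an S3-style manifest passes through unchanged.
    """
    result = copy.deepcopy(manifest)  # do not mutate the caller's manifest
    base = base_url.rstrip("/") + "/"  # exactly one slash at the join
    for entry in result.get("output", []):
        url = entry["url"]
        if not url.startswith(("http://", "https://")):
            entry["url"] = base + url.lstrip("/")
    return result


# Manifest entry taken from the FILE storage provider example in this thread.
file_manifest = {
    "requiresAccessToken": True,
    "output": [
        {
            "type": "Provenance",
            "url": "KTwhDhHknT5k7lLtqB8eUQM2e0LuvIgX7NQSLqbH5_A/Provenance_1.ndjson",
            "count": 1,
        }
    ],
    "error": [],
}

rewritten = prepend_download_base(file_manifest, "https://download.url.co")
```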

If this does not meet your needs, we'll move this to the appropriate backlog or icebox.

Thank you

Paul

dabimahesh commented 2 years ago

Thanks, Paul. So the URL will point only to the file folder exposed under the FHIR server, right? For example: https://fhir.secureit.co.in:9443/fhir-server/api/v4/KTwhDhHknT5k7lLtqB8eUQM2e0LuvIgX7NQSLqbH5_A/Provenance_1.ndjson

prb112 commented 2 years ago

Let me clarify.

The change would turn KTwhDhHknT5k7lLtqB8eUQM2e0LuvIgX7NQSLqbH5_A/Provenance_1.ndjson into https://download.url.co/KTwhDhHknT5k7lLtqB8eUQM2e0LuvIgX7NQSLqbH5_A/Provenance_1.ndjson

It could be any path on the server or on another server. The retrieval and the URL are up to you. The fhir-server web app is not intended to serve the exported content directly.

dabimahesh commented 2 years ago

OK. Will the returned URL serve the actual exported data? How will the data be exported to this URL?

https://download.url.co/KTwhDhHknT5k7lLtqB8eUQM2e0LuvIgX7NQSLqbH5_A/Provenance_1.ndjson

Thanks, Mahesh

prb112 commented 2 years ago

You would be responsible for providing access and downloads at that location. It is not the responsibility of the IBM FHIR Server.
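
As an illustration of what a self-hosted download location might check before serving an exported file, here is a minimal sketch. Everything in it is an assumption rather than part of the IBM FHIR Server: the names `authorize_download`, `export_root`, and `is_token_valid` are hypothetical, the token check is delegated to whatever OAuth validation you already run, and the header lookup assumes the key is spelled `Authorization`.

```python
from pathlib import Path
from typing import Callable, Mapping, Optional, Tuple


def authorize_download(
    headers: Mapping[str, str],
    relative_path: str,
    export_root: Path,
    is_token_valid: Callable[[str], bool],
) -> Tuple[int, Optional[Path]]:
    """Decide how to answer a download request for an exported file.

    Returns (HTTP status, file path); the path is None unless status is 200.
    """
    # Expect "Authorization: Bearer <token>".
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return 401, None  # no token supplied
    if not is_token_valid(token):
        return 401, None  # token rejected by the (hypothetical) validator

    # Resolve the path and refuse anything outside the export folder.
    root = export_root.resolve()
    candidate = (root / relative_path).resolve()
    if root not in candidate.parents:
        return 400, None
    if not candidate.is_file():
        return 404, None
    return 200, candidate
```

A web handler would call this function first and stream the file only when the status is 200, so that a request without a token receives 401 rather than the file contents.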

As your comment suggests the configuration would not address your needs, I am moving this to the icebox and taking it out of the sprint and release.

Thank you, Paul