cerner / bunsen

Explore, transform, and analyze FHIR data with Apache Spark
https://engineering.cerner.com/bunsen
Apache License 2.0

DSTU2 Support #1

Open rbrush opened 6 years ago

rbrush commented 6 years ago

A lot of systems continue to use DSTU2, so we should add it as an option to Bunsen.

boristyukin commented 6 years ago

Ideally, one would pass a FHIR version number that picks the proper HAPI model version.

rbrush commented 6 years ago

Yeah, I think we can make it pretty easy to do that. There's a FhirVersionEnum from the HAPI API that would be a type-safe way for users to specify the version.

boristyukin commented 6 years ago

@rbrush have you thought about this more, Ryan? I am curious about use cases where someone needs to work with multiple vendors who are on different versions of FHIR. Another use case is when one keeps historical bundles on one version and then needs to upgrade to another version without migrating the archived bundles (which would be a real pain). These are common scenarios in analytics / DW projects.

One way is to build a separate version of Bunsen for each FHIR version. But I think the ideal is to pass an optional FHIR version parameter to the Bunsen parser methods.
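
A minimal sketch of the optional-parameter idea, in Python and without Spark or HAPI. Everything here is hypothetical: `FhirVersion` is only a stand-in for HAPI's FhirVersionEnum, and `parse_resource` and `MODEL_PACKAGES` are illustrative names, not Bunsen APIs. The package names are the ones that appear in the HAPI Javadoc URLs cited later in this thread.

```python
import json
from enum import Enum


class FhirVersion(Enum):
    """Hypothetical stand-in for HAPI's FhirVersionEnum."""
    DSTU2 = "dstu2"
    DSTU3 = "dstu3"
    R4 = "r4"


# Assumed mapping from version to the Java package holding its model classes.
MODEL_PACKAGES = {
    FhirVersion.DSTU2: "org.hl7.fhir.instance.model",
    FhirVersion.DSTU3: "org.hl7.fhir.dstu3.model",
    FhirVersion.R4: "org.hl7.fhir.r4.model",
}


def parse_resource(text, version=FhirVersion.DSTU3):
    """Parse a JSON resource and record which model class would represent it.

    `version` is optional; callers who never mix versions can ignore it.
    """
    # Rejecting plain strings is what makes the parameter type-safe.
    if not isinstance(version, FhirVersion):
        raise TypeError("version must be a FhirVersion member")
    resource = json.loads(text)
    return {
        "version": version,
        "model_class": MODEL_PACKAGES[version] + "." + resource["resourceType"],
        "resource": resource,
    }
```

A caller handling two vendors could then write `parse_resource(text, FhirVersion.R4)` for one feed and rely on the default for the other, all in the same process.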

rbrush commented 6 years ago

I think we can support this, with DSTU2, STU3, and (soon) R4 datasets in the same process, similar to how HAPI separates them. It will take some refactoring of the code, but I'll dig a bit deeper over the next few days.

rbrush commented 6 years ago

I took a look at adding DSTU2 support, but the DSTU2 classes in the HAPI library we use don't always follow the same JavaBean conventions as DSTU3 and R4 do. This is important because Spark encoders offer support for JavaBean-style getters and setters, but classes without them don't have a solid way to encode without mucking around in Spark internals that are subject to change.

To see an example of this, notice that the CodeableConcept in DSTU2 doesn't offer JavaBean-style setters like setCoding that Spark would use to set the coding field [1], whereas setCoding exists as expected in STU3 and R4 [2,3].

Because of this, we'll be able to support STU3, R4, and subsequent releases in the same Bunsen build, but DSTU2 is a challenge. I think the best way to get around this is to convert resources to STU3 or R4 as part of the load into Spark. For instance, if you have an RDD of bundles or resources (which could be pulled from a Hive table or external source), it should be possible to use the conversion function described at [4] to convert it to a newer version, then use the Bunsen-provided encoders to encode that, optionally saving it for future use. This also has the advantage that all queries will be able to use the same FHIR data model.

[1] http://hapifhir.io/apidocs-hl7org-dstu2/org/hl7/fhir/instance/model/CodeableConcept.html
[2] http://hapifhir.io/apidocs-dstu3/org/hl7/fhir/dstu3/model/CodeableConcept.html
[3] http://hapifhir.io/apidocs-r4/org/hl7/fhir/r4/model/CodeableConcept.html
[4] http://hapifhir.io/doc_converter.html#Converting_from_DSTU2_to_DSTU3
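
The convert-on-load flow described above could be sketched roughly as follows, in plain Python with no Spark or HAPI dependency. All names are hypothetical, and `convert_dstu2_to_stu3` is only a placeholder that copies its input; a real pipeline would call an actual converter such as the HAPI one described at [4].

```python
from enum import Enum


class FhirVersion(Enum):
    """Hypothetical version tag attached to each incoming record."""
    DSTU2 = 2
    STU3 = 3


def convert_dstu2_to_stu3(resource):
    """Placeholder converter (assumption): returns an unchanged copy.

    A real converter would rewrite the fields that differ between versions.
    """
    return dict(resource)


def load_as_stu3(records):
    """Yield STU3 resources from (version, resource) pairs.

    DSTU2 records are converted first, so everything downstream
    (encoding, queries) sees a single FHIR data model.
    """
    for version, resource in records:
        if version is FhirVersion.DSTU2:
            resource = convert_dstu2_to_stu3(resource)
        elif version is not FhirVersion.STU3:
            raise ValueError("unsupported FHIR version: %r" % (version,))
        yield resource
```

With an RDD, the per-record branch would presumably run inside a `map` over the records before the encoders are applied. The point of the design is that only the load step needs to know about DSTU2.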

boristyukin commented 6 years ago

I like your solution and I am pretty sure most companies will move from DSTU2 soon once R4 is out. So the concern is not DSTU2 itself, but how to deal with FHIR version changes in the future, keeping in mind that one can work with multiple versions at the same organization. Another use case is a conversion of historical data in one format to another.

rbrush commented 6 years ago

Sounds good. The good news is that while working on this I managed to get most of the way to supporting R4, and am able to use both STU3 and R4 in a single Bunsen build in my local branch. I logged issue #21 to track the completion of that effort.

boristyukin commented 6 years ago

this is awesome, Ryan! you rock!!

rbrush commented 6 years ago

Related to this: we've released Bunsen 0.3.0, which supports both STU3 and early R4 builds. As part of this we moved some of the APIs into bunsen.stu3 packages to future-proof usage, so users will need to switch. Some details here:

http://engineering.cerner.com/bunsen/fhir_versions.html

At this point I don't expect that we'll be able to easily add direct support for DSTU2, but the link above has some suggestions for converting that content to STU3 or R4.