Open rubberduck203 opened 3 years ago
This boils down to cycle detection in a directed graph. If a node points to itself, it is not a problem and we can leave it alone.
It's when a node depends on its parent, or on some other ancestor, that things become problematic.
If we can detect the cycle, and inline the definition, we can "collapse" cyclic nodes into a self-referencing node.
The goal is to create a not-quite directed acyclic graph in which each node depends only on its children or itself. (It's the "or itself" part that makes this not a true DAG.)
Relevant Research:
Johnson's algorithm, which enumerates the elementary cycles of a directed graph, is probably the most widely known and understood algorithm for this kind of cycle detection.
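
A minimal sketch of the detect-and-collapse idea above, assuming the type dependencies have already been extracted into a plain adjacency map. The `graph` below is a small hand-written illustration, not something derived from the real FHIR schema, and the function names are hypothetical. It uses Tarjan's strongly connected components algorithm rather than Johnson's, since here we only need to know which nodes share a cycle, not enumerate every individual cycle.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. `graph` maps node -> iterable of successor nodes."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # `node` is the root of a component: pop its members off the stack.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in list(graph):
        if node not in index:
            visit(node)
    return components


def collapse_cycles(graph):
    """Merge each multi-node cycle into a single node that may point to itself."""
    representative = {}
    for component in strongly_connected_components(graph):
        merged_name = "+".join(sorted(component))
        for member in component:
            representative[member] = merged_name

    collapsed = {}
    for node, successors in graph.items():
        src = representative[node]
        collapsed.setdefault(src, set())
        for succ in successors:
            dst = representative[succ]
            collapsed.setdefault(dst, set())
            collapsed[src].add(dst)
    return collapsed


# Illustrative dependency edges only (assumption, not the actual schema).
graph = {
    "Patient": ["Identifier", "Reference"],
    "Identifier": ["Reference"],
    "Reference": ["Identifier"],
    "Extension": ["Extension"],  # plain self-reference, left alone
}
collapsed = collapse_cycles(graph)
```

In this example `Identifier` and `Reference` end up merged into one self-referencing node, while `Extension` keeps its self-reference and `Patient` simply depends on the merged node. The recursive `visit` is fine for a sketch; a very deep graph might need an iterative version to stay under Python's recursion limit.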
The FHIR JSON schema contains many circular references, notably `Element` and `Extension`, which are fundamental building blocks for all FHIR resources, but there are others, such as `Identifier` and `Reference`.
If we transpile the FHIR JSON schema to AVRO as one giant `*.avsc` file, the circular references are correctly inlined, but the AVRO tooling chokes in several ways on the 110k-line schema. `avro-tools compile schema` generates Java source code that does not compile.
If we transpile the FHIR JSON schema into many AVRO `*.avsc` files, we run into problems with the circular references. AVRO allows for circular references, but they must be inlined.
The options as I see them now are:
1. Have `avro-to-json-schema` detect circular dependencies and inline just those instead of everything.
2. Use `avro-to-json-schema` to generate a "first pass" avro schema, then take ownership of the files from there, manually editing and curating them.
I'm unsure how feasible option 1 is. Maybe there are some papers we could find on this kind of circular reference detection that would shed some light on how hard it would be to accomplish. If we could make that work, the dream of simply generating the avro schemas from the official HL7 schema might become a reality.
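
To make the "must be inlined" constraint concrete, here is a rough sketch of what an inlined circular reference looks like in AVRO: a named type is defined in full at its first use and referred to by name afterwards. The two records are heavily simplified, hypothetical stand-ins for `Element` and `Extension` (the real types have many more fields), and `undefined_references` is an illustrative helper that ignores namespaces, not a full AVRO validator.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def undefined_references(schema, defined=None):
    """Walk a schema depth-first and return names that are used before being defined."""
    defined = set() if defined is None else defined
    missing = []
    if isinstance(schema, str):
        if schema not in PRIMITIVES and schema not in defined:
            missing.append(schema)
    elif isinstance(schema, list):  # union: check every branch
        for branch in schema:
            missing += undefined_references(branch, defined)
    elif isinstance(schema, dict):
        kind = schema["type"]
        if kind == "record":
            # The name is usable as soon as the record starts, which is
            # what makes recursive references possible.
            defined.add(schema["name"])
            for field in schema["fields"]:
                missing += undefined_references(field["type"], defined)
        elif kind == "array":
            missing += undefined_references(schema["items"], defined)
        elif kind == "map":
            missing += undefined_references(schema["values"], defined)
        elif kind in ("enum", "fixed"):
            defined.add(schema["name"])
        else:
            missing += undefined_references(kind, defined)
    return missing


# One file: Extension is defined inline at its first use inside Element,
# and both types refer back to each other by name.
inlined_element = {
    "type": "record",
    "name": "Element",
    "fields": [
        {"name": "id", "type": ["null", "string"], "default": None},
        {
            "name": "extension",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "Extension",
                    "fields": [
                        {"name": "url", "type": "string"},
                        {"name": "_url", "type": ["null", "Element"], "default": None},
                        {"name": "extension",
                         "type": {"type": "array", "items": "Extension"},
                         "default": []},
                    ],
                },
            },
            "default": [],
        },
    ],
}

# Separate file: Extension on its own refers to a name it never defines.
standalone_extension = {
    "type": "record",
    "name": "Extension",
    "fields": [
        {"name": "url", "type": "string"},
        {"name": "_url", "type": ["null", "Element"], "default": None},
    ],
}

inlined_missing = undefined_references(inlined_element)
standalone_missing = undefined_references(standalone_extension)
```

The inlined schema resolves every name on its own, while the standalone `Extension` file is left with a dangling reference to `Element`. That dangling reference is the situation option 1 would need to detect and fix by inlining only the types that participate in a cycle.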
Option 2 is less desirable. When doing code generation, it's usually best if you don't edit the generated files, so you can take advantage of changes in the source input and improvements in the generator tool itself. With that said, it's a better option than trying to hand-write avro schemas for all 681 FHIR resources. At least we could start with a reasonably solid base very quickly.
We have also discovered some places where the FHIR JSON schema does not match the FHIR spec. For example, Questionnaire has a field called enableWhen.answer[x], which is a choice type. However, the JSON schema does not use a `oneOf` to define a union (schema snippet below). Taking ownership of the schemas would allow us to correct any such discrepancies until the official JSON schema can be corrected.
https://www.hl7.org/fhir/questionnaire.schema.json.html