HL7 / kindling

FHIR Publisher
Other
4 stars 11 forks source link

Switch remaining R5 Turtle RDF generation to TurtleParser implementation #148

Open tmprd opened 1 year ago

tmprd commented 1 year ago

Fixes these issues and also missing type annotations on profiles (ex. https://www.hl7.org/fhir/actordefinition.profile.ttl.html)

This simply switches some of the example generators to the R5+ TurtleParser instead of the RdfParser used before R5.

dbooth-boston commented 8 months ago

@tmprd could you please update this to the latest base branch, and then let's see if we can get @grahamegrieve 's attention to merge it into main.

markiantorno commented 8 months ago

If you update to the latest so we can merge without conflicts, I will assist in getting this through the pipeline. @balhoff

@dotasek fyi

tmprd commented 7 months ago

It looks like this is no longer producing this file: baseFileName + ".ttl"

It does look like the parser has been switched, but was it intended to still output that file?

Thanks for your review @dotasek Expanding the changes shows where this file is produced now: https://github.com/HL7/kindling/pull/148/files#diff-0b45ecac1a10eb5075600a7cb0da87b7531c1c2795f7d2519db5a50e1d440a42R4495-R4506

I double checked the turtle files that were generated from this, and it only outputs files for R5. Is that expected? That is what I intended. It's not clear to me how different versions of FHIR are published as part of this process, and whether its possible to update older versions from this code.

dbooth-boston commented 7 months ago

I think @ericprud figured out how versions are handled. Eric, can you answer this?

ericprud commented 7 months ago

I think @ericprud figured out how versions are handled. Eric, can you answer this?

If the Q is how do I handle version selection in APIs, I've been making an API call distinction between RDVch (coincident with DSTU3 and R4) and rdvCh (coincident with R5). My switches have been separate but I think the only one that it might make sense to keep separate is hoist (as an override to the 3,4-style or 5-style).

dbooth-boston commented 6 months ago

@dotasek , unless the R5 build process is also expected to produce R4 examples, then I believe from @tmprd 's response (above) that the R5 turtle examples you mentioned are being generated.

tmprd commented 6 months ago

Will Publisher.java be used to republish R4 or below? This code suggests Publisher.java is used only for R4B, R5, and R6 (so far). https://github.com/HL7/kindling/blob/78f6a1656b4987dbdcf32b0226183143a6cf0cc8/src/main/java/org/hl7/fhir/tools/publisher/Publisher.java#L3684-L3692 If Publisher.java is to be used for R4 or below, then I can add a condition to ensure that this RDF generator is for R5 and above. If not, you can merge this. Currently, this RDF generator is used for any version other than R4B, which is logic used by serializeResource(): https://github.com/HL7/kindling/blob/78f6a1656b4987dbdcf32b0226183143a6cf0cc8/src/main/java/org/hl7/fhir/tools/publisher/Publisher.java#L2086-L2087 So we are currently doing no worse than anything that uses serializeResource() or resource2Json(), but I just thought of this as an extra precaution.

dotasek commented 6 months ago

@tmprd kindling is normally only used to generate the version of FHIR currently in development (at this juncture, R6).

There was a special case of this with R5 and R4B being developed in parallel, but this was an exception.

@grahamegrieve is this correct, or am I missing some details regarding publishing existing versions of the spec?

dotasek commented 6 months ago

@tmprd

This does use the generator, but the code you're pointing to doesn't create the necessary file. To replicate this, run kindling to publish FHIR. Look in the publish folder for this document:

publish/definition.ttl.html

Open the document in a web browser. There is a link that says Raw Turtle. It is meant to link to definition.ttl, which is missing. This doesn't happen with the master branch; definition.ttl is generated.

I put a breakpoint in the master branch of kindling to verify, and this happens for 1460 files. All that needs to be done is for that returned string to be written to the appropriate file. The original code has a FileOutputStream already created in that place. You would just have to use PrintWriter or something to actually write out to it.

tmprd commented 6 months ago

@tmprd

This does use the generator, but the code you're pointing to doesn't create the necessary file. To replicate this, run kindling to publish FHIR. Look in the publish folder for this document:

publish/definition.ttl.html

Open the document in a web browser. There is a link that says Raw Turtle. It is meant to link to definition.ttl, which is missing. This doesn't happen with the master branch; definition.ttl is generated.

I put a breakpoint in the master branch of kindling to verify, and this happens for 1460 files. All that needs to be done is for that returned string to be written to the appropriate file. The original code has a FileOutputStream already created in that place. You would just have to use PrintWriter or something to actually write out to it.

Thank you! Actually I noticed it has to be written using compose() method, so I've made changes to fix it using that. This matches what is used elsewhere for resource examples: https://github.com/HL7/kindling/blob/7b19236173ca67c6a1d51074741c889ec155955b/src/main/java/org/hl7/fhir/tools/publisher/Publisher.java#L5384-L5385 This uses org.hl7.fhir.r5.elementmodel.TurtleParser instead of org.hl7.fhir.r4.formats.RdfParser.java.

dbooth-boston commented 4 months ago

Per the above request, I'm looking at the diffs between the old and new, to verify that they all look correct. To help, I've uploaded a diff of only the old and new fhir.ttl ontology file here: ont-diffs.txt

I will post again here after I've finished my comparison.

ericprud commented 3 weeks ago

Fixes:

  1. missing fhir:text/fhir:div
  2. missing fhir:nodeRole [fhir:treeRoot]
  3. missing ^^xsd:anyURI on: a. */fhir:url/fhir:v b. fhir:system c. fhir:constraint/fhir:source
  4. missing fhir:link on fhir:baseDefinition, fhir:source
  5. remove duplicates of list elements in e.g. fhir:contact, fhir:status, fhir:date, fhir:jurisdiction, fhir:publisher...
  6. missing ^^xsd:dateTime on fhir:date
  7. missing ^^xsd:boolean on fhir:experimental, fhir:abstract, fhir:isModifier, fhir:isSummary
  8. missing ^^xsd:nonNegativeInteger on */fhir:element/fhir:min (/'max')

Issues:

tmprd commented 3 weeks ago

Fixes:

  1. missing fhir:text/fhir:div
  2. missing fhir:nodeRole [fhir:treeRoot]
  3. missing ^^xsd:anyURI on: a. */fhir:url/fhir:v b. fhir:system c. fhir:constraint/fhir:source
  4. missing fhir:link on fhir:baseDefinition, fhir:source
  5. remove duplicates of list elements in e.g. fhir:contact, fhir:status, fhir:date, fhir:jurisdiction, fhir:publisher...
  6. missing ^^xsd:dateTime on fhir:date
  7. missing ^^xsd:boolean on fhir:experimental, fhir:abstract, fhir:isModifier, fhir:isSummary
  8. missing ^^xsd:nonNegativeInteger on */fhir:element/fhir:min (/'max')

Issues:

  • aesthetic:

    1. adds spurious comment character after top-level predicate-object pairs (i.e. those with the resource as the subject).
  • semantic:

    1. addes xsd:anyURI datatype to some fhir:codes/fhir:v objects, but not xsd:discriminator
    2. addes xsd:anyURI datatype to fhir:extension/fhir:value/fhir:id objects,e.g. fhir:type
    3. still missing ^^xsd:nonNegativeInteger on */fhir:element/fhir:max (when it's not '*' (issue?))

Thanks for checking this Eric! Just to clarify for everyone, this pull request simply switches the Turtle parser and serializer in Kindling from the old implementation to the new TurtleParser within the org.hl7.fhir.core project. I believe all remaining issues need to be fixed there.

For example, the comment characters seem to be added in the org.hl7.fhir.utilities.turtle.Turtle.commitSection() method. I added an issue for this: https://github.com/w3c/hcls-fhir-rdf/issues/150

dbooth-boston commented 3 weeks ago

@tmprd , thank you so much for your work on this, and sorry it took so long to get the results better vetted. (Thanks @ericprud for that!) I know I said I would give up on further vetting them if I had not gotten to it earlier, but I just couldn't bring myself to do that. I kept thinking that I'd find a way. And thankfully, @tmprd came to the rescue with a better vetting approach. Thanks all!