AstunTechnology / iso19139.gemini23

Gemini 2.3 schema plugin for Geonetwork
0 stars 7 forks source link

Displaying add button for elements with gco:nilreason #35

Closed archaeogeek closed 4 years ago

archaeogeek commented 5 years ago

I've got a sample dataset with the following xml for parentidentifier:

<gmd:parentIdentifier xmlns:gco="http://www.isotc211.org/2005/gco" gco:nilReason="inapplicable"/>

This doesn't trigger the "add parent record" button. If the element is removed entirely, then the button appears. Is it possible to either show the nilreason or trigger the button in this scenario?

archaeogeek commented 5 years ago

@josegar74 this is also causing problems for other elements- where if there is just an element with a nilreason then it doesn't display in the interface until you go into xml view. It also breaks things like the inflate-metadata check to add a vertical CRS element if none exists. So the real question is how to improve the display of empty elements with a nilreason of inapplicable

josegar74 commented 5 years ago

@archaeogeek the problem is more related to the lack of child gco:CharacterString, this works:

<gmd:parentIdentifier 
    xmlns:gco="http://www.isotc211.org/2005/gco" 
    gco:nilReason="inapplicable">
    <gco:CharacterString></gco:CharacterString>
</gmd:parentIdentifier>

I'm thinking to use inflate-metadata.xsl to add the gco:CharacterString if the element has gco:nilReason="inapplicable" and in update-fixed-info.xsl remove it if it's empty.

How does it sound?

archaeogeek commented 5 years ago

@josegar74 that sounds like a good approach, as I found quite a lot of elements encoded like this

archaeogeek commented 5 years ago

@josegar74 I think I've managed to figure out what needs to go in inflate-metadata.xsl:

<xsl:template match="//*[@gco:nilReason]">
        <xsl:copy>
            <xsl:apply-templates select="@* | *"/>
            <xsl:element name="gco:CharacterString"/>
        </xsl:copy>
    </xsl:template>

But I'm not sure what to put in update-fixed-info.xsl. At present, with the above code in inflate-metadata.xsl, I get an error from https://github.com/geonetwork/core-geonetwork/blob/3.4.x/schemas/iso19139/src/main/plugin/iso19139/update-fixed-info.xsl#L270, which makes sense, but I'm not really sure how to fix the problem.

archaeogeek commented 5 years ago

@josegar74 see my attempted fix- it doesn't work unfortunately

josegar74 commented 5 years ago

I have updated a bit the templates:

 <xsl:template match="//*[@gco:nilReason]/gco:CharacterString" priority="10">
        <xsl:choose>
            <xsl:when test="not(text())">
                <xsl:message>Empty: <xsl:value-of select="name()" /></xsl:message>
            </xsl:when>
            <xsl:otherwise>
                <xsl:message>Not Empty: <xsl:value-of select="name()" /></xsl:message>
                <xsl:copy>
                    <xsl:apply-templates select="@*|node()"/>
                </xsl:copy>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
<!-- Add gco:Boolean to gmd:pass with nilReason to work nicely in the editor,
    update-fixed-info.xsl should removed if empty to avoid xsd errors  -->
  <xsl:template match="gmd:pass[@gco:nilReason and not(gco:Boolean)]" priority="20">
    <xsl:copy>
      <xsl:copy-of select="@*" />

      <gco:Boolean></gco:Boolean>
    </xsl:copy>
  </xsl:template>

    <!-- Add gco:CharacterString child nodes to elements with gco:nilReason attributes so they display in the editor, then use update-fixed-info.xsl to get rid of them if not required -->

    <xsl:template match="//*[@gco:nilReason and not(gco:CharacterString)]" priority="10">
        <xsl:copy>
            <xsl:apply-templates select="@*[name() != 'gco:nilReason']|*"/>
            <xsl:element name="gco:CharacterString"/>
        </xsl:copy>
    </xsl:template>

With a metadata containing:

<gmd:parentIdentifier xmlns:gco="http://www.isotc211.org/2005/gco" gco:nilReason="inapplicable"/>

The gco:CharacterString is added and the element appears in the editor, but if saving with empty value the element is saved as:

<gmd:parentIdentifier gco:nilReason="missing">
   <gco:CharacterString/>
</gmd:parentIdentifier>

This is due to this template in ISO19139 that sets gco:nilReason="missing" for elements with empty gco:CharacterString:

https://github.com/geonetwork/core-geonetwork/blob/5d1f32ca761e91b4768dc1396271c54908e98e24/schemas/iso19139/src/main/plugin/iso19139/update-fixed-info.xsl#L251-L287

I don't know the difference to use missing or inapplicable. Is GEMINI 2.3 using inapplicable and means to get rid of the child gco:CharacterString?

Something that can be a bit problematic is the current template in inflate:

    <!-- Add gco:CharacterString child nodes to elements with gco:nilReason attributes so they display in the editor, then use update-fixed-info.xsl to get rid of them if not required -->

    <xsl:template match="//*[@gco:nilReason and not(gco:CharacterString)]" priority="10">

That code is adding gco:CharacterStringto any element with gco:nilReason, but that attribute can exists in elements containing for example gco:Real: http://www.datypic.com/sc/niem21/e-gmd_minimumValue-1.html


archaeogeek commented 5 years ago

@josegar74 It's not clear which version Gemini wants us to use- see https://www.agi.org.uk/agi-groups/standards-committee/uk-gemini/40-gemini/1062-gemini-datasets-and-data-series#21 for example where they say you should use a nilreason of unknown or inapplicable. This is actually the only mention of "nilreason" or "nil reason" in the whole of the guidance for datasets- but the example records I have been given use a nilreason of inapplicable for other elements too.

So... for now, could we update the inflate templates to match on inapplicable or unknown, and only for gco:CharacterString?

I will seek guidance from the Gemini working group on usage of nilReason for other elements and we can resolve it later if a problem occurs

josegar74 commented 4 years ago

@archaeogeek I have committed the changes to inflate this element.

But I notice in 3.8.x 2 issues:

https://github.com/AstunTechnology/iso19139.gemini23/blob/8b30ae11e57d2607d7e4283c49c10df6c78729b4/src/main/plugin/iso19139.gemini23/layout/config-editor.xml#L3630-L3632

See in https://www.isotc211.org/2005/gml/basicTypes.xsd

In the section 2.2.2 Root element in https://www.agi.org.uk/40-gemini/1048-uk-gemini-encoding-guidance:

xsi:schemaLocation="http://www.isotc211.org/2005/gmx
        http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/gmx/gmx.xsd
        http://www.isotc211.org/2005/srv
        http://inspire.ec.europa.eu/draft-schemas/inspire-md-schemas/srv/1.0/srv.xsd">

That includes the xsd from http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas and use gml 3.2 url for the namespace.

We need to check about this and check if require to update the local xsd in the schema.

josegar74 commented 4 years ago

About the Add button, it's related to https://github.com/geonetwork/core-geonetwork/issues/4192

I just fixed in the GEMINI 2.3 3.8.x branch.

Please check from previous comment about the schema references.

archaeogeek commented 4 years ago

Thanks @josegar74 the basic issue seems to be fixed. What would be the changes needed to fix the problem with schema references?

josegar74 commented 4 years ago

@archaeogeek in 3.8 version are supported 2005 and 2007 versions of iso19139, I can create a branch with the changes, mainly should be related to the schemaLocation to use 2007 version that supports gml 3.2.

For early versions, maybe an option is to update also the schemaLocation and disable in the settings the option to use the local xsd in the validation. That way will be used the remote xsd, that is slower to validate, but also makes sure to use the remote / updated versions.

josegar74 commented 4 years ago

@archaeogeek I have pushed the changes directly to 3.8.x as were quite limited.

Try to create new metadata, if using existing database will require to remove current templates and reload them. The new metadata should not have elements with gml namespace inline.

See https://github.com/AstunTechnology/iso19139.gemini23/commit/c47df29302e70d4e03ae9e04d64889ea65ae7c3a

archaeogeek commented 4 years ago

@josegar74 I've reloaded the templates and the sample data, I think it's working as expected- I'm now looking at some validation errors but I'll add those as separate issues and close this one thanks

archaeogeek commented 4 years ago

Reopening as the fix for this issue does not seem to be working correctly, and is the root of the problems described in https://github.com/AstunTechnology/iso19139.gemini23/issues/42. The problem we have is that mentioned in https://github.com/AstunTechnology/iso19139.gemini23/issues/35#issuecomment-539909785 that the ISO19139 template is adding nilreason=missing to the elements where an empty characterstring has been added. So:

<gmd:hierarchyLevelName gco:nilReason="inapplicable"/>

becomes:

<gmd:hierarchyLevelName xmlns:gml="http://www.opengis.net/gml/3.2" gco:nilReason="missing">
      <gco:CharacterString/>
   </gmd:hierarchyLevelName>

but then our code at https://github.com/AstunTechnology/iso19139.gemini23/blob/3.8.x/src/main/plugin/iso19139.gemini23/update-fixed-info.xsl#L98 is not being triggered and validation errors occur because of the empty strings.

I think we need to override the behaviour of the template in ISO19139 which does this, and preserve the correct nilreason but I can't see how to do it?

archaeogeek commented 4 years ago

@josegar74 I've experimented with adding the code from ISO19139 update-fixed-info into the one for Gemini23 (though haven't committed it) but the code that strips the empty characterstring (https://github.com/AstunTechnology/iso19139.gemini23/blob/3.8.x/src/main/plugin/iso19139.gemini23/update-fixed-info.xsl#L98) doesn't seem to be working. It does work in isolation in Oxygen XML though, so I don't know if there's something else going on as well?

josegar74 commented 4 years ago

I've manage to get it working when the element is empty, with these templates:

inflate-metadata.xsl

  <xsl:template match="//*[(@gco:nilReason='inapplicable' or @gco:nilReason='unknown') and not(gco:CharacterString) and name() != 'gmd:verticalElement']" priority="10">
      <xsl:copy>
        <xsl:apply-templates select="@*|*"/>
        <xsl:element name="gco:CharacterString"/>
      </xsl:copy>
    </xsl:template>

update-fixed-info.xsl

 <xsl:template match="//*[(@gco:nilReason='inapplicable' or @gco:nilReason='unknown') and gco:CharacterString]" priority="10">
    <xsl:choose>
      <xsl:when test="not(gco:CharacterString/text())">
          <!-- Copy element and attributes, not children -->    
          <xsl:copy>
            <xsl:apply-templates select="@*"/>
          </xsl:copy>
        </xsl:message>
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

So if the metadata has <gmd:hierarchyLevelName gco:nilReason="inapplicable"/> the editor gets the following:

   <gmd:hierarchyLevelName gco:nilReason="inapplicable">
      <gco:CharacterString></gco:CharacterString>
   </gmd:hierarchyLevelName>

So the field can be edited and when saved, if empty, it's stored properly as <gmd:hierarchyLevelName gco:nilReason="inapplicable"/>.

The issue is when filling the value, for now the attribute is kept:

   <gmd:hierarchyLevelName gco:nilReason="inapplicable">
      <gco:CharacterString>value</gco:CharacterString>
   </gmd:hierarchyLevelName>

That allows to handle it properly if it's emptied again, but probably not correct. If we remove it:

   <gmd:hierarchyLevelName>
      <gco:CharacterString>value</gco:CharacterString>
   </gmd:hierarchyLevelName>

If it's emptied again, GeoNetwork converts it:

   <gmd:hierarchyLevelName gco:nilReason="missing">
      <gco:CharacterString></gco:CharacterString>
   </gmd:hierarchyLevelName>

I think we need to define a list of elements that should be handle in special way (to add "inapplicable" or "unknown" instead of "missing" value) when are emptied.

archaeogeek commented 4 years ago

@josegar74 it's going to be very hard to list a set of elements and specific nilreasons but here goes:

gmd:version (in the context of gmd:MD_Format) should allow either nilReason=inapplicable or nilReason=unknown and it's important to keep the correct one

gmd:explanation (in the context of Conformity gmd:DQ_ConformanceResult) nilReason=unknown gmd:pass nilReason=unknown See https://www.agi.org.uk/agi-groups/standards-committee/uk-gemini/40-gemini/1062-gemini-datasets-and-data-series#41

BUT there might be other nilreasons in other types of DQ_ConformanceResults for data quality- see https://www.agi.org.uk/agi-groups/standards-committee/uk-gemini/40-gemini/1062-gemini-datasets-and-data-series#52 and I honestly don't know how we'd cover all of those possibilities.

In the sample records at https://github.com/AGIGemini/Schematron/tree/master/samples there are a number of other examples, some of which are actually wrong (temporal extent) and some not in the standard (couplingtype and containsoperations for services) so I'm not sure what to do about those at all.

Would it be possible for some sort of validation check that looks for nilreasons present when the value is filled in? For example in your steps above:

<gmd:hierarchyLevelName gco:nilReason="inapplicable">
      <gco:CharacterString>value</gco:CharacterString>
   </gmd:hierarchyLevelName>

If the record was validated at this point could we have a rule that would fail in that case?

I think it's fine if the value is removed at that point so that GeoNetwork automatically fills in the gco:nilReason="missing" component:

<gmd:hierarchyLevelName gco:nilReason="missing">
      <gco:CharacterString></gco:CharacterString>
   </gmd:hierarchyLevelName>

This will fail validation and then people should be able to either fill the value in, delete the element, or add the correct nilReason instead.

There are two points that we need to get to:

1) Someone imports a record that they believe is already Gemini 2.3 compliant into GeoNetwork and does no editing in it- they just use it for the CSW endpoint. In this case the record should not be changed at all by the import process, even if validation is run as part of that process.

2) Someone imports a record into GeoNetwork and then validates it in the editing interface. In this case, they need to be able to add values to existing elements that have a nilReason applied, but if they choose not to, then the end result of editing/validating should not change the nilReason. If however they delete values, or add new ones, then the default behaviour of adding a nilReason=missing should apply and they can change that if they want.

Does that all make sense?

archaeogeek commented 4 years ago

@josegar74 I think this is the priority issue to fix now, as I can't fully test the conversion scripts from Gemini 2.2 to 2.3 due to the validation errors it causes with empty strings. Could you let me know when you'll be able to look at it?

josegar74 commented 4 years ago

@archaeogeek I'll focus on this issue tomorrow.

josegar74 commented 4 years ago

@archaeogeek a question about the following case:

gmd:version (in the context of gmd:MD_Format) should allow either nilReason=inapplicable or nilReason=unknown and it's important to keep the correct one.

If we have the following in the metadata:

<gmd:version gco:nilReason="inapplicable" />

It's inflated when edited:

<gmd:version gco:nilReason="inapplicable">
   <gco:CharacterString></gco:CharacterString>
</gmd:version>

If the user doesn't fill it, it should be stored again as <gmd:version gco:nilReason="inapplicable" />, that's clear, but suppose it gets filled, then gco:nilReason will be removed:

<gmd:version>
   <gco:CharacterString>1.0</gco:CharacterString>
</gmd:version>

If the metadata is edited again and emptied, we should assign nilReason=inapplicable or nilReason=unknown (we don't have anymore the original attribute value)


About gmd:pass nilReason=unknown, this is already handled for any element containing gco:Boolean in https://github.com/geonetwork/core-geonetwork/pull/3569


Related to this:

Would it be possible for some sort of validation check that looks for nilreasons present when the value is filled in? For example in your steps above:

<gmd:hierarchyLevelName gco:nilReason="inapplicable">
  <gco:CharacterString>value</gco:CharacterString>
</gmd:hierarchyLevelName>

If the record was validated at this point could we have a rule that would fail in that case?

You mean not to remove the gco:nilReason in these cases, so the user has to remove them manually?

The kept of the attribute is handle already in https://github.com/AstunTechnology/iso19139.gemini23/blob/799ceb71f6b187381495de47e72d971a7aecb4c2/src/main/plugin/iso19139.gemini23/update-fixed-info.xsl#L106

I was doing a change to remove it when the field is filled, but if prefer the user removes it manually it's another option, let me know:

<xsl:otherwise>
  <xsl:message>Not Empty: <xsl:value-of select="name(..)" /></xsl:message>
  <xsl:copy>
    <xsl:apply-templates select="@*[name() != 'gco:nilReason']|node()"/>
  </xsl:copy>
</xsl:otherwise>
josegar74 commented 4 years ago

Added a validation rule to check for elements with value and gco:nilReason, so when the user fills the element, the attribute is kept (see the end of the previous comment).

The user can enable the More details option in the editor and remove it. See attached screencast, if looks fine I'll commit the validation rule.

The other option is to remove the attribute when filled the element, as indicated also at the end of the previous comment.

nilReason.webm.zip

archaeogeek commented 4 years ago

@josegar74 I think the approach taking in the screencast is fine. For gmd:version, if it is filled in and then emptied again, so we lose the original nilReason, I think that is fine. I will add a note to the html guidance in the editor to say that people need to add gco:nilReason=unknown manually.

josegar74 commented 4 years ago

Note that if filled any element with gco:nilReason='inapplicable' or @gco:nilReason='unknown' are not removed due to:

https://github.com/AstunTechnology/iso19139.gemini23/blob/799ceb71f6b187381495de47e72d971a7aecb4c2/src/main/plugin/iso19139.gemini23/update-fixed-info.xsl#L98-L110

The user can enable manually the More details option and remove the attribute there.

Going to commit the updated validation rule then.

archaeogeek commented 4 years ago

@josegar74 there still seems to be a problem with the gmd:explanation element unfortunately. When harvesting from WMS this is set as <explanation gco:nilReason="inapplicable"/> from https://github.com/AstunTechnology/iso19139.gemini23/blob/3.8.x/src/main/plugin/iso19139.gemini23/convert/OGCWxSGetCapabilitiesto19119/OGCWxSGetCapabilitiesLayer-to-19139.xsl#L411 but when the record is edited this is converted to:

<gmd:explanation gco:nilReason="unknown">
       <gco:CharacterString/>
 </gmd:explanation>

I'm not sure from the guidance whether unknown or inapplicable is the best case, but it should not be expanded and it fails validation. I thought that https://github.com/metadata101/iso19139.gemini23/blob/3.8.x/src/main/plugin/iso19139.gemini23/update-fixed-info.xsl#L91 should stop this from happening or have I misunderstood what's going on?

archaeogeek commented 4 years ago

@josegar74 Reading through the comments on this thread I can see that we are deliberately expanding elements with gco:nilReason="inapplicable" or "unknown" here https://github.com/AstunTechnology/iso19139.gemini23/blob/3.8.x/src/main/plugin/iso19139.gemini23/inflate-metadata.xsl#L405-L412

but https://github.com/AstunTechnology/iso19139.gemini23/blob/3.8.x/src/main/plugin/iso19139.gemini23/inflate-metadata.xsl#L405-L412 says it should only be set to gco:nilReason="unknown" if it is empty, but it's not?

Then I thought empty gco:characterString elements should be stripped by https://github.com/AstunTechnology/iso19139.gemini23/blob/3.8.x/src/main/plugin/iso19139.gemini23/update-fixed-info.xsl#L114-L126 on validation or saving but that doesn't seem to be happening?

Have I misunderstood something?

josegar74 commented 4 years ago

The 2on link I think is wrong (it's the same as 1st). To be able to edit the elements in GeoNetwork:

Is not working like that? I'm going to re-check, but from yesterday I didn't notice any issue.

josegar74 commented 4 years ago

I have done the following:

1) Create a new metadata with the Dataset template.

2) Leave empty gmd:explanation from the Conformity report

3) Save&Close the metadata

4) Open the xml: <gmd:explanation gco:nilReason="unknown"/>

5) Edit again and set a value, save&close. Note that currently gco:nilReason="unknown" is preserved (the user should remove it manually in the editor if wants to get rid of it, see a previous comment related to this)

<gmd:explanation gco:nilReason="unknown">
  <gco:CharacterString>explanation</gco:CharacterString>
</gmd:explanation>

Let me know the tests you're doing, to check if the issue is related to other element(s).

archaeogeek commented 4 years ago

@josegar74 if I replicate your steps, then it's fine. The issue I am having is with harvested records.

  1. Set up a WxS harvester for http://inspire.worcester.gov.uk/geoserver/worcester/wms?service=wms (WMS 1.3.0, import record for each layer using MetadataURL attributes, output schema Gemini 2.3). You should get 20 records.
  2. Without editing the record, view/download the XML for Worcester - SWDP Retail Allocations. The explanation element is set as <gmd:explanation gco:nilReason="inapplicable"/>
  3. Go into editing mode and switch straight to xml view, the element has now been changed to:
    <gmd:explanation gco:nilReason="unknown">
    <gco:CharacterString/>
    </gmd:explanation>
  4. Validate the record- it fails because of the empty character string.
  5. Save and close the record, download it or otherwise view the xml again, and the explanation element has now been set to <gmd:explanation gco:nilReason="unknown"/>

Also, the pass element that goes with this starts off as <gmd:pass gco:nilReason="unknown"/> and becomes:

 <gmd:pass>
    <gco:Boolean>false</gco:Boolean>
 </gmd:pass>

So even though we're not trying to make any changes to the record in this process, it is being changed simply by activating editing and validation. Should we be looking at overriding the behaviour in ISO19139 that is causing this in the first place?

josegar74 commented 4 years ago

@archaeogeek I'll check the inflate process, seem not preserving the attribute

archaeogeek commented 4 years ago

@josegar74 thank you, I've tested your latest commit and am getting much better results now. I will need to test with some other records as well, to be sure, but it looks good so far.

archaeogeek commented 4 years ago

I've seen no further issues with this so will close it, thanks