GSA / ckanext-geodatagov

data.gov extension

Issues with DateStamp transformation #104

Open torrin47 opened 7 years ago

torrin47 commented 7 years ago

We're seeing records where the datestamp isn't being accurately transformed, and are a little confused about why. Here's an example, where data.gov harvested a CSDGM record: https://catalog.data.gov/harvest/object/9c6483ec-83d4-437c-bc16-fbfe40d5b896/original with a //metd value of 2016-11-02, but the corresponding transformed ISO document: https://catalog.data.gov/harvest/object/9c6483ec-83d4-437c-bc16-fbfe40d5b896 reports a //dateStamp value of 9999-01-01.

The XSLT doc looks ok, so what's causing this problem? https://github.com/GSA/ckanext-geodatagov/blob/master/conversiontool/fgdc2iso/fgdcrse2iso19115-2.xslt

      <gmd:dateStamp>
        <xsl:choose>
          <xsl:when test="(fn:contains(fn:lower-case(fn:normalize-space(fn:string(//metd))), 'unknown'))">
            <xsl:attribute name="gco:nilReason">
              <xsl:sequence select="xs:string('unknown')"/>
            </xsl:attribute>
          </xsl:when>
          <xsl:otherwise>
            <gco:Date>
              <xsl:call-template name="fgdc2isoDate">
                <xsl:with-param name="dateField" select="normalize-space(//metd)"/>
              </xsl:call-template>
            </gco:Date>
          </xsl:otherwise>
        </xsl:choose>
      </gmd:dateStamp>

kvuppala commented 7 years ago

@torrin47 @FuhuXia We are looking into the issue, but the sample catalog URLs provided no longer work.

Can you provide a link to an example metadata record so we can troubleshoot the issue?

torrin47 commented 7 years ago

I believe this is the equivalent example record in the new system:

https://catalog.data.gov/dataset/national-emissions-inventory-u-s-2014-epa-oar-oaqps-aqad

charness39 commented 7 years ago

Here are a few more examples:

https://admin-catalog.data.gov/dataset/jobs-within-a-30-minute-transit-ride-service

https://catalog.data.gov/dataset/walkability-index

https://catalog.data.gov/dataset/smart-location-database-service

The records above also do not appear to have the links to the ISO or Original FGDC metadata.

amilan17 commented 7 years ago

I believe that the fgdc2isoDate template is expecting a valid CSDGM date format, which in the example above would be 20161102 (without the dashes). It doesn't know how to handle ISO valid dates in the source CSDGM document...
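As a rough illustration of that mismatch, the sketch below normalizes a date to the compact CSDGM form before it reaches the transform. It is written in Python rather than XSLT, it is not the `fgdc2isoDate` template (whose internals are not shown in this thread), and the helper name `to_csdgm_date` is hypothetical. It assumes the only shapes worth handling are `YYYY`, `YYYYMM`, `YYYYMMDD` and their dashed ISO equivalents, and it does not validate month or day ranges.

```python
import re
from typing import Optional

# Assumed input shapes (an assumption for this sketch, not taken from the template):
# compact CSDGM dates and dashed ISO 8601 calendar dates.
_COMPACT = re.compile(r"^(\d{4})(\d{2})?(\d{2})?$")
_ISO = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$")


def to_csdgm_date(value: str) -> Optional[str]:
    """Return the compact CSDGM form of a date, or None if the shape is unrecognized."""
    text = value.strip()
    match = _COMPACT.match(text) or _ISO.match(text)
    if match is None:
        # Free text such as "unknown" is left for the caller to handle.
        return None
    # Drop the groups that were absent (for example, a year-only date).
    return "".join(part for part in match.groups() if part)


if __name__ == "__main__":
    print(to_csdgm_date("2016-11-02"))
    print(to_csdgm_date("20161102"))
    print(to_csdgm_date("unknown"))
```

With this approach the `//metd` value from the example record, `2016-11-02`, would be rewritten as `20161102` in the source document, which is the form described above as valid for CSDGM.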

kvuppala commented 7 years ago

@amilan17 thank you, that seems to be the case.

@charness39 @torrin47 I see that publication date is parsed correctly on the example https://catalog.data.gov/dataset/national-emissions-inventory-u-s-2014-epa-oar-oaqps-aqad , would you be able to update the metadata date and try reharvesting?

We will look into the other two records where the ISO / FGDC metadata links are not getting displayed.

torrin47 commented 7 years ago

Sure. We actually have a script that updates all of those dates to reflect the date the record was last edited or updated from our catalog’s perspective, because we found that if stewards made edits without updating that element, Data.gov wouldn’t recognize the edits. We were also under the impression that CKAN required ISO dates. But it’s quick and easy for us to adjust that globally for all our CSDGM docs.

We’ve done so and reharvested, but the records don’t appear to be updated at this time – should we expect a bit of a lag?

Harvest Report: https://admin-catalog.data.gov/harvest/environmental-dataset-gateway-fgdc-csdgm/job/340586a2-9c92-4025-b47e-12660ed3f2bd

Data.gov record: https://catalog.data.gov/dataset/national-emissions-inventory-u-s-2014-epa-oar-oaqps-aqad

Harvest Source: https://edg.epa.gov/WAFer_harvest/FGDC/epa-national-emissions-inventory-region-1-2011.xml

FuhuXia commented 7 years ago

The WAF harvester checks the timestamp of each XML file to determine whether an update is needed. The file epa-national-emissions-inventory-region-1-2011.xml has a timestamp of 6/10/2016 7:15 AM, so the harvester skips it.

https://edg.epa.gov/WAFer_harvest/FGDC/
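The timestamp check described above can be sketched as follows. This is a minimal Python illustration, not the harvester's actual code: the function names are hypothetical, the month/day/year reading of the listing timestamp is an assumption, and the real comparison rule may differ.

```python
from datetime import datetime
from typing import Optional


def parse_listing_timestamp(text: str) -> datetime:
    """Parse a directory-listing timestamp such as '6/10/2016 7:15 AM'.

    Assumes US month/day/year order and a 12-hour clock.
    """
    return datetime.strptime(text.strip(), "%m/%d/%Y %I:%M %p")


def needs_reharvest(file_modified: datetime, last_harvested: Optional[datetime]) -> bool:
    """Return True when the file looks newer than the last harvested copy."""
    if last_harvested is None:
        # Never harvested before, so always fetch it.
        return True
    return file_modified > last_harvested


if __name__ == "__main__":
    listed = parse_listing_timestamp("6/10/2016 7:15 AM")
    # Hypothetical earlier harvest time, chosen only for illustration.
    previous = datetime(2017, 1, 1)
    print(needs_reharvest(listed, previous))
```

Under these assumptions, editing the content of a file without changing its listed timestamp leaves `needs_reharvest` returning `False`, which matches the skipped record in this thread.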

torrin47 commented 7 years ago

Ok. We have code that ensures that the file timestamp and the embedded last update date actually reflect the point in time when the metadata was last edited in our database. The recent fix just changed the syntax of the listed date, but continued to align it with the datestamp in the database. We’ll set all dates to today to force a full, comprehensive update of every CSDGM record, then revert to the previous approach once the records are all updated.
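One way to do the file-timestamp half of that one-time refresh is sketched below in Python. It assumes the WAF is served from a local folder of `.xml` files, and the function name `touch_xml_files` is hypothetical. It does not update the embedded date inside each record, which would need a separate step.

```python
import os
import time
from pathlib import Path
from typing import List


def touch_xml_files(folder: str) -> List[str]:
    """Set the access and modification time of every .xml file in folder to now."""
    now = time.time()
    touched = []
    for path in sorted(Path(folder).glob("*.xml")):
        # os.utime takes an (access_time, modification_time) pair in seconds.
        os.utime(path, (now, now))
        touched.append(path.name)
    return touched
```

The returned list of file names gives a simple record of which files were touched, which may help when reverting to the usual date handling afterwards.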
