IHO-S100WG / TSM8


6.X-5 metadata temporal attributes #13

Closed: rmalyankar closed this issue 2 years ago

rmalyankar commented 3 years ago

Proposal to add discovery metadata attributes to indicate temporal validity of dataset and availability of successor dataset. From S-104.

See the proposal for details.

Updated 25 Oct

Revised draft has just been uploaded, see the 25 October rev1 PDF on the proposal page.

rmalyankar commented 2 years ago

Spin-off thread for the "closure" attribute in the rev2 proposal: #16

DavidGrant-NIWC commented 2 years ago

Recommend limitations of the proposed schema be noted in the proposal.

Item 1

Item 2

Item 3

rmalyankar commented 2 years ago

Recommend limitations of the proposed schema be noted in the proposal.

Item 1

  • XML dateTime format differs from ISO 8601:2004 and S-100 DateTime

    • What is the value of adding the XML format?

    • Recommend using the existing S-100 DateTime type for timeInstantBegin/End.

    • S-100 DateTime is restricted to complete representation, basic format.

The ISO 8601 basic representation (20211019) is not an XML Schema built-in datatype and cannot be validated by XML Schema validation. XML Schema uses the ISO 8601 Extended Format. That format is also defined in ISO 8601, so the metadata is using an ISO format. See also S-100 3-9 (I can expand that clause to cover all date and time types and propose adding it to Part 1 too if desired).
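A minimal Python sketch of the mapping being discussed: it converts a basic-format value such as 20211019 into the extended form accepted by xs:date and xs:dateTime. The patterns (YYYYMMDD, and YYYYMMDDThhmmss with an optional trailing Z) and the name basic_to_extended are assumptions for illustration, not definitions quoted from S-100.

```python
import re
from datetime import datetime

# Assumed basic-format patterns (illustrative, not quoted from S-100):
#   date:      YYYYMMDD
#   date-time: YYYYMMDDThhmmss, optionally followed by "Z" for UTC
_BASIC_DATE = re.compile(r"^\d{8}$")
_BASIC_DATETIME = re.compile(r"^\d{8}T\d{6}Z?$")


def basic_to_extended(value: str) -> str:
    """Convert an ISO 8601 basic-format date or date-time to the extended
    format used by the xs:date / xs:dateTime lexical space."""
    if _BASIC_DATE.match(value):
        # strptime also rejects impossible dates such as 20211340.
        return datetime.strptime(value, "%Y%m%d").strftime("%Y-%m-%d")
    if _BASIC_DATETIME.match(value):
        utc = value.endswith("Z")
        parsed = datetime.strptime(value.rstrip("Z"), "%Y%m%dT%H%M%S")
        return parsed.strftime("%Y-%m-%dT%H:%M:%S") + ("Z" if utc else "")
    raise ValueError(f"not a supported basic-format value: {value!r}")


print(basic_to_extended("20211019"))          # 2021-10-19
print(basic_to_extended("20211201T101530Z"))  # 2021-12-01T10:15:30Z
```

Going the other way only requires removing the separators, so either direction can be done mechanically by encoding software.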

  • I think a spatio-temporal extent should be used, in harmony with Jonathan's proposal.

    • Supports use of TruncatedDate
    • Overcomes most limitations noted below.
  • Strongly recommend adding the temporalExtent attribute to the dataCoverage rather than directly in S100_DatasetDiscoveryMetadata.

    • Note that a geographic extent is associated with each dataCoverage, not with each dataset.
    • LIMITATION: if left as proposed, there's no way to determine the interval of a specific dataCoverage without downloading the data and examining the encoded dataset.
  • Only a single temporal extent can be associated with the data (upper multiplicity is 1 vice *).

Truncated date is not needed for temporal extent in metadata (and if S-100 uses it in temporal extent, the ISO 19108 types will have to be replaced anyway):

  1. Datasets that are seasonal can be re-issued or have new editions published annually. And probably should, to ensure they are up-to-date.
  2. Are there realistic cases of datasets that have date or time periodicity, rather than features? Why not issue one dataset containing features with date and/or time periodicity? We already have periodicDateRange, easy enough to define a similar complex attribute for time too.

Placing temporal extent at the dataset discovery level is a plus. Placing it only in sub-dataset "coverages" complicates data management on both production and user system sides, and will complicate S-128 too. Also, the metadata group moved the geographic component boundingBox up from dataCoverage this summer (i.e., from class S100_DataCoverage to S100_DatasetDiscoveryMetadata), so it makes sense to put temporal extent for the dataset at the same level as bounding box. I suppose we could add temporal extent to S100_DataCoverage too, by analogy to boundingPolygon, for the likely rare situations where it is different from that for the dataset. Can someone supply plausible examples for using it in S100_DataCoverage where it is different from the temporal extent of the dataset as a whole?

  • LIMITATION: Imposes discovery limitations, e.g., for seasonal publications (but not sure if there are / will be any of these types of datasets).

    • For example, a dataset that is intended to be used Jan-Mar and Jul-Sep
  • Note that there can be multiple boundingPolygons associated with each dataCoverage.

    • It would be nice if each could have its own temporal extent.
    • Since an instant (fully qualified date and time) must be specified for the begin and/or end:
  • LIMITATION: recurring instants such as "every Wednesday", or "the first of every month" can't be specified.

    • Strongly recommend this be noted in the proposal.
  • LIMITATION: If the dataset is to be used at a certain time of day, it will be necessary to update the metadata every day.

    • e.g. 5pm-7pm can't be represented without also specifying a date.
    • To provide updates to the discovery metadata, datasets must be published at the interval at which they could be replaced, even if nothing changes.
  • LIMITATION: Suppose we want to publish two datasets, one for use M-F and another for use Sa-Su. The current schema requires updating the metadata every week, even if no data changes.

See replies above. Are there plausible use cases for datasets (as opposed to features) that are valid from 5-7 pm every day or only Mon-Fri or Saturday & Sunday?
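To make the Monday-Friday limitation concrete: with a single begin/end pair, the producer of a hypothetical weekday-only dataset would have to recompute and republish a pair like the one below every week. weekday_extent is a hypothetical helper, and treating Saturday 00:00 as the (exclusive) end is an assumption.

```python
from datetime import datetime, timedelta, timezone


def weekday_extent(now: datetime) -> tuple[datetime, datetime]:
    """Return (begin, end) of the Monday 00:00 to Saturday 00:00 window that
    contains `now`, or the following week's window if `now` is a weekend day."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    monday = midnight - timedelta(days=midnight.weekday())  # weekday(): Monday == 0
    if now.weekday() >= 5:  # Saturday or Sunday
        monday += timedelta(days=7)
    return monday, monday + timedelta(days=5)


# Each week yields a different pair, so the discovery metadata would change
# weekly even if the dataset content did not.
wednesday = datetime(2022, 1, 5, 12, 0, tzinfo=timezone.utc)
print(weekday_extent(wednesday))
```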

Item 2

  • Recommend using an instant vice a duration as noted in justification (3) of the proposal image

    • Duration is not used anywhere else in S-100

    • As proposed, every receiving system must compute the instant instead of computing it once at origination

    • Encoding software can automatically populate the instant.

    • No more error-prone than populating the issue date.

As the proposal explains, using duration is less susceptible to human error and allows more fine tuning than date or date-time. Duration is also an ISO type. In fact, given that some events should be triggered X time before another (e.g., notices due X hours before arrival), adding duration to Part 1 would be a good idea.
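Whichever side does it, turning an issue instant plus a delivery interval into an instant is a small computation. This Python sketch handles only day and time components and rejects nonzero years or months, whose length depends on the calendar. parse_duration and next_delivery are hypothetical names, and the parser covers a minimal subset of ISO 8601 durations, not the full syntax.

```python
import re
from datetime import datetime, timedelta, timezone

# Minimal subset of ISO 8601 durations: PnYnMnDTnHnMnS with integer
# components (fractional seconds allowed).
_DURATION = re.compile(
    r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    match = _DURATION.match(text)
    # "P" alone, or a trailing "T" with no time components, is not valid.
    if match is None or not any(match.groups()) or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    years, months, days, hours, minutes, seconds = match.groups()
    if int(years or 0) or int(months or 0):
        raise ValueError("nonzero year/month components are calendar-dependent")
    return timedelta(days=int(days or 0), hours=int(hours or 0),
                     minutes=int(minutes or 0), seconds=float(seconds or 0))


def next_delivery(issued: datetime, interval: str) -> datetime:
    """Instant at which the successor dataset is expected."""
    return issued + parse_duration(interval)


issued = datetime(2022, 1, 1, 5, 0, tzinfo=timezone.utc)
print(next_delivery(issued, "PT06H"))  # 2022-01-01 11:00:00+00:00
```

Encoding the instant means the producer runs this once; encoding the duration means every receiver runs it, which is the trade-off debated in this item.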

Item 3

  • No need to add XML schema reference if proposal is updated to change both:

    • Item 1: timeInstantBegin/End type changed to DateTime
    • Item 2: duration changed to an instant
DavidGrant-NIWC commented 2 years ago

The ISO 8601 basic representation (20211019) is not an XML Schema built-in datatype and cannot be validated by XML Schema validation. XML Schema uses the ISO 8601 Extended Format. That format is also defined in ISO 8601, so the metadata is using an ISO format. See also S-100 3-9 (I can expand that clause to cover all date and time types and propose adding it to Part 1 too if desired).

[...] Also, the metadata group moved the geographic component boundingBox up from dataCoverage this summer (i.e., from class S100_DataCoverage to S100_DatasetDiscoveryMetadata), so it makes sense to put temporal extent for the dataset at the same level as bounding box. I suppose we could add temporal extent to S100_DataCoverage too, by analogy to boundingPolygon for the likely rare situations where it is different from that for the dataset. Can someone supply plausible examples for using it in S100_DataCoverage where it is different from the temporal extent of the dataset as a whole?

I'm fine with adding it to both dataset and dataCoverage.

Datasets containing multiple disjoint (in space and/or time) coverages should be considered. Moving boundingBox makes the attribute less useful in these cases, but it probably doesn't matter because systems will just look at the required boundingPolygon(s).

A single date-dependent feature in a single dataCoverage can affect the temporal extent of the dataset, but does not necessarily affect the temporal extent of other dataCoverages. This may or may not be useful for optimizing data delivery, storage, and retrieval mechanisms.
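A sketch of that relationship, assuming each dataCoverage may carry an optional begin/end pair (Coverage, dataset_extent and valid_at are hypothetical names, not S-100 model elements): the dataset-level extent is the envelope of the coverage extents, while a per-coverage check can still identify which coverages are valid at a given instant.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Coverage:
    """Hypothetical stand-in for a dataCoverage with an optional temporal extent."""
    name: str
    begin: Optional[datetime] = None  # None = unbounded start
    end: Optional[datetime] = None    # None = unbounded end


def dataset_extent(coverages: list[Coverage]) -> tuple[Optional[datetime], Optional[datetime]]:
    """Envelope of the coverage extents; an unbounded coverage makes the
    dataset unbounded on that side."""
    if not coverages:
        raise ValueError("at least one coverage is required")
    begins = [c.begin for c in coverages]
    ends = [c.end for c in coverages]
    begin = None if None in begins else min(begins)
    end = None if None in ends else max(ends)
    return begin, end


def valid_at(coverages: list[Coverage], instant: datetime) -> list[str]:
    """Names of coverages whose extent contains `instant` (end exclusive)."""
    return [c.name for c in coverages
            if (c.begin is None or c.begin <= instant)
            and (c.end is None or instant < c.end)]


utc = timezone.utc
winter = Coverage("Jan-Mar", datetime(2022, 1, 1, tzinfo=utc), datetime(2022, 4, 1, tzinfo=utc))
summer = Coverage("Jul-Sep", datetime(2022, 7, 1, tzinfo=utc), datetime(2022, 10, 1, tzinfo=utc))
print(dataset_extent([winter, summer]))
print(valid_at([winter, summer], datetime(2022, 5, 15, tzinfo=utc)))
```

With the Jan-Mar / Jul-Sep example from earlier in the thread, the dataset envelope spans January to October, yet no coverage is valid in mid-May; only per-coverage extents reveal that gap.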

As the proposal explains, using duration is less susceptible to human error

We disagree on this point.

and allows more fine tuning than date or date-time.

Sub-second resolution is not likely to be of any benefit, at least w.r.t. discovery metadata.

Duration is also an ISO type. In fact, given that some events should be triggered X time before another (e.g., notices due X hours before arrival), adding duration to Part 1 would be a good idea.

rmalyankar commented 2 years ago

The ISO 8601 basic representation (20211019) is not an XML Schema built-in datatype and cannot be validated by XML Schema validation. XML Schema uses the ISO 8601 Extended Format. That format is also defined in ISO 8601, so the metadata is using an ISO format. See also S-100 3-9 (I can expand that clause to cover all date and time types and propose adding it to Part 1 too if desired).

  • Use of an ISO format is not sufficient.
  • Use of alternate formats defined in 8601 is not allowed.
  • As you note, S-100 3-9 does not apply.

    • Even if updated, a mapping would need to be provided between the S-100 and XML types.
  • The fact that there is no XML built-in datatype for an S-100 Date / DateTime is beside the point. There is also no built-in for TruncatedDate (or S-100 DateTime).

    • The restriction to use complete representation, basic format is not optional.

    • The value can be encoded in an XML dateTime, but the document needs to be clear that the lexical representation must conform to the S-100 DateTime type, it cannot use the entire lexical space of the XML type.

    • Hence my recommendation to show as S-100 DateTime in the UML.

    • The schema can implement using XML dateTime.
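As a sketch of "encode in xs:dateTime but restrict the lexical space", the check below accepts only a subset of xs:dateTime values and rewrites them in basic format. The subset chosen here (four-digit year, whole seconds, and either no zone designator, Z, or a ±hh:mm offset) and the name xs_datetime_to_basic are assumptions for illustration; the authoritative restriction is whatever the S-100 DateTime definition specifies.

```python
import re
from datetime import datetime

# Illustrative subset of the xs:dateTime lexical space. xs:dateTime also
# permits fractional seconds, years with more than four digits, negative
# years and 24:00:00; all of these are rejected here.
_XS_SUBSET = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(Z|[+-]\d{2}:\d{2})?$"
)


def xs_datetime_to_basic(value: str) -> str:
    """Return the basic-format form of an xs:dateTime value, or raise
    ValueError if the value falls outside the assumed restricted subset."""
    match = _XS_SUBSET.match(value)
    if match is None:
        raise ValueError(f"outside the restricted lexical space: {value!r}")
    year, month, day, hour, minute, second, zone = match.groups()
    # Range check: datetime() raises ValueError for e.g. month 13 or hour 24.
    datetime(int(year), int(month), int(day), int(hour), int(minute), int(second))
    zone = (zone or "").replace(":", "")  # "-05:00" -> "-0500"
    return f"{year}{month}{day}T{hour}{minute}{second}{zone}"


print(xs_datetime_to_basic("2022-01-01T05:00:00Z"))  # 20220101T050000Z
```

A schema typed as xs:dateTime would accept every value this function accepts, plus the extra forms it rejects; that gap is what the documentation (or a Schematron-style rule) would have to cover.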

So let's see if I understand what you're saying. Your only objection is that the UML and table in the proposal say "dateTime" instead of "DateTime"? The schemas can continue to encode date and date-time types as the XML built-in types xs:date and xs:dateTime?

[...] Also, the metadata group moved the geographic component boundingBox up from dataCoverage this summer (i.e., from class S100_DataCoverage to S100_DatasetDiscoveryMetadata), so it makes sense to put temporal extent for the dataset at the same level as bounding box. I suppose we could add temporal extent to S100_DataCoverage too, by analogy to boundingPolygon for the likely rare situations where it is different from that for the dataset. Can someone supply plausible examples for using it in S100_DataCoverage where it is different from the temporal extent of the dataset as a whole?

I'm fine with adding it to both dataset and dataCoverage.

Datasets containing multiple disjoint (in space and/or time) coverages should be considered. Moving boundingBox makes the attribute less useful in these cases, but it probably doesn't matter because systems will just look at the required boundingPolygon(s).

A single date-dependent feature in a single dataCoverage can affect the temporal extent of the dataset, but does not necessarily affect the temporal extent of other dataCoverages. This may or may not be useful for optimizing data delivery, storage, and retrieval mechanisms.

As the proposal explains, using duration is less susceptible to human error

We disagree on this point.

and allows more fine tuning than date or date-time.

Sub-second resolution is not likely to be of any benefit, at least w.r.t. discovery metadata.

I wasn't talking about sub-second resolution. From the proposal: Zero components must be encoded if and only if they are significant for indicating the granularity of the start/end instants of the interval. A variation of ±X should be allowed for, where X is the component of smallest granularity.
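One reading of the ±X rule, sketched in Python: X is taken as one unit of the smallest component actually written in the duration, so PT06H tolerates ±1 hour while P0Y0M0DT6H0M0S tolerates ±1 second. That interpretation and the name delivery_window are assumptions; year and month durations are left out because their length varies.

```python
import re
from datetime import datetime, timedelta, timezone

# Minimal ISO 8601 duration pattern with integer components only.
_DURATION = re.compile(
    r"^P(?:(?P<Y>\d+)Y)?(?:(?P<Mo>\d+)M)?(?:(?P<D>\d+)D)?"
    r"(?:T(?:(?P<H>\d+)H)?(?:(?P<Mi>\d+)M)?(?:(?P<S>\d+)S)?)?$"
)
_ORDER = ("Y", "Mo", "D", "H", "Mi", "S")  # largest to smallest granularity
# Fixed-length units only; years and months vary with the calendar.
_UNIT = {"D": timedelta(days=1), "H": timedelta(hours=1),
         "Mi": timedelta(minutes=1), "S": timedelta(seconds=1)}


def delivery_window(issued: datetime, duration: str) -> tuple[datetime, datetime]:
    """Return (earliest, latest) expected delivery: issued + duration,
    plus or minus one unit of the smallest component encoded in `duration`."""
    match = _DURATION.match(duration)
    encoded = [k for k in _ORDER if match and match.group(k) is not None]
    if not encoded or duration.endswith("T"):
        raise ValueError(f"unsupported duration: {duration!r}")
    if int(match.group("Y") or 0) or int(match.group("Mo") or 0) or encoded[-1] not in _UNIT:
        raise ValueError("year/month durations are not handled in this sketch")
    offset = sum((int(match.group(k)) * _UNIT[k] for k in encoded if k in _UNIT),
                 timedelta())
    nominal = issued + offset
    tolerance = _UNIT[encoded[-1]]  # one unit of the finest encoded component
    return nominal - tolerance, nominal + tolerance


issued = datetime(2022, 1, 1, 5, 0, tzinfo=timezone.utc)
coarse = delivery_window(issued, "PT06H")           # 10:00 to 12:00 UTC
fine = delivery_window(issued, "P0Y0M0DT6H0M0S")    # 10:59:59 to 11:00:01 UTC
```

Under this reading, writing the zero components is how a producer narrows the tolerance, which is the point of the quoted rule.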

Duration is also an ISO type. In fact, given that some events should be triggered X time before another (e.g., notices due X hours before arrival), adding duration to Part 1 would be a good idea.

  • The ISO model doesn't use duration in this exact use case (dataset publication interval).
  • Durations relative to instants are themselves specifying instants.

    • I'm not opposed to adding duration to Part 1, but recommend that it be a separate proposal, or part of Jonathan's proposal. It is not needed by this proposal.

We discussed the ISO use of duration in this use case during the metadata group meeting when Julia showed my slide with the ISO 19115-1 conceptual model for maintenance information. The ISO model of maintenance information does allow use of duration, specifically the ISO type TM_PeriodDuration.

DavidGrant-NIWC commented 2 years ago

We discussed the ISO use of duration in this use case during the metadata group meeting when Julia showed my slide with the ISO 19115-1 conceptual model for maintenance information. The ISO model of maintenance information does allow use of duration, specifically the ISO type TM_PeriodDuration.

As I indicated at that time, the ISO model supports combinations such as:

| maintenanceAndUpdateFrequency | userDefinedMaintenanceFrequency | dateOfNextUpdate |
|---|---|---|
| daily | - | - |
| continual | P0Y0M0DT6H0M0S | - |
| continual | - | 20211201 |
| continual | - | 20211201T101530Z |
| continual | P0Y0M0DT6H0M0S | 20211201T101530Z |
| irregular | - | 20220501 |

Strongly recommend using the ISO model instead of the non-standard encoding you are proposing.


Your only objection is that the UML and table in the proposal say "dateTime" instead of "DateTime"? The schemas can continue to encode date and date-time types as the XML built-in types xs:date and xs:dateTime?

Yes - it should say DateTime to clarify to encoders that they are restricted to the S-100 format. The schema can use xs:dateTime because it can hold all possible values of the S-100 encoding. This is similar to the schema using CharacterString to encode many more restrictive types.

I wasn't talking about sub-second resolution. From the proposal: Zero components must be encoded if and only if they are significant for indicating the granularity of the start/end instants of the interval. A variation of ±X should be allowed for, where X is the component of smallest granularity.

rmalyankar commented 2 years ago

About the ISO model of MD_MaintenanceInformation:

Strongly recommend using the ISO model instead of the non-standard encoding you are proposing.

Where does the ISO model mandate inclusion of the maintenanceAndUpdateFrequency attribute? The model only requires at least one of maintenanceAndUpdateFrequency OR userDefinedMaintenanceFrequency.

As I indicated during the web meeting, I was willing to change the name datasetDeliveryInterval to userDefinedMaintenanceFrequency, but I remember being told that wasn't important. Anyway, the name userDefinedMaintenanceFrequency is ambiguous: who is the user who defined the frequency, the producer, the distributor, or the end user?

  • Use MD_MaintenanceInformation

    • Simplifies the proposal.
  • Otherwise, provide a rationale for replacing MD_MaintenanceInformation

Your only objection is that the UML and table in the proposal say "dateTime" instead of "DateTime"? The schemas can continue to encode date and date-time types as the XML built-in types xs:date and xs:dateTime?

Yes - it should say DateTime to clarify to encoders that they are restricted to the S-100 format. The schema can use xs:dateTime because it can hold all possible values of the S-100 encoding. This is similar to the schema using CharacterString to encode many more restrictive types.

I'll change the types in the UML and table, but encoders who create XML files (or rather, the software used to create XML files) will obviously need to conform to the XML built-in types. UIs should be using calendar widgets anyway, so that shouldn't make any difference to human users.

On variations:

I'm going to wait and see if anyone else has views on allowing (or not) for encoding variations in expected delivery intervals, and if so, how to define allowable variation. There is a request for comments in the draft of the proposal. (Negative variations allow for early releases, which are a possibility for datasets that are refreshed/replaced monthly or annually, like tidal predictions derived from harmonic analysis.)

rmalyankar commented 2 years ago

This is what adhering more closely to the ISO model would do, and I can change the proposal accordingly to use either date-time or period (leaving the possibility of adding a "variation" attribute open for later - I want input from OEMs about that):

In S100_DatasetDiscoveryMetadata:

            <mri:resourceMaintenance>
                <mmi:MD_MaintenanceInformation>
                    <!-- S-100 restricts attributes to either:
                            (1) maintenanceDate + maintenanceAndUpdateFrequency=asNeeded or
                            (2) userDefinedMaintenanceFrequency
                         (both alternatives are shown below) -->
                    <!-- maintenanceAndUpdateFrequency is included to satisfy the ISO 19115-1 constraint
                         that at least one of userDefinedMaintenanceFrequency or
                         maintenanceAndUpdateFrequency be present. -->
                    <mmi:maintenanceAndUpdateFrequency>
                        <!-- Element content: empty, or any text in any single language.
                             Multilingual text is not allowed here by the ISO schemas. -->
                        <mmi:MD_MaintenanceFrequencyCode codeList="http://...." codeListValue="asNeeded"/>
                    </mmi:maintenanceAndUpdateFrequency>
                    <mmi:maintenanceDate>
                        <cit:CI_Date>
                            <cit:date>
                                <!-- or 2022-01-01T05:00:00, or 2022-01-01T05:00:00-05:00 -->
                                <gco:DateTime>2022-01-01T05:00:00Z</gco:DateTime>
                            </cit:date>
                            <cit:dateType>
                                <!-- Element content: empty, or any text in any single language.
                                     Multilingual text is not allowed here by the ISO schemas. -->
                                <cit:CI_DateTypeCode codeList="http://..." codeListValue="nextUpdate"/>
                            </cit:dateType>
                        </cit:CI_Date>
                    </mmi:maintenanceDate>
                    <mmi:userDefinedMaintenanceFrequency>
                        <gco:TM_PeriodDuration>PT06H</gco:TM_PeriodDuration>
                    </mmi:userDefinedMaintenanceFrequency>
                </mmi:MD_MaintenanceInformation>
            </mri:resourceMaintenance>

(mutatis mutandis w.r.t. the mri and mmi prefixes).
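On the consuming side, pulling the three values back out of an instance like the one above needs only Python's standard xml.etree.ElementTree. This sketch matches elements by local name with the {*} namespace wildcard (Python 3.8+), so it does not depend on which namespace URIs the mri/mmi/cit/gco prefixes end up bound to; read_maintenance is a hypothetical helper, and it expects a complete document with the prefixes declared.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def _text(elem: ET.Element, path: str) -> Optional[str]:
    found = elem.find(path)
    return found.text.strip() if found is not None and found.text else None


def read_maintenance(xml_text: str) -> dict:
    """Extract the frequency code, the nextUpdate date and the user-defined
    frequency from an MD_MaintenanceInformation element (or a document
    containing one). Elements are matched by local name in any namespace."""
    root = ET.fromstring(xml_text)
    if root.tag.rsplit("}", 1)[-1] == "MD_MaintenanceInformation":
        info = root
    else:
        info = root.find(".//{*}MD_MaintenanceInformation")
    if info is None:
        raise ValueError("no MD_MaintenanceInformation element found")

    code = info.find("{*}maintenanceAndUpdateFrequency/{*}MD_MaintenanceFrequencyCode")
    next_update = None
    for ci_date in info.findall("{*}maintenanceDate/{*}CI_Date"):
        date_type = ci_date.find("{*}dateType/{*}CI_DateTypeCode")
        if date_type is not None and date_type.get("codeListValue") == "nextUpdate":
            next_update = _text(ci_date, "{*}date/{*}DateTime")
    return {
        "maintenanceAndUpdateFrequency": None if code is None else code.get("codeListValue"),
        "nextUpdate": next_update,
        "userDefinedMaintenanceFrequency": _text(
            info, "{*}userDefinedMaintenanceFrequency/{*}TM_PeriodDuration"),
    }
```

Applied to the sample above (with its prefixes declared), this would yield asNeeded, 2022-01-01T05:00:00Z and PT06H.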

In my opinion the ISO constraint should be: "at least one of the 3 attributes maintenanceAndUpdateFrequency, maintenanceDate, or userDefinedMaintenanceFrequency", or better yet, "at least one of maintenanceDate or userDefinedMaintenanceFrequency" but if the ISO constraint is a must-have-as-is, there it is.
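The constraints being compared can be written as predicates over a hypothetical in-memory record (MaintenanceInfo and the function names are illustrative, not ISO or S-100 APIs). The sketch shows why the asNeeded code appears in the sample: a record carrying only a next-update date fails the ISO rule as stated in this thread but satisfies the alternative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class MaintenanceInfo:
    """Hypothetical in-memory stand-in for three MD_MaintenanceInformation
    properties; not an ISO or S-100 API."""
    maintenance_and_update_frequency: Optional[str] = None          # MD_MaintenanceFrequencyCode value
    maintenance_date: Optional[datetime] = None                     # CI_Date with dateType "nextUpdate"
    user_defined_maintenance_frequency: Optional[timedelta] = None  # TM_PeriodDuration


def iso_constraint_ok(m: MaintenanceInfo) -> bool:
    """ISO 19115-1 constraint as described in the thread: at least one of
    maintenanceAndUpdateFrequency or userDefinedMaintenanceFrequency."""
    return (m.maintenance_and_update_frequency is not None
            or m.user_defined_maintenance_frequency is not None)


def alternative_constraint_ok(m: MaintenanceInfo) -> bool:
    """Alternative suggested above: at least one of maintenanceDate or
    userDefinedMaintenanceFrequency."""
    return m.maintenance_date is not None or m.user_defined_maintenance_frequency is not None


def s100_profile_ok(m: MaintenanceInfo) -> bool:
    """S-100 restriction from the XML comment: (1) maintenanceDate with
    maintenanceAndUpdateFrequency = asNeeded, or (2) userDefinedMaintenanceFrequency.
    Treated as an inclusive "or" here (assumption)."""
    option1 = m.maintenance_date is not None and m.maintenance_and_update_frequency == "asNeeded"
    option2 = m.user_defined_maintenance_frequency is not None
    return option1 or option2


date_only = MaintenanceInfo(maintenance_date=datetime(2022, 1, 1, 5, tzinfo=timezone.utc))
print(iso_constraint_ok(date_only), alternative_constraint_ok(date_only))  # False True
```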

rmalyankar commented 2 years ago

Updated to use the ISO model of maintenance information, but this is now a more complex proposal than merely adding the datasetDeliveryInterval (or userDefinedMaintenanceFrequency) attribute directly to S100_DatasetDiscoveryMetadata...

The change to DateTime (vice dateTime) for the types in S100_TemporalExtent has also been applied.

rmalyankar commented 2 years ago

Proposal has been reorganized: the large block of text following the MD_MaintenanceInformation documentation table is now a separate clause.