ISO-TC211 / XML

XML schema, transforms, schematron rules, and examples for ISO TC211 Metadata Standards

Implementation of 19103 Date in gco namespace #223

Open rmalyankar opened 2 years ago

rmalyankar commented 2 years ago

In the implementation of ISO 19103 Date in the gco namespace baseTypes 1.2.0, Date is implemented as a union:

`</xs:simpleType>`

This allows a mistake in interpretation and encoding with values like 19990101, which is (a) intended as a date in ISO basic format, but (b) permitted by the above implementation as a year-only date, i.e., the year 19,990,101. This means an invalid date like 19999999 (basic format) is not detected by schema validation because it is a valid year-only date though it was intended as a yyyymmdd date.
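The ambiguity can be reproduced outside an XML validator. The following is a minimal Python sketch, not the schema itself: it assumes the union's member types are xs:date, xs:gYearMonth and xs:gYear, and it approximates their lexical forms with regular expressions modelled on W3C XML Schema 1.1 Part 2. The names `PATTERNS` and `matching_member_types` are hypothetical.

```python
import re

# Lexical building blocks, modelled on the W3C XSD 1.1 datatype patterns.
# Assumption: the union members are xs:date, xs:gYearMonth and xs:gYear.
YEAR = r"-?([1-9][0-9]{3,}|0[0-9]{3})"  # four OR MORE digits
TZ = r"(Z|[+-](0[0-9]|1[0-3]):[0-5][0-9]|[+-]14:00)?"

PATTERNS = {
    "xs:date": re.compile(YEAR + r"-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])" + TZ),
    "xs:gYearMonth": re.compile(YEAR + r"-(0[1-9]|1[0-2])" + TZ),
    "xs:gYear": re.compile(YEAR + TZ),
}


def matching_member_types(value):
    """Return the member types whose lexical pattern accepts the value."""
    return [name for name, pattern in PATTERNS.items() if pattern.fullmatch(value)]


for sample in ("1999-01-01", "19990101", "19999999"):
    print(sample, matching_member_types(sample))
```

Under these assumptions both `19990101` and `19999999` are accepted only as a year, so the union validates them without any hint that a yyyymmdd date was intended.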

I suggest making Date_Type a complex type, a choice of XML elements named according to the XML schema built-in date types. Here is a related example from S-100. (It is for truncated date, which is slightly different in using more XML Schema built-in types, but it should convey the idea.)

DateFormat

The result would be like this:

Application schema: <xs:element name="dateStart" type="S100_TruncatedDate" ...

Dataset: <dateStart><gMonthDay>--06-01</gMonthDay></dateStart>
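To illustrate why the choice-of-elements encoding traps the mistake, here is a rough Python sketch that mimics what a validator would do with such a type. The child element names (`date`, `gYearMonth`, `gYear`, `gMonthDay`) and the function `check_choice` are assumptions for illustration; the checks are lexical only (no time zones, no calendar checks such as days per month).

```python
import re
import xml.etree.ElementTree as ET

YEAR = r"-?([1-9][0-9]{3,}|0[0-9]{3})"
MONTH = r"(0[1-9]|1[0-2])"
DAY = r"(0[1-9]|[12][0-9]|3[01])"

# Hypothetical child element names, one per XML Schema built-in date type.
CHOICES = {
    "date": re.compile(YEAR + "-" + MONTH + "-" + DAY),
    "gYearMonth": re.compile(YEAR + "-" + MONTH),
    "gYear": re.compile(YEAR),
    "gMonthDay": re.compile("--" + MONTH + "-" + DAY),
}


def check_choice(xml_text):
    """Accept exactly one known child whose text fits the type named by its tag."""
    parent = ET.fromstring(xml_text)
    children = list(parent)
    if len(children) != 1:
        return False
    child = children[0]
    pattern = CHOICES.get(child.tag)
    if pattern is None:
        return False
    return pattern.fullmatch((child.text or "").strip()) is not None


print(check_choice("<dateStart><gMonthDay>--06-01</gMonthDay></dateStart>"))
print(check_choice("<dateStart><date>19990101</date></dateStart>"))
```

Because the encoder has to name the type explicitly, `19990101` inside a `date` element is rejected; it would only pass if someone deliberately tagged it as `gYear`.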

(I am posting this as an XML issue because I think it is just an implementation change and shouldn't need a change to the ISO 19103 standard, but feel free to move it to StandardsTracker if appropriate.)

ejbleys commented 2 years ago

Thanks. This might offer a potential solution to the other question of months in any year. Happy to put it under "Other Business" on the AG 10 agenda in Vienna. This fits in with another discussion on alternatives to using gco/gcx elements.

As for the validity of a date: that is not necessarily a schema issue (the values within tags are rarely checked), and dates of either type might be desired. Do I like yyyymmdd dates? Not for some time now (though I was known for using them). We could use Schematron to identify those issues.

Happy to test options for a 19103 where we amalgamate current and alternative expressions.

rmalyankar commented 2 years ago

Thanks, much appreciated.

I don't like yyyymmdd date formats either. The XML built-in types, which use separators, work fine. yyyy-mm-dd works fine. The problem is that people convert data from non-ISO formats and overlook date format conversion, or come from legacy systems and wrongly try to create date values without separators, and the current "union" type provides no hint that they may be making a mistake.

PeterParslow commented 2 years ago

Rather than making the XML type complex, would it be tidier to validate this with a schematron pattern (requiring the separators)?
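The rule such a pattern would enforce can be prototyped before writing any Schematron. This is a small Python sketch of one possible rule, assuming the intent is simply "a value made of five or more digits and nothing else is suspicious"; the function name `missing_separators` is hypothetical and this is not an actual Schematron rule.

```python
import re

# A value consisting only of five or more digits cannot be a four-digit
# year and has no separators, so it is probably a yyyymmdd-style mistake.
UNSEPARATED = re.compile(r"[0-9]{5,}")


def missing_separators(value):
    """Return True when the value should be flagged for review."""
    return UNSEPARATED.fullmatch(value.strip()) is not None


for sample in ("1999", "1999-01", "1999-01-01", "19990101"):
    print(sample, "FLAG" if missing_separators(sample) else "ok")
```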

This is also an issue that should be flagged / solved at the logical level - in this case, in ISO 19103, but see also the ongoing Ad hoc group on representing time - e.g. by defining specifically there what subset of ISO 8601 is allowed for use in ISO/TC 211 data.

PeterParslow commented 2 years ago

Whilst I do agree with pushing our users towards using separators, we should note that ISO 8601:2004 only allows more than four digits in the year "with mutual agreement" (Clause 3.5), and such extended year values need to start with either + or -.

So anyone interpreting 19990101 as a year only is not following ISO 8601 (or the widely used RFC 3339, which only allows four-digit years).

That said, it's possible that ISO 8601-2:2019 changed this - I haven't got a copy.

PeterParslow commented 2 years ago

Coincidentally, BSI have just given me access to ISO 8601-2:2019. That adds another option for years with > 4 digits, allowing a prefix of "Y" - so if someone is stating 19,990,101 CE (AD) in ISO 8601-2:2019, they are allowed to say Y19990101 without any prior arrangement. 19990101 remains unambiguous.
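Read that way, a parser can tell the two cases apart from the string alone. Here is a minimal Python sketch of that reading; it handles only the two forms discussed here (a "Y"-prefixed long year and an eight-digit basic-format date), and the function name `interpret` is made up for illustration.

```python
import re
from datetime import date


def interpret(value):
    """Classify a string as a Y-prefixed long year or a basic-format date."""
    if re.fullmatch(r"Y[0-9]{5,}", value):
        # Explicitly flagged as "all year".
        return ("year", int(value[1:]))
    if re.fullmatch(r"[0-9]{8}", value):
        # yyyymmdd; date() raises ValueError for impossible dates.
        return ("date", date(int(value[:4]), int(value[4:6]), int(value[6:])))
    raise ValueError(f"unsupported form: {value!r}")


print(interpret("Y19990101"))
print(interpret("19990101"))
```

With this interpretation `19999999` raises `ValueError` instead of being silently accepted as a year.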

rmalyankar commented 2 years ago

I think schema validation (i.e., using types in the XML schema, whether built-in or user-defined) is generally better than Schematron rules (when there is a choice between the two), because schema validation happens earlier in the process. Also, as a practical matter, developers are less prone to skimp on schema validation than on applying Schematron rule files.

Years with more than 4 digits would be an error all right in an S-100 context, but they're starting with 8-digit data fields (yyyymmdd) and the idea is to trap and signal errors during data conversion or data entry.
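As an illustration of trapping the error at conversion time, here is a short Python sketch that converts an 8-digit yyyymmdd field to the separated form and refuses anything that is not a real calendar date. The function name `basic_to_extended` is hypothetical and not part of any S-100 tooling.

```python
from datetime import date


def basic_to_extended(value):
    """Convert 'yyyymmdd' to 'yyyy-mm-dd', raising ValueError on bad input."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        raise ValueError(f"expected 8 digits (yyyymmdd), got {value!r}")
    # date() rejects impossible months and days such as 99 or February 30.
    parsed = date(int(value[:4]), int(value[4:6]), int(value[6:]))
    return parsed.isoformat()


print(basic_to_extended("19990101"))
```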

ejbleys commented 2 weeks ago

I believe this issue can be readily addressed in a review of ISO 19115-1. At the moment YYYY is a valid input for Date, so if one were to add the extra digits it would imply that the value is more precise than a year.

PeterParslow commented 2 weeks ago

I stand by my previous statement that ISO 19103:2015 ties the Date type to ISO 8601 which only allows for 4-digit years. ISO 19103:2024 refers to the abstract/logical types in ISO/IEC 11404 (recognising that ISO 8601 is about representation) - Having read it a couple of times, I think that requires all representations of year to conform to ISO 8601 anyway, so therefore only have 4 digits.

The XML schema maps this to xs:gYear. W3C define that (https://www.w3.org/TR/xmlschema11-2/#gYear) with a regular expression that allows 1-4 digits.

Taken together, I can't see any way in which 19990101 can be interpreted as an eight-digit year, because in all these standards, years have a maximum of four digits.

Sounds like a problem that can wait almost 8000 years before needing resolving. Of course, some software may not be implementing the standards correctly.... (Or perhaps I'm misreading them: if anyone can show me how ISO 8601, ISO/IEC 11404 or W3C xs:gYear can support > 4 digits, I'm open to that!)

ejbleys commented 2 weeks ago

Tick

PeterParslow commented 2 weeks ago

ISO 8601-1:2016 does allow >4 digits, but only if you flag the 'string' as 'all year' by starting it with Y