mbjones closed this issue 7 years ago.
Original Redmine Comment Author Name: Margaret O'Brien (Margaret O'Brien) Original Date: 2008-09-22T19:20:21Z
Targeting for 2.1.0, although this may drop back to unspecified.
Original Redmine Comment Author Name: Margaret O'Brien (Margaret O'Brien) Original Date: 2008-11-04T22:09:26Z
The pattern for this type will be something resembling:
I am assuming that we still want to allow newlines in strings, and the dot (.) specifically does not match them. At least some current xs:strings contain newlines (e.g.
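To illustrate why the dot matters, here is a small sketch that emulates XML Schema pattern rules in Python. It assumes the XML Schema regex semantics (a pattern must match the whole value; "." excludes newline and carriage return; "\s" is only space, tab, CR and LF). The pattern names are hypothetical, not taken from the EML schemas.

```python
import re

# Sketch, not EML code: emulate XML Schema regex semantics in Python.
# - an xs:pattern must match the *entire* value, so we use re.fullmatch
# - XSD "." matches any character except newline (#xA) and carriage return (#xD)
XSD_DOT = r"[^\n\r]"
XSD_WS = r"[ \t\n\r]"       # XSD \s: space, tab, newline, carriage return
XSD_NON_WS = r"[^ \t\n\r]"  # XSD \S

# Dot-based tail (\s*\S.* in XSD syntax): fails on any line break that
# follows the first non-whitespace character.
DOT_TAIL = re.compile(f"{XSD_WS}*{XSD_NON_WS}{XSD_DOT}*")
# Hypothetical newline-tolerant tail (\s*\S[\s\S]* in XSD syntax).
ANY_TAIL = re.compile(f"{XSD_WS}*{XSD_NON_WS}(?:{XSD_WS}|{XSD_NON_WS})*")

abstract = "First paragraph.\nSecond paragraph."
print(DOT_TAIL.fullmatch(abstract) is not None)  # False
print(ANY_TAIL.fullmatch(abstract) is not None)  # True
```

Using a class such as [\s\S] instead of "." for the trailing part is one way to keep multi-line content valid while still requiring a non-whitespace character.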
Original Redmine Comment Author Name: Margaret O'Brien (Margaret O'Brien) Original Date: 2008-11-08T20:56:54Z
We need to look at the effect on instance documents of switching all xs:string to NonEmptyStringType. This type-switch will probably have a bigger effect on the ability of authors to migrate their documents than the changes to the document structure itself. Structure changes will be accomplished by the xsl stylesheet, but retyping all strings means that content could now be required where none previously existed.
To start, I considered just the anonymous simple type elements that are required by EML and are type="xs:string". It seemed reasonable that if an element was optional, its content could also be optional. In all, there are 81 of these, which are generally easy to retype with a statement like: sed -e '/\<xs:element\ name/{ /minOccurs=\"0\"/!s/xs:string/res:NonEmptyStringType/ } '
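For readers less familiar with sed, the one-liner can be read as: on every line that declares an element and does not carry minOccurs="0", replace the first "xs:string" with "res:NonEmptyStringType" (in GNU sed, "\<" is a start-of-word anchor, and s/// without the g flag replaces only the first match). The Python sketch below mirrors that logic; the function name is hypothetical, and like sed it is strictly line-based, so it assumes each xs:element declaration sits on a single line.

```python
import re

# Mirrors the sed one-liner (hypothetical helper, line-based like sed).
ELEMENT_DECL = re.compile(r"\bxs:element name")  # sed: /\<xs:element\ name/
OPTIONAL = 'minOccurs="0"'                        # sed: /minOccurs=\"0\"/!

def retype_required_strings(schema_text: str) -> str:
    out = []
    for line in schema_text.splitlines(keepends=True):
        if ELEMENT_DECL.search(line) and OPTIONAL not in line:
            # s/xs:string/res:NonEmptyStringType/ without g: first match only
            line = line.replace("xs:string", "res:NonEmptyStringType", 1)
        out.append(line)
    return "".join(out)

sample = (
    '<xs:element name="title" type="xs:string"/>\n'
    '<xs:element name="abstract" type="xs:string" minOccurs="0"/>\n'
)
print(retype_required_strings(sample))
```

A declaration split across lines (for example, minOccurs="0" on the line after name=) would be retyped incorrectly by both the sed command and this sketch, so the results still need a manual review.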
There are other elements which could be examined and retyped manually, or which would be caught by a general s/xs:string/res:NonEmptyStringType/. E.g., see
So to start, only one schema file, "eml-resource.xsd", has been checked into CVS, so that others can try out the effect of NonEmptyStringType while its scope is small. In particular, I was thinking about Morpho. Seven element declarations in this file that were formerly xs:string are now NonEmptyStringType; see the list below. I think the Morpho wizards deal only with title, references and keyword, although all of them are available in the tree editor. My local copy has all 81 (anonymous, simple) element declarations retyped (in 17 schema docs), plus the 5 anonymous attributes. I am testing a variety of EML201 documents from the LTER Metacat against this schema as I convert them, basically while I work on the XSL stylesheet.
title
distribution/connectionDefinition/parameterDefinition/name
distribution/connectionDefinition/parameterDefinition/description
distribution/connection/parameter/name
distribution/connection/parameter/description
distribution/offline/MediumName
references (multiple paths)
keyword (a named type)
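When testing existing instance documents against the retyped schema, a quick pre-check can list which of the retyped elements are empty or whitespace-only and would therefore start failing validation. The helper below is a hypothetical sketch using only the standard library; it matches elements by local name (ignoring namespaces) and looks only at elements with no child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical audit helper: report elements whose text would be rejected
# once retyped to res:NonEmptyStringType (empty or whitespace-only content).

def local_name(tag):
    # "{namespace}title" -> "title"
    return tag.rsplit("}", 1)[-1]

def find_empty_strings(xml_text, names):
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in names and len(elem) == 0:
            # XSD whitespace is only space, tab, CR and LF
            if (elem.text or "").strip(" \t\n\r") == "":
                hits.append(name)
    return hits

retyped = {"title", "MediumName", "keyword"}
doc = "<dataset><title> </title><keyword>soil</keyword><keyword/></dataset>"
print(find_empty_strings(doc, retyped))  # ['title', 'keyword']
```

This is only a rough screen: real validation against the schema is still the authority, since it also checks element paths and the named keyword type.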
Original Redmine Comment Author Name: Jing Tao (Jing Tao) Original Date: 2008-11-12T00:04:19Z
I checked the Morpho code, and we use these three paths in the new package wizard:
title
distribution/offline/MediumName
keyword (a named type)
Morpho also checks whether the input is an empty string. If it is, Morpho asks the user to enter something there.
Original Redmine Comment Author Name: Margaret O'Brien (Margaret O'Brien) Original Date: 2008-11-22T00:55:29Z
The optional elements have had their xs:strings retyped to res:NonEmptyStringType.
Original Redmine Comment Author Name: Redmine Admin (Redmine Admin) Original Date: 2013-03-27T21:20:26Z
Original Bugzilla ID was 2512
Author Name: Matt Jones (Matt Jones) Original Redmine Issue: 2512, https://projects.ecoinformatics.org/ecoinfo/issues/2512 Original Date: 2006-08-15 Original Assignee: Matt Jones
Current EML schemas allow text content to be empty, which defeats validation rules by allowing users to provide content such as:
I propose that these uses of empty strings should not be valid. We can achieve this by redefining the datatype we use for strings to have a minimum length of 1 and a pattern that requires at least one non-whitespace character.
In XML Schema, we can declare the element to be of type eml:nonemptystring where eml:nonemptystring is a simple type derived from xs:string like this:
I'm not sure that regular expression quite gets what we want, but it is close and would need some testing. It is intended to select (zero or more whitespace characters) followed by (one or more non-whitespace characters) followed by (any additional characters). We could probably remove the plus symbol, as it's redundant with the subsequent .*
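A quick way to test the idea is to emulate XML Schema pattern matching in Python. The sketch below assumes XML Schema regex semantics (whole-value matching, "\s" limited to space/tab/CR/LF, "." excluding CR/LF) and writes out \s*\S+.* and \s*\S.* accordingly; the names are hypothetical. It also shows why minLength="1" alone is not enough.

```python
import re

# Sketch assuming XML Schema regex semantics, emulated in Python:
# whole-value matching (re.fullmatch), XSD \s = space/tab/CR/LF,
# XSD "." = any character except CR/LF.
WS, NON_WS, DOT = r"[ \t\n\r]", r"[^ \t\n\r]", r"[^\n\r]"

WITH_PLUS = re.compile(f"{WS}*{NON_WS}+{DOT}*")    # \s*\S+.*
WITHOUT_PLUS = re.compile(f"{WS}*{NON_WS}{DOT}*")  # \s*\S.*

def min_length_1(value):
    # xs:minLength value="1" on its own rejects only the empty string
    return len(value) >= 1

for s in ["", "   ", "  Title"]:
    print(repr(s), min_length_1(s),
          WITH_PLUS.fullmatch(s) is not None,
          WITHOUT_PLUS.fullmatch(s) is not None)
# '' False False False
# '   ' True False False
# '  Title' True True True
```

The plus is redundant because every character matched by \S is also matched by ".", so \S+.* and \S.* accept exactly the same strings. The minLength facet still adds nothing beyond the pattern here, but whitespace-only values like "   " are rejected only by the pattern.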