require text content in elements to be non-empty

mbjones commented 7 years ago

Author Name: Matt Jones (Matt Jones) Original Redmine Issue: 2512, https://projects.ecoinformatics.org/ecoinfo/issues/2512 Original Date: 2006-08-15 Original Assignee: Matt Jones

Current EML schemas allow text content to be empty, which defeats validation rules by allowing users to provide content such as:

I propose that these uses of empty strings should not be valid. We can achieve this by redefining the datatype we use for strings to have a minimum length of 1 and a pattern that requires at least one non-whitespace character.

In XML Schema, we can declare the element to be of type eml:nonemptystring, where eml:nonemptystring is a simple type derived from xs:string like this:

I'm not sure that regular expression quite gets what we want; it is close, but it would need some testing. It is intended to select (zero or more whitespace characters) followed by (one or more non-whitespace characters) followed by (any additional characters). We could probably remove the plus symbol, as it's redundant with the subsequent .*
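The regular expression described above can be exercised with the minimal Python sketch below. It emulates the two proposed facets: minLength 1 plus a pattern. It is not EML or validator code. The pattern \s*\S+.* is an assumed reading of the description, and the helper names are hypothetical. XML Schema 1.0 regex rules are mimicked by hand: a pattern must match the whole value, \s covers only space, tab, newline and carriage return, and . excludes newline and carriage return. MULTILINE_OK is an illustrative variant, included because the handling of newlines comes up later in the thread.

```python
import re

# XML Schema 1.0 regex classes, spelled out because Python's defaults differ
# (Python's \s also matches Unicode spaces, and its . matches "\r").
XSD_S = r"[ \t\n\r]"       # XSD \s: space, tab, newline, carriage return
XSD_NOT_S = r"[^ \t\n\r]"  # XSD \S: every other character
XSD_DOT = r"[^\n\r]"       # XSD . : any character except line breaks
ANY = r"[\s\S]"            # any character at all, line breaks included

# Assumed reading of the proposal: \s*\S+.*
PROPOSED = re.compile(f"{XSD_S}*{XSD_NOT_S}+{XSD_DOT}*")
# Same pattern with the "+" dropped: \s*\S.*
WITHOUT_PLUS = re.compile(f"{XSD_S}*{XSD_NOT_S}{XSD_DOT}*")
# Hypothetical variant that also tolerates line breaks: \s*\S[\s\S]*
MULTILINE_OK = re.compile(f"{XSD_S}*{XSD_NOT_S}{ANY}*")


def is_valid(value: str, pattern: re.Pattern) -> bool:
    """Apply the minLength=1 facet, then the pattern facet.

    XSD patterns must match the whole value, hence fullmatch().
    """
    return len(value) >= 1 and pattern.fullmatch(value) is not None


samples = ["", "   ", "\t\n", "Soil moisture", "  padded  ",
           "Line one\nLine two", "Line one\r\nLine two"]
for s in samples:
    print(repr(s), is_valid(s, PROPOSED), is_valid(s, WITHOUT_PLUS),
          is_valid(s, MULTILINE_OK))
```

The results for these samples are as follows:

- The empty and whitespace-only values fail every pattern.
- PROPOSED and WITHOUT_PLUS agree on every value. The + adds nothing because every \S character is also matched by `.`.
- The two multi-line values pass only MULTILINE_OK.

The minLength facet is already implied by each pattern, since each one requires at least one non-whitespace character.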

mbjones commented 7 years ago

Original Redmine Comment Author Name: Margaret O'Brien (Margaret O'Brien) Original Date: 2008-09-22T19:20:21Z

Targeting for 2.1.0, although it may drop back to unspecified.

mbjones commented 7 years ago

Original Redmine Comment Author Name: Margaret O'Brien (Margaret O'Brien) Original Date: 2008-11-04T22:09:26Z

The pattern for this type will be something resembling:

I am assuming that we still want to allow newlines in strings, and the dot (.) specifically does not match these. At least some current xs:strings contain them (e.g. in test/eml-datasetWithCitation.xml). We need to test against some docs with \r\n as well.

mbjones commented 7 years ago

Original Redmine Comment Author Name: Margaret O'Brien (Margaret O'Brien) Original Date: 2008-11-08T20:56:54Z

We need to look at the effect on instance documents of switching all xs:string to NonEmptyStringType. This type switch will probably have a bigger effect on authors' ability to migrate their documents than the changes to the document structure itself. Structure changes will be accomplished by the XSL stylesheet, but retyping all strings means that content could now be required where none previously existed.

To start, I considered just the anonymous simple-type elements that are required by EML and are type="xs:string". It seemed reasonable that if an element was optional, its content could also be optional. In all, there are 81 of these, which are generally easy to retype with a statement like: sed -e '/\<xs:element\ name/{ /minOccurs=\"0\"/!s/xs:string/res:NonEmptyStringType/ } '

There are other elements that could be examined and retyped manually, or that would be caught by a general s/xs:string/res:NonEmptyStringType/. For example, see <keyword> (eml-resource.xsd): it is a complexType/simpleContent, so the reference to xs:string occurs below the element declaration.
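The sed one-liner quoted above works line by line. On a line containing an xs:element name declaration, it replaces the first xs:string with res:NonEmptyStringType, unless that line also contains minOccurs="0". The Python sketch below mirrors that rule for illustration only. retype_line and the schema lines are hypothetical. Like the sed command, the sketch assumes that each declaration's name, type and minOccurs sit on a single line.

```python
import re

# \b before "xs" plays the role of GNU sed's \< (start of a word) in the
# address /\<xs:element\ name/.
ELEMENT_DECL = re.compile(r"\bxs:element name")
OPTIONAL = 'minOccurs="0"'


def retype_line(line: str) -> str:
    """Retype xs:string on element declarations that are not optional."""
    if ELEMENT_DECL.search(line) and OPTIONAL not in line:
        # The sed command has no "g" flag, so only the first match changes.
        return line.replace("xs:string", "res:NonEmptyStringType", 1)
    return line


# Hypothetical schema fragment, one declaration per line.
schema_lines = [
    '<xs:element name="exampleRequired" type="xs:string"/>',
    '<xs:element name="exampleOptional" type="xs:string" minOccurs="0"/>',
    '<xs:attribute name="exampleAttribute" type="xs:string"/>',
    '<xs:element name="exampleKeyword">',
    '  <xs:complexType><xs:simpleContent>',
    '    <xs:extension base="xs:string"/>',
    '  </xs:simpleContent></xs:complexType>',
    '</xs:element>',
]
for line in schema_lines:
    print(retype_line(line))
```

Only the first sample line is retyped. The optional element and the attribute are skipped. The keyword-style element is also untouched, because its xs:string base appears several lines below the xs:element line. This is why such types have to be found manually or with a general s/xs:string/res:NonEmptyStringType/.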

Other elements (and many attributes) use xs:restriction base="xs:string" as the start of an enumeration list, but changing these to base="NonEmptyStringType" seems superfluous.

So, to start, only one schema file, "eml-resource.xsd", has been checked into CVS, so that others can try out the effect of NonEmptyStringType while its scope is small. In particular, I was thinking about Morpho. 7 element declarations in this file were formerly xs:string and are now NonEmptyStringType; see the list below. I think the Morpho wizards deal only with title, references and keyword, although all of them are available in the tree editor. My local copy has all 81 (anonymous, simple) element declarations retyped (in 17 schema docs), plus the 5 anonymous attributes. I am testing a variety of EML201 documents from the LTER Metacat against this schema as I convert them, basically while I work on the XSL stylesheet.

- title
- distribution/connectionDefinition/parameterDefinition/name
- distribution/connectionDefinition/parameterDefinition/description
- distribution/connection/parameter/name
- distribution/connection/parameter/description
- distribution/offline/MediumName
- references (multiple paths)
- keyword (a named type)

mbjones commented 7 years ago

Original Redmine Comment Author Name: Jing Tao (Jing Tao) Original Date: 2008-11-12T00:04:19Z

I checked the Morpho code, and we use these three paths in the new package wizard:

- title
- distribution/offline/MediumName
- keyword (a named type)

Morpho also checks whether the input is an empty string.
If it is, Morpho asks the user to enter something there.

mbjones commented 7 years ago

Original Redmine Comment Author Name: Margaret O'Brien (Margaret O'Brien) Original Date: 2008-11-22T00:55:29Z

The optional elements have had their xs:strings retyped to res:NonEmptyStringType.

mbjones commented 7 years ago

Original Redmine Comment Author Name: Redmine Admin (Redmine Admin) Original Date: 2013-03-27T21:20:26Z

Original Bugzilla ID was 2512

NCEAS / z-test-issues

require text content in elements to be non-empty #406