Schematron / schematron-enhancement-proposals

This repository collects proposals to enhance Schematron beyond the ISO specification

Define a textual format for Schematron #22

Open tgraham-antenna opened 2 years ago

tgraham-antenna commented 2 years ago

Is it time for a textual format for Schematron?

XML people have no problem with pointy brackets, but much of the rest of the world doesn't feel the same way about them. One of the advantages of Schematron is that people can write their own error messages, but why should they have to use XML to do so?

A textual format for Schematron could make Schematron more straightforwardly usable by more people.


jsontron (https://amer-ali.github.io/jsontron/) is a Schematron-like textual format in JSON. A seemingly simple but probably not very good solution would be to adopt that but use XPath assertions.

I have hopes that XML-in-KDL (XiK) (https://github.com/kdl-org/kdl/blob/main/XML-IN-KDL.md) could be a useful textual representation for some XML applications, but I've run out of time right now to try it out.

If an EBNF for a custom syntax can be developed, then it's likely that a parser for it could be generated. There are also precedents of syntaxes that are defined by textual descriptions of what to do at every possible token.

rjelliffe commented 2 years ago

I once made a front end in HTML, so that provided you used certain conventions, you could make your schema as a table. There is a remnant of this (just for the rich text) at https://github.com/Schematron/schematron/blob/master/trunk/converters/code/ToSchematron/xhtml2sch.xsl

I agree that it is good to have all sorts of levels of support.

I like simplified formats, but they can be an "if you only have a hammer, everything is a nail" solution: is the real issue formats, or familiarity/interaction modes for non-pointy people? Here are two alternative suggestions:

1. SPREADSHEET:

Spreadsheets give more guidance and are a super-friendly technology for people who wear ties. Plus, they can suit the division of labour better. I think Schematron is quite amenable to spreadsheet representation, because each key element appears at only one, quite shallow, nesting level.

For example
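A minimal sketch of the spreadsheet idea, assuming a hypothetical sheet layout with one row per assertion and the columns `pattern`, `context`, `test` and `message`. The column names, the sample rows and the function name `sheet_to_schematron` are illustrative assumptions, not an agreed format.

```python
import csv
import io
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"
ET.register_namespace("sch", SCH_NS)

# Hypothetical sheet exported as CSV: one row per assertion.
SHEET = """pattern,context,test,message
p1,chapter,title,A chapter should have a title
p1,chapter,@id,A chapter should have an id
p2,section,count(para) >= 1,A section should contain at least one para
"""

def sheet_to_schematron(text):
    """Group rows by pattern id, then by context, and emit Schematron XML."""
    schema = ET.Element(f"{{{SCH_NS}}}schema")
    patterns = {}  # pattern id -> <pattern> element
    rules = {}     # (pattern id, context) -> <rule> element
    for row in csv.DictReader(io.StringIO(text)):
        pid, ctx = row["pattern"], row["context"]
        if pid not in patterns:
            patterns[pid] = ET.SubElement(schema, f"{{{SCH_NS}}}pattern", id=pid)
        if (pid, ctx) not in rules:
            rules[(pid, ctx)] = ET.SubElement(
                patterns[pid], f"{{{SCH_NS}}}rule", context=ctx)
        assertion = ET.SubElement(
            rules[(pid, ctx)], f"{{{SCH_NS}}}assert", test=row["test"])
        assertion.text = row["message"]
    return schema

print(ET.tostring(sheet_to_schematron(SHEET), encoding="unicode"))
```

Rows that share a pattern and a context are merged into a single rule, which keeps the sheet flat: each of pattern, rule and assert sits at one shallow level.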

2. DSL/NL interface

Or perhaps the problem is not so much pointy brackets but XPath? What might be useful would be a language-like way to express simple rules, rather like SQL. To enable this, it might be better to get a vocabulary of keywords for important tests, rather like OASIS CAM, for example ONE-ONLY.

It might be better to put this into a document-aware NL interface (ELIZA for XML = ELIXIR?). So, for example, the human writes:

    A CAT CAN ONLY HAVE KITTENS

and the system responds with an Eliza-like

    A CAT (eg:cat) CAN ONLY HAVE (unique-child) KITTENS (eg:kitten)

which the user can then adjust if they want:

    A CAT (eg:cat) CAN ONLY HAVE (unique-child) KITTENS (eg:kitten) OR MICE

to which the system responds with

    Do you mean: A CAT CAN ONLY HAVE KITTENS OR MICE (BUT NOT BOTH) or A CAT CAN ONLY HAVE KITTENS AND MICE?

Is the problem requirements REPRESENTATION for non-pointy people, or requirements extraction in the absence of a guru?

Cheers Rick

On Sat, Sep 25, 2021 at 5:59 AM Tony Graham @.***> wrote:


xatapult commented 2 years ago

Maybe something along the lines of RelaxNG with its text format?

michael-aka-mmh commented 2 years ago

From my POV Schematron is used to check XML and the people defining the rules are therefore not completely anti-XML. But a regular XML editor allows such users to enter the most relevant information in element content, not in attributes.

So I see the current use of having the rules in an attribute as the main pain point. All other pointy angles stuff could be hidden by a suitably configured XML editor.

rjelliffe commented 2 years ago

I see #41 has a contradictory requirement: to allow element based languages as queries.

So if we want to have a textual format to avoid horrid XML tags, wouldn't it also need to cope with not only how to convert Schematron's XML to non-tags but also how to convert XML-tagged @context, @text, @value-of into non-tags? That could only happen by replacing tags with some other delimiter system (i.e. JSON).

tgraham-antenna commented 2 years ago

I see #41 has a contradictory requirement: to allow element based languages as queries.

As with @michael-aka-mmh's comment, this and #41 are both proposals to make Schematron easier to work with.

So if we want to have a textual format to avoid horrid XML tags, wouldn't it also need to cope with not only how to convert Schematron's XML to non-tags but also how to convert XML-tagged @context, @text, @value-of into non-tags? That could only happen by replacing tags with some other delimiter system (i.e. JSON).

So if we want to have a format with a lower barrier to entry, with a lower cognitive load, that's easier for beginners to use, ...

Firstly, #41 and a configured XML editor will probably get you there.

Secondly, I don't know what a textual format would be, so I wouldn't rule it out because of its as yet unknown delimiter system. It might be JSON-like, it might be CSS-like (where <rule> and above are like @ rules and @test is like a CSS selector, or something), or it might be something completely different. It might use { and } to delimit XPaths to be evaluated, same as XSLT's attribute value templates and, now, text value templates. I don't know: this is just a proposal, not a fully featured solution.
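To make the { and } idea concrete, here is a minimal sketch that expands brace-delimited XPaths in a message into `<value-of select="..."/>` children of an `<assert>`. It assumes braces never occur inside the XPath itself and that no escape such as `{{` is needed; the function name `message_to_assert` and the sample message are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def message_to_assert(test, message):
    """Build an <assert>; each {xpath} span becomes <value-of select="xpath"/>."""
    el = ET.Element("assert", test=test)
    # With one capture group, re.split alternates: text, xpath, text, ..., text
    parts = re.split(r"\{([^{}]*)\}", message)
    el.text = parts[0]
    for i in range(1, len(parts), 2):
        value_of = ET.SubElement(el, "value-of", select=parts[i].strip())
        value_of.tail = parts[i + 1]  # literal text following this XPath
    return el

el = message_to_assert("count(kitten) = 1",
                       "Found {count(kitten)} kittens in {@name}")
print(ET.tostring(el, encoding="unicode"))
```

A real syntax would need a rule for literal braces and for XPath expressions that contain braces themselves, which this sketch deliberately ignores.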

rjelliffe commented 2 years ago

One method would be to start with JSON with the common mapping (element is an array of objects, first object is attributes) and then see if there were any regular changes that could be made to simplify it.

{ "schema": [
    { "phase": "#ALL",
      "qlb": "xpath2",
      "xmlns": "http://purl.oclc.org/dsdl/schematron"
    },
    { "pattern": [
        { "id": "p1" },
        { "rule": [
            { "context": "a/b/c",
              "id": "r1" },
            { "assert": [
                { "test": "x/y",
                  "role": "error" },
                { "#text": "a/b/c should not contain x/" }
              ]
            }
          ]
        }
      ]
    }
  ]
}
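The common mapping can be turned back into XML mechanically. A minimal sketch, assuming well-formed JSON in which text content is wrapped as `{"#text": ...}`; that wrapping and the function name `to_element` are assumptions made for illustration. The `xmlns` and `phase` entries are left out of the sample because ElementTree handles namespaces separately from ordinary attributes.

```python
import json
import xml.etree.ElementTree as ET

DOC = json.loads("""
{"schema": [
  {"qlb": "xpath2"},
  {"pattern": [
    {"id": "p1"},
    {"rule": [
      {"context": "a/b/c", "id": "r1"},
      {"assert": [
        {"test": "x/y", "role": "error"},
        {"#text": "a/b/c should not contain x/"}
      ]}
    ]}
  ]}
]}
""")

def to_element(name, items):
    """items[0] holds the attributes; later items are child elements or #text."""
    if isinstance(items, str):
        # Shorthand such as "name": "" is an element with text only.
        el = ET.Element(name)
        el.text = items or None
        return el
    el = ET.Element(name, items[0])
    last = None  # most recent child element, so text after it becomes its tail
    for item in items[1:]:
        for key, value in item.items():
            if key == "#text":
                if last is None:
                    el.text = (el.text or "") + value
                else:
                    last.tail = (last.tail or "") + value
            else:
                last = to_element(key, value)
                el.append(last)
    return el

root_name, root_items = next(iter(DOC.items()))
root = to_element(root_name, root_items)
print(ET.tostring(root, encoding="unicode"))
```

Because the conversion depends only on position (first object is attributes) and on the `#text` key, it needs no knowledge of Schematron element names.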

Then you could, say, strip out the quotes on the names, use the quote delimiter as shorthand for "#text":, remove all {}, and replace the syntax of the attributes (i.e. the first object in each array) with an @ delimiter. The commas are not needed either, due to the simpler set of tokens. Let's replace the : with = just because we can, and remove it from named arrays. And replace the [] with {}, again for no good reason.

So this gives a shorthand syntax convertible to JSON convertible to XML (and back):

schema {
    @phase="#ALL"
    @qlb="xpath2"
    @xmlns="http://purl.oclc.org/dsdl/schematron"
    pattern {
        @id="p1"
        rule {
            @context="a/b/c"
            @id="r1"
            assert {
                @test="x/y"
                @role="error"
                "a/b/c should not contain x/"
            }
        }
    }
}
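A shorthand like this can be read with a very small tokenizer and a recursive-descent parser. The following is a minimal sketch under assumptions: strings contain no escaped quotes, names use only word characters plus `:`, `.` and `-`, and the output wraps text as `{"#text": ...}` objects. The names `tokenize` and `parse_element` are hypothetical, and unbalanced input is not reported gracefully.

```python
import json
import re

TOKEN = re.compile(
    r'\s*(?:(?P<attr>@[\w:.-]+)|(?P<name>[\w:.-]+)'
    r'|"(?P<text>[^"]*)"|(?P<punct>[{}=]))')

def tokenize(src):
    """Split the shorthand into (kind, value) pairs."""
    src = src.strip()
    pos, tokens = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected text at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse_element(tokens, i):
    """Parse NAME '{' ... '}' at tokens[i]; return (node, next index)."""
    kind, name = tokens[i]
    if kind != "name" or tokens[i + 1] != ("punct", "{"):
        raise SyntaxError(f"expected 'name {{' at token {i}")
    attrs, children = {}, []
    i += 2
    while tokens[i] != ("punct", "}"):
        kind, value = tokens[i]
        if kind == "attr":  # @name = "value"
            if tokens[i + 1] != ("punct", "=") or tokens[i + 2][0] != "text":
                raise SyntaxError(f"malformed attribute at token {i}")
            attrs[value[1:]] = tokens[i + 2][1]
            i += 3
        elif kind == "text":  # bare quoted string is text content
            children.append({"#text": value})
            i += 1
        else:  # anything else must start a nested element
            child, i = parse_element(tokens, i)
            children.append(child)
    return {name: [attrs] + children}, i + 1

SRC = '''
schema {
    @qlb="xpath2"
    pattern {
        @id="p1"
        rule {
            @context="a/b/c"
            @id="r1"
            assert {
                @test="x/y"
                @role="error"
                "a/b/c should not contain x/"
            }
        }
    }
}
'''

tree, _ = parse_element(tokenize(SRC), 0)
print(json.dumps(tree, indent=2))
```

The parser never looks at element or attribute names, only at delimiters, so foreign elements or later additions to the vocabulary pass through unchanged.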

tgraham-antenna commented 2 years ago

So it's not infeasible, but I think it's still an open question whether we can come up with something that enough people would want to write by hand (especially if there is #41 and a decent XML editor).

Half of me wants to work more on this, and half of me wants to point out that at this point we'd be better served by getting more enhancement proposals to the point of being provably not infeasible (or, if necessary, provably infeasible).


rjelliffe commented 2 years ago

The well-known attributes don't really need @

Sure. I was aiming at a general syntax that could be converted (to JSON) based only on delimiters, because that doesn't break on, e.g. upgrades or foreign elements. (And @ might help with questions and communication between XML people and these alleged short-syntax people.)

What about mixed-content messages?

AFAIK, the JSON treatment of mixed content is:

    XML:   abc<name/>def
    JSON:  [ "#text":"abc", "name":"", "#text":"def"]

Using my rules above, this is

      "abc"   name="" "def" 

The XML has 3 characters overhead; the JSON has 28 characters overhead; my simplified one has 9 chars overhead. So maybe it is worthwhile in the short form to just drop into an XML-ish syntax for mixed content:

      " abc<name/>def"

That has the advantage of, if you are converting to XML element syntax, you just plonk the string in (after delimiter adjustment.) Simpler than parsing.
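A minimal sketch of the "just plonk the string in" approach, assuming the quoted string is already well-formed XML content once its surrounding quotes are removed. The function name `assert_from_string` is hypothetical.

```python
import xml.etree.ElementTree as ET

def assert_from_string(test, content):
    """Wrap an XML-ish message string in <assert> and let the XML parser read it."""
    el = ET.fromstring(f"<assert>{content}</assert>")
    # Setting the attribute through the API avoids escaping the XPath by hand.
    el.set("test", test)
    return el

el = assert_from_string("x/y", "abc<name/>def")
print(ET.tostring(el, encoding="unicode"))
```

The shorthand parser then only has to find the end of the quoted string; everything inside it is delegated to an ordinary XML parser.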

Some subset of markdown might be possible, except markdown doesn't support the attributes (nor any of the RTL tagging?). But lists, p, title, i, b, code could be adopted, I guess.