SmartTokenLabs / TokenScript

TokenScript schema, specs and paper
http://tokenscript.org
MIT License

xsd 1.1 assert not validated by xerces 2.12.1-xml-schema-1.1 #395

Open SmartLayer opened 4 years ago

SmartLayer commented 4 years ago

You will not be able to reproduce this as-is, because I commented out the offending line in tokenscript.xsd.

  1. Download xerces 2.12.1-xml-schema-1.1
    $ wget -O - https://archive.apache.org/dist/xerces/j/binaries/Xerces-J-bin.2.12.1-xml-schema-1.1.tar.gz|tar -zxvf -
  2. Run the validator
    $ java -classpath xerces-2_12_1-xml-schema-1.1/xercesImpl.jar:xerces-2_12_1-xml-schema-1.1/xercesSamples.jar:xerces-2_12_1-xml-schema-1.1/xml-apis.jar sax.Counter -s COFI.xml
    [Error] tokenscript.xsd:55:78: s4s-elt-invalid-content.1: The content of '#AnonType_token' is invalid. Element 'assert' is invalid, misplaced, or occurs too often.
    COFI.xml: 1616 ms (52 elems, 53 attrs, 0 spaces, 22526 chars)

To reproduce the problem, uncomment the two lines mentioned in #388, then edit the test XML file (COFI in this case, but any TokenScript file will do) so that it uses the edited tokenscript.xsd.

Note that I am already using the version of xerces that supports xml-schema 1.1

darakhbharat commented 4 years ago

Hi Weiwu,

I have created a Xerces-based utility that validates an XML file against an XSD file. The details are below.

Command: $ java -classpath "xercesImpl.jar;xercesSamples.jar;xml-apis.jar;xpath2-1.2.0.jar;XMLValidator.jar" XMLValidator H:/alphawallet/TokenScript/schema/tokenscript.xsd H:/alphawallet/tokenscripts/COFI.xml

Note: On Unix, replace ; with : when listing the JAR files in the classpath.

All the required JAR files are attached here. xerces-2_12_1-xml-schema-1.1.zip

Arguments:

  1. First argument is the absolute path to xsd file. i.e. H:/alphawallet/TokenScript/schema/tokenscript.xsd
  2. Second argument is the absolute path to XML file that needs to be validated against the XSD mentioned in argument 1 i.e. H:/alphawallet/tokenscripts/COFI.xml

Now you can validate XML against the XSD 1.1 using this package.
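For readers without the attached JARs, the core of such a utility can be sketched with the standard JAXP validation API. This is a minimal sketch, not the code inside XMLValidator.jar: the class name Xsd11ValidatorSketch is hypothetical, and it assumes the Xerces XML Schema 1.1 build registers itself under the schema-language URI http://www.w3.org/XML/XMLSchema/v1.1. When no such implementation is on the classpath, it falls back to the JDK's XSD 1.0 validator, which will reject 1.1-only constructs such as assert.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
final class Xsd11ValidatorSketch {
    // Assumption: the URI under which the Xerces XSD 1.1 build exposes its SchemaFactory.
    static final String XSD_11 = "http://www.w3.org/XML/XMLSchema/v1.1";

    /** Returns an XSD 1.1 factory if one is available, otherwise the default XSD 1.0 factory. */
    static SchemaFactory newFactory() {
        try {
            return SchemaFactory.newInstance(XSD_11);
        } catch (IllegalArgumentException noXsd11Implementation) {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        }
    }

    /** Returns null when the document is valid, otherwise the first error message. */
    static String validate(Source xsd, Source xml) {
        try {
            Schema schema = newFactory().newSchema(xsd);
            // With no ErrorHandler set, the first validation error is thrown as a SAXException.
            schema.newValidator().validate(xml);
            return null;
        } catch (SAXException invalid) {
            return invalid.getMessage();
        } catch (IOException unreadable) {
            throw new UncheckedIOException(unreadable);
        }
    }
}
```

Called as Xsd11ValidatorSketch.validate(new StreamSource("schema/tokenscript.xsd"), new StreamSource("COFI.xml")), it returns null for a valid file and the validator's message otherwise.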

darakhbharat commented 4 years ago

Tracking of requirement details from mail conversation:

It seems that XML Schema 1.1 is supported only by Xerces 2.12 (the build with XML Schema 1.1 support) or by Saxon. Saxon's open-source version, at a glance, supports only XSLT and XQuery, since there is no mention of validation in the manual.

Once you have the validator, we will need a pull request that not only restores the XML Schema 1.1 rules that I commented out (2 lines), but also changes the schema's root element according to this article:

https://www.oxygenxml.com/doc/versions/22.1/ug-editor/topics/set-xml-schema-version.html

Otherwise, some tools will still process it with schema 1.0.
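A quick way to confirm that a schema carries such a marker is sketched below. It assumes the article refers to the XML Schema versioning attribute vc:minVersion="1.1" (namespace http://www.w3.org/2007/XMLSchema-versioning) on the xs:schema root; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class SchemaVersionCheck {
    // Assumption: the versioning namespace used for the vc:minVersion attribute.
    static final String VC_NS = "http://www.w3.org/2007/XMLSchema-versioning";

    /** True when the root element of the schema text declares vc:minVersion="1.1". */
    static boolean declaresXsd11(String xsdText) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for getAttributeNS to work
            Element root = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xsdText)))
                    .getDocumentElement();
            // getAttributeNS returns "" when the attribute is absent.
            return "1.1".equals(root.getAttributeNS(VC_NS, "minVersion"));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable schema document", e);
        }
    }
}
```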

Let me know how you progressed on this! Thanks.

darakhbharat commented 4 years ago

https://github.com/AlphaWallet/TokenScript/issues/395#issuecomment-716395559

What I have here is an initial version; we can eventually convert it to the approach suggested in your requirement document.

I can improve the Java code to take the default XSD path from GitHub, so that we only need to pass the XML file name to the command. We can also provide an option to use a local XSD schema for validation.

My idea is to write a shell script where we can pass the action (validate, sign, c14n, verify) as a command-line argument; the appropriate Java class is then invoked based on the action.

SmartLayer commented 4 years ago

> My idea is to write a shell script where we can pass the action (validate, sign, c14n, verify) as a command-line argument; the appropriate Java class is then invoked based on the action.

If you do so, you will have to produce two versions (.sh and .bat), and they may behave a bit differently between macOS and Ubuntu. There is no harm if the content is extremely simple; you just need to keep it minimal and test it on all OSes. In this case, though, it is expected to be complicated: the fact that you can concatenate sub-commands means it is not going to be simple at all, and whatever shell script you write will have to manage a lot of intermediary files. See the example under "Multi-command processing" below:


Let's say the tool, xmlsec.jar for now, has 4 sub-commands.

$ java -jar xmlsec.jar val tokenscript.xml
$ java -jar xmlsec.jar sign [-o tokenscript-signed.xml | -d output.dir/] tokenscript.xml
$ java -jar xmlsec.jar c14n [-o tokenscript-signed.xml | -d output.dir/] tokenscript.xml
$ java -jar xmlsec.jar verify tokenscript.xml

The first and third commands also have a long form (validate and canonic, respectively). The second and third commands produce an output file. If the output is unspecified, it will simply be tokenscript-signed.xml (that is, take the input file name, remove the extension and add -signed.xml, following the convention set by Android apk files).

Each sub-command has its own parameters.

For example, sign has --key

Multi file processing

It should be possible to process multiple files in all of the commands. For example:

$ java -jar xmlsec.jar val */*.xml

This validates every XML file in every subdirectory of the current directory.

For the commands that have an output, either -o or -d should be used. If there are multiple input files, only -d is allowed. With -d, each output is written under the specified directory with the same file name as its input.
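The output rules described so far (the default -signed.xml name, -o for a single file, -d for a directory) can be sketched as follows. This is only an illustration of the proposal: OutputResolver and its method are hypothetical names, and placing the default output next to its input file is an assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class OutputResolver {
    /**
     * Maps each input file to its output path.
     * outFile is the value of -o (or null); outDir is the value of -d (or null).
     */
    static List<Path> resolve(String outFile, String outDir, List<String> inputs) {
        if (outFile != null && outDir != null) {
            throw new IllegalArgumentException("use either -o or -d, not both");
        }
        if (outFile != null && inputs.size() > 1) {
            throw new IllegalArgumentException("-o cannot be used with multiple input files; use -d");
        }
        List<Path> outputs = new ArrayList<>();
        for (String input : inputs) {
            Path in = Paths.get(input);
            if (outFile != null) {
                outputs.add(Paths.get(outFile));
            } else if (outDir != null) {
                // -d: same file name, under the given directory
                outputs.add(Paths.get(outDir).resolve(in.getFileName()));
            } else {
                // Default: drop the extension and append -signed.xml (assumed to sit next to the input)
                String name = in.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String stem = dot > 0 ? name.substring(0, dot) : name;
                outputs.add(in.resolveSibling(stem + "-signed.xml"));
            }
        }
        return outputs;
    }
}
```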

Multi-command processing

It should be possible to concatenate commands. The most typical use-cases are:

$ java -jar xmlsec.jar val c14n sign verify tokenscript1.xml tokenscript2.xml

This causes the tokenscript files to be validated, canonicalized, signed and verified, producing tokenscript1-signed.xml and tokenscript2-signed.xml (the verify sub-command is smart enough to know that the output file, not the original input file, should be verified). If one of the sub-commands fails for an input file, the remaining sub-commands are not executed for that file, but the next file in the queue is still processed.
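The control flow of multi-command processing can be sketched in a few lines of Java. This is a simplified illustration under stated assumptions: CommandPipeline is a hypothetical name, each sub-command is modelled as a predicate that returns false on failure, options such as -o and -d are ignored, and the hand-over of the signed output file to verify is not modelled.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper, for illustration only.
final class CommandPipeline {
    static final List<String> KNOWN =
            Arrays.asList("val", "validate", "c14n", "canonic", "sign", "verify");

    /** Index of the first argument that is not a sub-command, i.e. the first file name. */
    static int firstFileIndex(String[] args) {
        int i = 0;
        while (i < args.length && KNOWN.contains(args[i])) {
            i++;
        }
        return i;
    }

    /**
     * Runs the leading sub-commands, in order, on every trailing file.
     * Returns, per file, "ok" or the name of the sub-command that failed.
     */
    static Map<String, String> run(String[] args, Map<String, Predicate<String>> steps) {
        int split = firstFileIndex(args);
        Map<String, String> results = new LinkedHashMap<>();
        for (int f = split; f < args.length; f++) {
            String outcome = "ok";
            for (int c = 0; c < split; c++) {
                Predicate<String> step = steps.get(args[c]);
                if (step == null) {
                    throw new IllegalArgumentException("no implementation for " + args[c]);
                }
                if (!step.test(args[f])) {
                    outcome = args[c]; // stop this file's pipeline at the failing sub-command
                    break;
                }
            }
            results.put(args[f], outcome); // then carry on with the next file
        }
        return results;
    }
}
```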


If you simply don't like the java -jar syntax, then it's a different matter.

SmartLayer commented 4 years ago

This error seems to be in the schema. Can you make a PR and link back to this issue?

$ LANG=en_US java -classpath XMLValidator.jar:xpath2-1.2.0.jar:xercesImpl.jar:xercesSamples.jar:xml-apis.jar XMLValidator schema/tokenscript.xsd ../token-api-poc/tokenscripts/COFI.xml
COFI.xml is not valid because 
cvc-identity-constraint.4.3: Key 'typeRef' with value 'Transfer' not found for identity constraint of element 'token'.

darakhbharat commented 4 years ago

This was my next finding. I do not get the error you originally reported in this issue, with either Xerces or the oxygen editor, but I do get the error that you just reported.

[image attached to the original comment]

Schema is expecting below XML block in the XML file. Do you mean that I should fix the schema and make type attribute optional?

More details: I will have to make the type attribute optional in the XML snippet below. Right now it is compulsory, but the given XML does not have any referenced XML block for type="Transfer".

....

SmartLayer commented 4 years ago

> Schema is expecting below XML block in the XML file. Do you mean that I should fix the schema and make type attribute optional?

Then the validator is working correctly, except that the error reported isn't human readable!

$ LANG=en_US java -classpath XMLValidator.jar:xpath2-1.2.0.jar:xercesImpl.jar:xercesSamples.jar:xml-apis.jar XMLValidator schema/tokenscript.xsd ../token-api-poc/tokenscripts/COFI.xml
COFI.xml is valid.

SmartLayer commented 4 years ago

Keeping this open until there is a tool, so that the XSD 1.1 rules can be uncommented once the documentation on how to validate them is updated.

The approach I would take is:

  1. Fork xmlsectool 3.0.0† with git clone https://git.shibboleth.net/git/xmlsectool
  2. Add two sub-commands: val and c14n support‡
  3. Change commandline syntax from --sign and --verify to just sign and verify
  4. Add multi-file processing
  5. Add multi-command processing

† 3.0.0 is an in-development version expected to come out in 2021, but 2.0.0, the current stable release, has very old libraries and has bugs with some of our processes. As a result of this approach, the code should be written for Java 11, as it is the default platform of xmlsectool. Please try to use the latest Java API, as backward compatibility is not desired.

‡ The current xmlsectool already supports validation, but it does not use Xerces with Schema 1.1 support (verified). Xerces seems to be the only one that can validate files that have entity references, which we need.

It is desirable to keep the possibility of syncing up with future releases of xmlsectool, so you might choose to add instead of replace (e.g. add a sub-command for validation with Xerces instead of replacing what was there), and use sub-classing instead of changing much of the source code.

darakhbharat commented 4 years ago

Further communication updates from Telegram:

Weiwu: Stay connected. You need to prioritise making the command-line tool that supports only validation (using only the schema location in the XML header; I have a reason for that) and canonicalisation, with support for multi-file processing and multi-command processing. You should not prioritise XML signing and verification, as I can get by with sectool for the next few weeks.

Why do we need canonicalization? I just want to know a little more about how canonicalization fits into our existing work.

We actually don't need that, just entity dereferencing. So anything that can correctly read an XML file with entity references in it and serialise it into a single XML file will do the job for now.
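Entity dereferencing of this kind can be sketched with the JDK's own DOM parser and an identity transform. This is a minimal sketch under stated assumptions, not the tool discussed in this thread: EntityDereferencer is a hypothetical name, the parser is left at its default of expanding entity references, and serialising from the root element (rather than the document node) is a choice made here so that the DOCTYPE with its entity declarations is left out of the result.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class EntityDereferencer {
    /** Parses the input with entity expansion and returns it as a single serialised XML string. */
    static String dereference(InputSource xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setExpandEntityReferences(true); // the default, stated explicitly
            Document doc = dbf.newDocumentBuilder().parse(xml);

            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc.getDocumentElement()), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not dereference entities", e);
        }
    }
}
```

For a file on disk, passing new InputSource(file.toURI().toString()) gives the parser a system identifier, so entities declared with relative paths resolve against the file's location.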

darakhbharat commented 4 years ago

xmlsectool vs Core Java xerces based validator:

It looks like the main focus of xmlsectool is signing XML documents. I also do not find the xmlsectool documentation clear, and there is very little information available. If you have found more detailed official documentation than the pages below, please direct me there.

https://wiki.shibboleth.net/confluence/display/XSTJ2/xmlsectool+V2+Home

https://wiki.shibboleth.net/confluence/display/CONCEPT/MetadataCorrectness#MetadataCorrectness-SchemaValidation.5

I do not see any special advantage in using xmlsectool for schema validation and entity de-referencing, so I am in favour of writing our own simple tool using the Xerces JAR.

darakhbharat commented 4 years ago

Hi Weiwu,

I have completed the multi-file validation and attached the Java code. Can you create a separate repository where I can commit the code? I have created my own private repository but cannot add a collaborator, as I do not have an enterprise git subscription.

XMLValidator.zip

I will start on entity de-referencing. Where will we store the de-referenced file? Or do we need to overwrite the same XML file? For now I can create a new XML file to save the de-referenced result.

darakhbharat commented 4 years ago

Hi Weiwu,

I am committing my changes to my forked repository - https://github.com/darakhbharat/TokenScript.git. I created a new directory named xml-validation-against-xsd-1.1 for the changes.

Overview:

Here is the command:

$ java -classpath "xercesImpl.jar;xercesSamples.jar;xml-apis.jar;xpath2-1.2.0.jar;XMLValidator.jar" XMLValidator -val -deref H:/alphawallet/TokenScript/schema/tokenscript.xsd H:/alphawallet/tokenscripts/COFI.xml

Things that need to be improved: