eclipse / lemminx

XML Language Server
Eclipse Public License 2.0

Support for xml-model processing instruction for binding XML to DTD/XML Schema #633

Closed angelozerr closed 4 years ago

angelozerr commented 4 years ago

The xml-model processing instruction makes it possible to bind an XML document to a grammar (XML Schema, DTD, or other grammar kinds such as RelaxNG, RelaxNG compact, and Schematron).

The goal of this issue is to support binding between XML and XML Schema / DTD by using xml-model.

<?xml-model href="some-schema.xsd" type="application/xml" schematypens="http://www.w3.org/2001/XMLSchema"?>
<?xml-model href="some-dtd.dtd" type="application/xml-dtd"?>

The binding with xml-model should provide, inside the XML file:

BalduinLandolt commented 4 years ago

I'll look into it a bit more. Can't promise anything though.

From what I gather on first glance, the interesting jazz happens here... is that right? If so, where exactly does it get, where the stylesheet is declared?

Also, am I missing something, where I can find a useful documentation of Xerces? I couldn't find anything on Xerces and xml-model processing instruction...

angelozerr commented 4 years ago

From what I gather on first glance, the interesting jazz happens here... is that right? If so, where exactly does it get, where the stylesheet is declared?

No, this validator is used to validate XSD schemas themselves, not XML files bound to an XSD. The correct class is https://github.com/eclipse/lemminx/blob/master/org.eclipse.lemminx/src/main/java/org/eclipse/lemminx/extensions/contentmodel/participants/diagnostics/XMLValidator.java

Also, am I missing something, where I can find a useful documentation of Xerces? I couldn't find anything on Xerces and xml-model processing instruction...

Indeed, Xerces does not seem to support it; we need to implement it ourselves.

fbricon commented 4 years ago

http://people.apache.org/~andyc/neko/doc/relaxng/usage.html might be helpful

BalduinLandolt commented 4 years ago

So... I've been working on this a bit now. (see my fork here: https://github.com/BalduinLandolt/lemminx/tree/xml-model)

The good news is: I got an XMLModel concept working, and if there is a processing instruction <?xml-model ... ?>, it gets turned into such an XMLModel object.
This also allows for checking whether an xml-model is present and where it points to (this test passes: https://github.com/BalduinLandolt/lemminx/blob/4a33061145952f70bbdb14311716229424af30f8/org.eclipse.lemminx/src/test/java/org/eclipse/lemminx/extensions/processinginstruction/XMLModelTest.java#L24-L37 ).
So far that was pretty easy.
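
For readers following along, here is a minimal sketch of what turning the data of an xml-model processing instruction into an object could look like. It is not the XMLModel class from the fork: the class name XmlModelSketch, its fields, and the isXsd/isDtd rules are hypothetical and are inferred only from the two example declarations at the top of this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value holder for illustration; not the XMLModel class from the fork.
final class XmlModelSketch {
    // Matches name="value" or name='value' pairs inside the processing instruction data.
    private static final Pattern PSEUDO_ATTR =
            Pattern.compile("([A-Za-z_][\\w.-]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    final String href;
    final String type;
    final String schematypens;

    private XmlModelSketch(Map<String, String> attrs) {
        this.href = attrs.get("href");
        this.type = attrs.get("type");
        this.schematypens = attrs.get("schematypens");
    }

    // Parses the text that follows the "xml-model" target, e.g. href="a.xsd" type="application/xml".
    static XmlModelSketch parse(String data) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = PSEUDO_ATTR.matcher(data);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            attrs.put(m.group(1), value);
        }
        return new XmlModelSketch(attrs);
    }

    // Assumption: an XSD binding is recognised by its schematypens value, as in the example above.
    boolean isXsd() {
        return "http://www.w3.org/2001/XMLSchema".equals(schematypens);
    }

    // Assumption: a DTD binding is recognised by its type value, as in the example above.
    boolean isDtd() {
        return "application/xml-dtd".equals(type);
    }

    public static void main(String[] args) {
        XmlModelSketch model = parse("href=\"some-schema.xsd\" type=\"application/xml\" "
                + "schematypens=\"http://www.w3.org/2001/XMLSchema\"");
        System.out.println(model.href + " xsd=" + model.isXsd() + " dtd=" + model.isDtd());
    }
}
```

A regular expression keeps the sketch short; a real implementation would more likely reuse the attribute scanning that the DOM parser already has.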

The bad news is: That's about where I got stuck...
The thing I can't seem to figure out is where the XMLValidator actually determines where to find the schema. And, going hand in hand with that, I don't know how to ensure it actually finds the schema that's in the xml-model.
I even tried setting the noNamespaceSchemaLocation to the xml-model schema, but not even that helped... I just keep getting "Cannot find the declaration of element ..."

If you have any tips for me, I'll gladly continue trying. Otherwise I failed. ^^

angelozerr commented 4 years ago

So... I've been working on this a bit now. (see my fork here: https://github.com/BalduinLandolt/lemminx/tree/xml-model)

Wow, great! Could you create a draft PR, please? It will be easier to test and review your work and give you some feedback. Thanks!

BalduinLandolt commented 4 years ago

Aye.

Stupid question (it's my first project of that sort...): When I fork and work on a feature branch, should I keep it up to date with the head of the main repo (i.e. pull everything new into my fork and into the branch), or is it best to keep it unaffected of what has happened since the forking?

angelozerr commented 4 years ago

Stupid question (it's my first project of that sort...):

Questions are never stupid :)

When I fork and work on a feature branch, should I keep it up to date with the head of the main repo (i.e. pull everything new into my fork and into the branch), or is it best to keep it unaffected of what has happened since the forking?

First, you can create a PR whenever you want (a draft PR or a real PR) by using GitHub; it's easy. If you have no conflicts with the LemMinX master, we will be able to merge your PR with a rebase, so you don't need to do anything. But if there are conflicts, you will have to do a rebase on your side.

In my case, when I work on a branch and see that a new PR has been merged into master, I rebase my branch to keep it in sync with master.

But for the moment don't lose your time with that; please create a draft PR.

The thing I can't seem to figure out is where the XMLValidator actually determins where to find the Schema.

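To give you some explanation: when you type something in the XML editor, there are 2 parses on the server side:

* a parse which uses SAX with Xerces to manage validation.

* a parse to build a DOM document which is fault tolerant; in other words, even if there are some errors (e.g. an element is not closed), the DOM document is built. This DOM document is used for all features like outline, format, completion based on XML Schema / DTD, etc.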

Your work is about the DOM document, so I think the completion based on XML Schema should work, no?

I have seen that you try to fill the external schema location to manage validation, but I wonder if it's a good idea, since xml-model seems to give the capability to declare several XML Schemas.

Moreover, external DTD doesn't work with Xerces. I think your work is very good for completion based on XML, for hyperlinks on href, etc., but not for validation.

I have tried to start something (I will push my work ASAP), and the idea is to provide a Xerces XMLModelHandler (like the Xerces XIncludeHandler). I will create a branch for that ASAP so you can see what I mean.
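
To illustrate the general idea of picking up xml-model declarations during the SAX parse, here is a minimal sketch that uses only the standard SAX processingInstruction callback from the JDK. It is not the Xerces component mentioned above; the class name XmlModelCollector and its collect helper are hypothetical. It collects every declaration, since a document may declare several grammars.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; it only records declarations and validates nothing.
final class XmlModelCollector extends DefaultHandler {
    // Raw data of each xml-model processing instruction, in document order.
    final List<String> declarations = new ArrayList<>();

    @Override
    public void processingInstruction(String target, String data) {
        // SAX reports the target ("xml-model") and the remaining text separately.
        if ("xml-model".equals(target)) {
            declarations.add(data);
        }
    }

    // Parses the given XML text and returns the data of all xml-model instructions found.
    static List<String> collect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XmlModelCollector handler = new XmlModelCollector();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.declarations;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml-model href=\"some-schema.xsd\" type=\"application/xml\"?>"
                + "<?xml-model href=\"some-dtd.dtd\" type=\"application/xml-dtd\"?>"
                + "<root/>";
        for (String declaration : collect(xml)) {
            System.out.println(declaration);
        }
    }
}
```

A component inside the Xerces pipeline would go further and load the referenced grammar before validation starts. This sketch only shows where the declaration becomes visible during a SAX parse.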

BalduinLandolt commented 4 years ago

Questions are never stupid :)

That's arguable. ;)

When I fork and work on a feature branch, should I keep it up to date with the head of the main repo (i.e. pull everything new into my fork and into the branch), or is it best to keep it unaffected of what has happened since the forking?

First, you can create a PR whenever you want (a draft PR or a real PR) by using GitHub; it's easy. If you have no conflicts with the LemMinX master, we will be able to merge your PR with a rebase, so you don't need to do anything. But if there are conflicts, you will have to do a rebase on your side.

In my case, when I work on a branch and see that a new PR has been merged into master, I rebase my branch to keep it in sync with master.

But for the moment don't lose your time with that; please create a draft PR.

Ok, done.

The thing I can't seem to figure out is where the XMLValidator actually determins where to find the Schema.

To give you some explanation: when you type something in the XML editor, there are 2 parses on the server side:

* a parse which uses SAX with Xerces to manage validation.

* a parse to build a DOM document which is fault tolerant; in other words, even if there are some errors (e.g. an element is not closed), the DOM document is built. This DOM document is used for all features like outline, format, completion based on XML Schema / DTD, etc.

Your work is about the DOM document, so I think the completion based on XML Schema should work, no?

I haven't tested completion yet. Validation doesn't work, though.

I have seen that you try to fill the external schema location to manage validation, but I wonder if it's a good idea, since xml-model seems to give the capability to declare several XML Schemas.

You're absolutely right.
I only did this to try to get the xml-model schema somewhere that's already implemented, to see if that works.

I'm just wondering if it wouldn't be more elegant to bundle the entire schema stuff a bit more. At the moment it's already a bit scattered, with schemaLocation, noNamespaceSchemaLocation and externalSchemaLocation, all of them seeming to behave slightly differently, being detected in different places, etc. And even more so if I add xmlModelSchemaLocation.
I don't think I understand everything well enough to make a qualified suggestion. But I think it might be neater to have all schemas checked for, stored and retrieved in the same place and in the same way. How do you feel about this?

And in the same way, I'm not sure it's ideal to handle <?xml-model...?> so idiosyncratically. Shouldn't there be some routine that handles all processing instructions alike, and for now only really does something with xml-model?

Moreover, external DTD doesn't work with Xerces. I think your work is very good for completion based on XML, for hyperlinks on href, etc., but not for validation.

I have tried to start something (I will push my work ASAP), and the idea is to provide a Xerces XMLModelHandler (like the Xerces XIncludeHandler). I will create a branch for that ASAP so you can see what I mean.

Interesting, I'm looking forward to it. But no hurry! I don't have much time at the moment, so I probably won't be working on this much in the next 3 weeks or so...

BalduinLandolt commented 4 years ago

Another thing I realized I was unsure about is how schemas that are declared with xml-model relate to namespaces. Do I even need to worry about this? Or is that basically left up to the schema?
Might be best to clear this up before I implement something nonsensical...

angelozerr commented 4 years ago

probably won't be working on this much in the next 3 weeks or so...

OK, perhaps I will work on it on my side. My first step is to try to support validation. For other features like completion, we will see afterwards (we will need your work).

Another thing I realized I was unsure about is how schemas that are declared with xml-model relate to namespaces.

Indeed, that was my question too, and the specification doesn't address it. If you know how it works, any feedback is welcome!

BalduinLandolt commented 4 years ago

probably won't be working on this much in the next 3 weeks or so...

OK, perhaps I will work on it on my side. My first step is to try to support validation. For other features like completion, we will see afterwards (we will need your work).

All right.
If you want me to look over things, let me know. I can definitely find a couple of minutes here and there.

Another thing I realized I was unsure about is how schemas that are declared with xml-model relate to namespaces.

Indeed, that was my question too, and the specification doesn't address it. If you know how it works, any feedback is welcome!

From my experience, the schema seems to handle the namespace. See e.g. the schemas here: https://www.tei-c.org/release/xml/tei/custom/schema/xsd/
(I looked at tei_all.xsd)
It seems it's the <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.tei-c.org/ns/1.0" ...> root element that does the trick: if the XML uses the same namespace for an element (here http://www.tei-c.org/ns/1.0), then it will validate.
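
The observation above can be checked with the JDK's own javax.xml.validation API. The following is a minimal sketch, not LemMinX code: the class name TargetNamespaceDemo, the tiny inline schema, and the namespace http://example.org/ns are made up for illustration. It suggests that matching is done on the namespace URI rather than on the prefix, and that a root element in no namespace is rejected by a schema that only declares elements in its targetNamespace.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class; the schema and namespace below are invented for this example.
final class TargetNamespaceDemo {
    // A schema that declares a single element "body" in the namespace http://example.org/ns.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
          + " targetNamespace='http://example.org/ns' elementFormDefault='qualified'>"
          + "<xs:element name='body' type='xs:string'/>"
          + "</xs:schema>";

    // Returns true if the XML text validates against XSD, false if validation reports an error.
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            // With no ErrorHandler set, the validator throws on the first validation error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Same namespace URI, bound to a prefix.
        System.out.println(isValid("<t:body xmlns:t='http://example.org/ns'>x</t:body>"));
        // Same namespace URI, bound as the default namespace (no prefix).
        System.out.println(isValid("<body xmlns='http://example.org/ns'>x</body>"));
        // No namespace at all: the schema has no declaration for this element.
        System.out.println(isValid("<body>x</body>"));
    }
}
```

The third case is the situation that produces a "Cannot find the declaration of element ..." style error when the document's namespace and the schema's targetNamespace do not line up.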

angelozerr commented 4 years ago

Good news: I have found a solution to validate XML by XSD or DTD with xml-model by developing a Xerces component!

Let me clean up my code a little and I will push my branch so that you can play with it.

The only thing that I have not managed is XSD with namespaces. If I understand your feedback, it seems we need to use the targetNamespace from the XSD. Is that right?

BalduinLandolt commented 4 years ago

Sounds good!

That's what I suspect, but I'm not sure.
Maybe that won't even be such a big problem? I mean, generally, namespaces are in a sense nothing but element prefixes. Therefore, if the XML has <tei:body> and the schema just has <body>, that won't match. But if the schema has targetnamespace=tei, then <body> will actually mean <tei:body> and so it will match. Maybe I'm naive, but that sounds somewhat trivial to me; maybe we don't even have to do much.

But would it be an option to not worry about namespaces just now, and see if things work without namespaces? And then we can do namespaces once we know the general system works.
Once I have more time, I can see if I find more information on that too.

angelozerr commented 4 years ago

@BalduinLandolt could you play with my draft PR https://github.com/eclipse/lemminx/pull/688/files and give me feedback.

This PR should manage (only) validation for XML based on DTD/XML Schema by using xml-model. This PR doesn't support namespaces in XML Schema (we should perhaps clarify that in another PR).

Please note that my code is very ugly and I need to clean it up; it's just a POC that I wrote quickly. Thanks for your feedback.

angelozerr commented 4 years ago

I split this issue into 3 issues:

@BalduinLandolt please see https://github.com/eclipse/lemminx/issues/697 for validation, and I think your work is for completion based on grammar (https://github.com/eclipse/lemminx/issues/698)

angelozerr commented 4 years ago

All issues that implement xml-model are finished.