fsprojects / FSharp.Data

F# Data: Library for Data Access
https://fsprojects.github.io/FSharp.Data

Support XSD in XmlProvider #57

Closed ovatsus closed 3 years ago

rojepp commented 11 years ago

+1 This seems to be a more robust option than guessing types from a sample.

JonnyBoats commented 11 years ago

+1 I could really use this.

forki commented 11 years ago

Creating the types is easy once you have inferred the schema. The problem here is that we need something that can understand XSD files - and works on Mono.

I think xsd.exe (http://msdn.microsoft.com/en-us/library/x6c1kb0s(v=vs.80).aspx) is probably not the way to go.

panesofglass commented 11 years ago

:+1: Thanks, @tpetricek. I looked for this but couldn't find it.

tpetricek commented 11 years ago

It would be great to have this feature - I think this is something that can be done without messing too much with the details of the XML type provider, so it is a perfect project for a new contributor :-). I'm happy to help anyone who is interested in looking into this (@panesofglass ;-)).

The easiest way to support this would be to parse the XSD file and build a value of InferedType (see here) that represents the document. I think this should be fairly straightforward (but of course, XSD is a long W3C standard, so...)

The XML provider does not support all InferedType values that you may construct - you can explore how types of various XML documents look using the following script (it just calls the inference and prints the inferred type nicely):

```fsharp
#r "System.Xml.Linq.dll"
// Path to a local build of FSharp.Data; adjust it for your machine
#r @"C:\Tomas\Projects\FSharp.Data\bin\FSharp.Data.DesignTime.dll"
open System.Xml.Linq
open ProviderImplementation
open FSharp.Data.RuntimeImplementation.StructuralTypes

// Print System.Type values by their short name to keep the output readable
fsi.AddPrinter(fun (t:System.Type) -> t.Name)

let doc = XDocument.Parse("""<root>
    <attrs id="1" />
    <text>hello</text>
    <tricky>3.14</tricky>
    <tricky>true</tricky>
  </root>""")
let culture = System.Globalization.CultureInfo.InvariantCulture

// Run the XML inference on the root element and let FSI print the result
XmlInference.inferType culture false doc.Root
```

For the example, you get:

```
Record                  // An XML element is always represented as Record
  (Some "root",         // This is the name of the record
    [{Name = "";        // A list of attributes/children - the name "" is
      Optional = false; //    special and means the body of the element
      Type =            //    (optional means it may/may not be there)
      Collection        // Collection represents multiple children
        (map
            [(Record (Some "attrs"), // This is InferedTypeTag - it should be
              (Single,               // basically the name of the element
              Record (Some "attrs",[{Name = "id";  // Attribute 'id' of type 'int'
                                      Optional = false; // .. that's not optional
                                      Type = Primitive (Int32,null);}])));
            (Record (Some "text"),
              (Single,               // Single vs. Multiple - how many times
                                     // can the child element appear?
              Record (Some "text",[{Name = "";  // Body is non-optional
                                    Optional = false; // .. string
                                    Type = Primitive (String,null);}])));
            (Record (Some "tricky"),
              (Multiple,
              Record
                (Some "tricky",
                  [{Name = "";
                    Optional = false;
                    Type =         // This represents type that is a choice
                    Heterogeneous  // between multiple possible options -
                      (map         // here number or boolean
                          [(Number, Primitive (Decimal,null));
                          (Boolean, Primitive (Boolean,null))]);}])))]);}])
```

panesofglass commented 11 years ago

This may be a good opportunity to finally learn how type providers work. I'm interested. That said, you can look around and see just how quick I am to finish things. :)

Are you thinking a custom XSD parser? What are the limitations of using other, external libraries? I forget.

tpetricek commented 11 years ago

If there are some external libraries that we can easily rely on, that's probably fine - I'm not sure how difficult it would be to just read XSD as an XML document and work with that...

I'll be certainly happy to help (and maybe I'll recruit more people :-)) but I'm probably not able to lead the effort.

rojepp commented 11 years ago

I'm not sure reading Xsd's as plain Xml documents is a good way forward. Xsd's are complicated beasts. I started experimenting with this using classes in System.Xml.Schema but I never really got anywhere. Lack of time/ran out of energy.. You should probably get some tricky Xsd's to try out with the TP to understand why it's not trivial. Often, you'll get some complex Xsd from a third party and it would be awesome to have a robust Xsd TP for them.

chaliy commented 11 years ago

.NET already has built-in support for parsing XSD: XmlSchema. The main problem is converting the XmlSchema model into a model appropriate for a type provider. For example, the logic that says "an xsd:element with an xsd:sequence of xsd:element should result in a DTO" is hardcoded in the xsd.exe tool. This logic is internal and so needs to be reimplemented. XmlSchemaElement already has some information, like IsAbstract, that could help.
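To make the "sequence of elements becomes a DTO" mapping concrete, here is a minimal sketch using only System.Xml.Schema. It is not code from FSharp.Data or xsd.exe: the `Field` record, the `fieldsOf` helper and the sample schema are hypothetical names invented for illustration, and the sketch only handles an inline complexType whose content is a flat xsd:sequence.

```fsharp
open System.IO
open System.Xml.Schema

// Hypothetical target model (an assumption, not the FSharp.Data model):
// one field per child element of an xsd:sequence.
type Field = { Name: string; TypeName: string; Optional: bool; Multiple: bool }

// Made-up sample schema for illustration only.
let xsd = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string" />
        <xs:element name="age" type="xs:int" minOccurs="0" />
        <xs:element name="phone" type="xs:string" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

// Walk the (uncompiled) schema object model: element -> inline complexType -> sequence.
let fieldsOf (element: XmlSchemaElement) =
    match element.SchemaType with
    | :? XmlSchemaComplexType as complexType ->
        match complexType.Particle with
        | :? XmlSchemaSequence as sequence ->
            [ for item in sequence.Items do
                match item with
                | :? XmlSchemaElement as child ->
                    yield { Name = child.Name
                            TypeName = child.SchemaTypeName.Name
                            Optional = child.MinOccurs = 0m
                            Multiple = child.MaxOccurs > 1m }
                | _ -> () ]
        | _ -> []
    | _ -> []

// Fail loudly on schema syntax errors instead of ignoring them.
let handler = ValidationEventHandler(fun _ e -> raise e.Exception)
let schema = XmlSchema.Read(new StringReader(xsd), handler)

// Collect the fields of every top-level element declaration.
let fields =
    [ for item in schema.Items do
        match item with
        | :? XmlSchemaElement as root -> yield! fieldsOf root
        | _ -> () ]

for f in fields do
    printfn "%A" f
```

Anything beyond this flat case (named types, xsd:choice, references, extensions) needs its own rule, which is where the reimplementation effort mentioned above would go.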

tpetricek commented 11 years ago

Cool - I did not know about System.Xml.Schema - that certainly sounds like a better way to go.

dfaroi commented 11 years ago

Maybe the following CodePlex project could help: Open Linq To XSD => http://openlinqtoxsd.codeplex.com, a fork of Linq To XSD => http://linqtoxsd.codeplex.com

ovatsus commented 10 years ago

@runefs, seems you're working on a XSD type provider (http://stackoverflow.com/questions/20466880/getting-compile-error-on-provided-type). Would you like to integrate it in FSharp.Data?

runefs commented 10 years ago

I might be able to contribute. I'm currently working on an XSD type provider. It's currently based on System.Xml.Linq, but I'd look into changing that to System.Xml.Schema.

panesofglass commented 10 years ago

Transformations would be so nice. I mentioned XQuery to @mausch in the issues for his XmlLiteralsTypeProvider. XSLT would also be interesting, but I thought XQuery might be a little simpler to implement in F#.

pver commented 10 years ago

I'm quite new to type providers (so maybe I'm totally wrong with my question) but could such an XSD provider be used instead of providing a sample XML to the XML type provider? So could the XSD type provider be used to get the types from an XSD file, and could that type information be used (in a second step) as an input to the XML type provider to parse an XML file? Or will it just expose the types defined in the XSD and nothing more?

ovatsus commented 10 years ago

@pver that's the idea, instead of giving an xml example, provide a xsd that defines the structure of the xml, but still generate types for the xml described by the xsd, not for the xsd itself

pver commented 10 years ago

@ovatsus thanks for the confirmation, that sounds great :)

ovatsus commented 10 years ago

@runefs, @rojepp, @pezipink, all of you seemed interested in contributing this. Have any of you made any progress on it? There are a lot of people asking for this; it would be great to be able to pass an xsd to the schema parameter of XmlProvider and make the types match correctly.

ghost commented 10 years ago

I've had lots of people ask me too

runefs commented 10 years ago

I made progress and then did nothing about it. Being Willy and not satisfied because the testing wasn't exhaustive. I've been able to parse the schemas I had lying around. I'll try to merge whatever changes there have been since I worked on it and then create a pull request with what I have, if that's of interest.

Regards, Rune

runefs commented 10 years ago

Ok ok I'll get to it :) if nothing else I have a very working prototype I think

Regards, Rune

ovatsus commented 10 years ago

Great! Let me know if you need help merging, there's been quite a few internal changes to FSharp.Data in the meantime. Was the existing InferedType data model enough or did you have to extend it?

JonnyBoats commented 10 years ago

If you need help with testing I have some rather complicated data with schema (XSD files) that I could throw at it.

runefs commented 10 years ago

I'm going to need some help on this one. I have a few simple questions (in the newb category): what's the process for testing? I essentially couldn't figure out how to add tests for the provider (which has been a drag when debugging too!).

And John please send me the XSDs (preferably with sample XML as well)

Br Rune

ovatsus commented 10 years ago

There are several kinds of tests:

A bit of this is on http://fsharp.github.io/FSharp.Data/contributing.html, but not much. It would be great if you could improve it based on your experience. Let me know if you need more help. You can also push to your repo and I can have a look if you want

pezipink commented 10 years ago

Just to add to this - I don't think it matters if your XSD provider does not work perfectly (or at all) with John's complex schema(s). What's important is a base we can work from that covers the very fundamental blocks of XSD, and we can then work on it together from there.

tpetricek commented 10 years ago

+1 to what @pezipink says! Even an initial implementation covering some of the easier test cases would be a great move forward!

tfrimor commented 10 years ago

A provider just covering the simple cases might also be enough for most of us. I've been looking for an XSD Type Provider for a large but simple XSD.

forki commented 10 years ago

there is already a proposal for this see https://github.com/fsharp/FSharp.Data/pull/558 - please give it a try

ovatsus commented 10 years ago

The initial work is merged into the XsdProvider branch, but there's still a lot of work to do. We should gather a list and add separate issues for what needs to be done to make it releasable.

It needs more tests and docs, but before that, I wanted to know what people think about having a separate XsdProvider vs having XmlProvider and being able to specify either a xsd or a sample xml as a parameter?

ovatsus commented 10 years ago

@runefs your commits lost your username, but still got your email (example: https://github.com/fsharp/FSharp.Data/commit/99fac6615eb9559d54a3be640e516e9df55038fb). If you add that email to your github account I think it will recognize them as yours again. Not sure how that happened

pezipink commented 10 years ago

My opinion :

I think they should be the same - it is really just a special case of how to infer the types for the XML document. An XsdProvider might give the impression that we are providing types over the XSD rather than the XML it produces.

If i wanted to switch out my existing XML providers to use XSD instead of inferred literals / files, I should just be able to give it an XSD literal or file.

tpetricek commented 10 years ago

I agree with @pezipink. I think the natural way to use it is XmlProvider<Schema="foo.xsd"> just like the natural use for CSV provider is CsvProvider<Schema="Foo,Bar">.

ovatsus commented 10 years ago

Added issues here: https://github.com/fsharp/FSharp.Data/issues?milestone=5&state=open

runefs commented 10 years ago

There is (at least) one difference, though, between how the XML provider works and XSDs. An XSD does not describe one XML document; it describes a set of possible XML documents. Each top-level element tag and each top-level complexType tag describes an XML document (and potentially also part of one, if the schema is imported into another XSD). Whether this is enough to keep them separate I don't know, but as is evident from the code, I thought the XSD-specific complexity was enough to keep them separate.

On a completely different note: one thing I'd suggest incorporating for (some of) the InferedStructures is the option of providing documentation. In XSD you can provide documentation for each element and type; it would be neat to be able to forward this to the provided types.

ovatsus commented 10 years ago

Yes, providing the documentation would be great!

runefs commented 10 years ago

OK, I'll kick the can a bit further. Does the XSD even describe an XML document? Doesn't it just describe the structure of data? Keeping them separate, you could use the same XSD for data providers where some use, e.g., JSON and some use XML, and all we'd need is a static argument to the provider with which we could select the runtime component.

nwolverson commented 9 years ago

Having a look at this as I'd quite like to use it myself.

Struggled to get started at all with using XsdProvider, firstly not finding my XSD (file logic is not as robust as XmlProvider, can at least share this if not merging them), then not parsing simple examples (#771).

Will try to get towards a point I can parse my desired XSD, though I find the format/APIs a bit complex sometimes...

runefs commented 9 years ago

Can you share the XSDs that you can't parse?

Regards, Rune

nwolverson commented 9 years ago

Sample here: http://msdn.microsoft.com/en-us/library/dd489283.aspx (#772 and namespace in XSD - seem to have a fix for that https://github.com/nwolverson/FSharp.Data/tree/XsdProvider-targetNamespace - parses with that change + reordering)

http://www.topografix.com/GPX/1/1/gpx.xsd (#772 at design time then namespace in XML at runtime - parses once I reorder and delete the namespace reference).

nwolverson commented 9 years ago

Actually I think much of the problems (2 separate issues) are down to incorrect handling of qualified/unqualified element forms, being applied to the schema instead of the instance document (http://www.w3.org/TR/xmlschema-0/#NS) - all quite hard to understand and overly complicated to handle (document default, specified on element, global vs nested then interaction with default ns...)

Think I have an initial fix here so will whip up a PR with tests soonish.
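To illustrate the qualified/unqualified distinction, here is a small sketch that validates two instance documents against a schema that has a targetNamespace but leaves elementFormDefault at its default (unqualified). It is not the fix referred to above; the schema, the `urn:example` namespace and the `isValid` helper are made up for illustration, and it assumes a modern `dotnet fsi` where System.Xml.Linq is referenced by default.

```fsharp
open System.IO
open System.Xml
open System.Xml.Linq
open System.Xml.Schema

// Made-up schema: global element "root" lives in urn:example, while the
// local element "child" is unqualified because elementFormDefault is not set.
let xsd = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:example">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="child" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

let schemas =
    let schemaSet = XmlSchemaSet()
    // A null namespace means "use the targetNamespace declared in the schema".
    schemaSet.Add(null, XmlReader.Create(new StringReader(xsd))) |> ignore
    schemaSet.Compile()
    schemaSet

// Hypothetical helper: true when validation reports no errors.
let isValid (xml: string) =
    let errors = ResizeArray<string>()
    let doc = XDocument.Parse xml
    doc.Validate(schemas, ValidationEventHandler(fun _ e ->
        if e.Severity = XmlSeverityType.Error then errors.Add e.Message))
    errors.Count = 0

// The nested element is in no namespace, as the schema requires.
let unqualifiedChild = """<t:root xmlns:t="urn:example"><child>x</child></t:root>"""

// The default namespace puts the nested element in urn:example as well.
let qualifiedChild = """<root xmlns="urn:example"><child>x</child></root>"""

printfn "unqualified child valid: %b" (isValid unqualifiedChild)
printfn "qualified child valid: %b" (isValid qualifiedChild)
```

The form rule is declared in the schema but decides which namespace the elements of the instance document must carry, so a provider has to apply it when matching instance elements rather than when reading the schema's own element names.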

giacomociti commented 9 years ago

Hi, in my hobby project I dealt with System.Xml.Schema. So maybe I can help here (but I'm a GitHub and TP noob). My idea is to define a simplified schema model, much like this but even simpler. Then we can split the task:

nwolverson commented 9 years ago

That doesn't seem like a bad idea, but have you looked at the current version? By my recollection it was mostly there, maybe some issues with external references & could do with checking by someone who really understands XSD.

And I guess before this can ever be merged, some tidying up of the API as discussed above, some decisions to be made.

giacomociti commented 9 years ago

Sure I need first to play a bit with the current version. At a glance I just noticed that using XmlSchemaSet instead of XmlSchema, and calling the Compile() method on it may help with external references. But knowing a few quirks like this does not make me an XSD expert.
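As a rough illustration of the XmlSchemaSet route, the sketch below adds a schema to a set, calls Compile(), and reads the resolved type of a global element. The schema and the `compile` helper are invented for this example and are not taken from the XsdProvider branch; a single in-memory schema is used, so it does not exercise external references.

```fsharp
open System.IO
open System.Xml
open System.Xml.Schema

// Made-up schema: the element refers to its type by name only.
let xsd = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ItemType">
    <xs:sequence>
      <xs:element name="title" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
  <xs:element name="item" type="ItemType" />
</xs:schema>"""

// Hypothetical helper: load one schema into a set and compile it.
let compile (text: string) =
    let schemaSet = XmlSchemaSet()
    use reader = XmlReader.Create(new StringReader(text))
    schemaSet.Add(null, reader) |> ignore
    schemaSet.Compile()
    schemaSet

let schemaSet = compile xsd

// GlobalElements is a non-generic table, so cast its values explicitly.
let globalElements =
    schemaSet.GlobalElements.Values
    |> Seq.cast<XmlSchemaElement>
    |> Seq.toList

for element in globalElements do
    // ElementSchemaType is a post-compilation property: it holds the
    // type definition the element's type name was resolved to.
    printfn "%s : %s" element.QualifiedName.Name element.ElementSchemaType.Name
```

Working from the compiled set means name references are already resolved to type objects, so the code that builds provided types does not have to look them up itself.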

runefs commented 9 years ago

FYI: I've started looking into this again and, for one thing, I'm changing to XmlSchemaSet. I hope to have that incorporated soonish. I'm running the tests at the moment, trying to weed out a few issues. When that's done I will rerun with the XSDs mentioned previously in this thread.

giacomociti commented 9 years ago

In fact the XsdProvider seems almost there, but still there are a few issues and I was not able to sort them out. In the meantime I also pursued my 'divide and conquer' attempt, mapping xsd to InferedType through an intermediate simplified representation of xsd. The approach to better follow the advice of @tpetricek was: given a schema I fed the xml inference with valid samples and looked at the resulting InferedType. Comparing this with the outcome of my function confirmed that what we obtain is the same as if using the XmlProvider with samples.

runefs commented 9 years ago

Also for recursive types? And for other situations where the same type is used in multiple places?

-Rune

giacomociti commented 9 years ago

Not yet. I just focused on providing the type for a single global element definition. I acknowledge it's worth providing types also for complex types in xsd. The problem I see is that, while the runtime representation for an element definition is clearly an XmlElement, for complex types it may be something a little different, yielding an XmlElement only when given an element name.

runefs commented 9 years ago

@gustavo, please ignore the pull request from earlier today for this issue. I seem to have misunderstood the build command (or built the wrong project at least). I'll be back when the very obvious errors are corrected. Sorry for the inconvenience :)

ovatsus commented 9 years ago

no problem