djc / rnc2rng

RELAX NG Compact to regular syntax conversion library
MIT License

Documentation wanted #43

Open ketil-malde opened 10 months ago

ketil-malde commented 10 months ago

I'm using lxml to validate XML documents against a RELAX NG schema. Since I prefer readable schemas, I write them in the compact syntax, i.e. .rnc files, and convert them with trang. This works, but I would like to get rid of the Java dependency. It looks like I could use this project as a library in my code to read the RNC schema, convert it into an XML string, and hand that to the lxml validator. Currently, I have:

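    from lxml import etree

    # Load the RNG (XML syntax) schema that trang produced
    with open('my_schema.rng', 'r') as sf:
        xmls = etree.parse(sf)
        schema = etree.RelaxNG(xmls)
    with open('my_data_file.xml', 'r') as f:
        doc = etree.parse(f)
    if not schema.validate(doc):
        print(schema.error_log)  # do error processing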

I think it should be possible to build xmls, i.e. the RNG schema in XML format, directly with this library (rnc2rng), but the documentation only describes its use for file processing. I think the process I outline here is useful enough that it should be documented.

(If the current maintainers are unwilling to do it, would you accept a pull request if I figure it out on my own?)

djc commented 10 months ago

Definitely happy to accept a PR. In general, I'm not able to spend much time on this project at this stage (https://github.com/djc/rnc2rng/issues/41) as my interests have shifted, but I can probably review a documentation PR.

ketil-malde commented 10 months ago

Figured it out: it's just a matter of using rnc2rng's functions load (to read the schema file) and dumps (to write the schema in XML format to a string). The encode() call is necessary because lxml refuses to parse a str that carries an XML encoding declaration, which the generated XML does; passing bytes avoids the conflict.

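    import rnc2rng
    from lxml import etree

    rng = rnc2rng.load('my_schema.rnc')         # parse the compact schema
    rngxml = rnc2rng.dumps(rng).encode()        # RNG XML as bytes
    schema = etree.RelaxNG(etree.fromstring(rngxml))
    doc = etree.parse('my_data_file.xml')
    if not schema.validate(doc):
        print(schema.error_log)  # etc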

Note that etree.parse can take a filename directly, so there is no need to open the file manually.
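The steps above can be wrapped in a small helper. This is only a sketch: the name `validate_against_rnc` and its `(is_valid, messages)` return value are made up for illustration and are not part of rnc2rng or lxml. It assumes both packages are installed and that `rnc2rng.load` accepts a file name, as in the snippet above.

```python
def validate_against_rnc(rnc_path, xml_path):
    """Validate the XML file at xml_path against the RNC schema at rnc_path.

    Hypothetical helper: returns (is_valid, messages).
    """
    # Imported here so a missing package only raises when validation is attempted.
    from lxml import etree
    import rnc2rng

    # Parse the compact schema and serialise it to RELAX NG XML (a str).
    rng_xml = rnc2rng.dumps(rnc2rng.load(rnc_path))

    # Pass bytes: lxml rejects str input that carries an encoding declaration.
    schema = etree.RelaxNG(etree.fromstring(rng_xml.encode()))

    doc = etree.parse(xml_path)
    is_valid = schema.validate(doc)

    # error_log holds the errors recorded by the validate() call above.
    messages = [entry.message for entry in schema.error_log]
    return is_valid, messages
```

Calling `validate_against_rnc('my_schema.rnc', 'my_data_file.xml')` would then replace the trang conversion step entirely, with no Java involved.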

djc commented 10 months ago

At some point I put built-in support for rnc2rng into lxml, by the way, so I think you can call etree.RelaxNG(file='my_schema.rnc') directly as long as rnc2rng is installed.

https://github.com/lxml/lxml/blob/c3a92ba58b77eae7e32a0a86ad1d191664f11f87/src/lxml/relaxng.pxi#L60
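For reference, a minimal sketch of what relying on that built-in support might look like. The helper name `load_relaxng` is hypothetical; the sketch assumes an lxml version that includes the linked code, that rnc2rng is importable, and that the schema path is passed through the `file` keyword argument of `etree.RelaxNG`.

```python
def load_relaxng(schema_path):
    """Build an lxml RelaxNG validator from a schema file (hypothetical helper)."""
    # Imported here so a missing package only raises when the helper is called.
    from lxml import etree

    # Assumption: with the `file` keyword, lxml hands paths ending in ".rnc"
    # to rnc2rng for conversion and reads any other path as RELAX NG XML.
    return etree.RelaxNG(file=schema_path)
```

With this, `load_relaxng('my_schema.rnc').validate(etree.parse('my_data_file.xml'))` would cover the whole workflow without calling rnc2rng explicitly.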