Closed hubgit closed 9 years ago
I spent a couple of hours working on this. @hubgit, I could either push my work up to this branch, or make a new one, if you would prefer.
I learn a lot from you! I hadn't seen the Promise and fetch APIs before -- they are very nice.
What I want to do is the following:
@Klortho Great! I think it would be best to make a new branch, as I've linked to this one from elsewhere. That way you can make as many changes as you want, and we can merge together what works at the end.
Could you provide a few notes about how you created xmllint.js? I see you used emscripten -- was it pretty easy? Did you follow this blog post, by any chance?
I'm thinking, wouldn't it be nice if we could use libxml's parser, instead of the browser's? But, I guess there would probably be a big mismatch between whatever the output of that was and whatever Saxon-CE expects. We still need to parse out the processing instruction, before sending to the schematron step -- but I guess that could be done easily enough with regular expressions.
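For illustration, here is a minimal sketch of pulling processing instructions out of the raw XML text with a regular expression. The function name and the returned shape are hypothetical, and the thread does not say which processing instruction is meant, so the sketch collects all of them. A regex like this will also match text that merely looks like a processing instruction inside comments or CDATA sections, so it is only a rough approach.

```javascript
// Hypothetical helper (not from the project): collect processing
// instructions from raw XML text using a regular expression.
function extractProcessingInstructions(xmlText) {
  // <?target data?> where data is optional.
  const pattern = /<\?([A-Za-z_][\w.\-]*)(?:\s+([\s\S]*?))?\?>/g;
  const found = [];
  let match;
  while ((match = pattern.exec(xmlText)) !== null) {
    const target = match[1];
    // The XML declaration (<?xml ...?>) is not a processing instruction.
    if (target.toLowerCase() === 'xml') continue;
    found.push({ target: target, data: (match[2] || '').trim() });
  }
  return found;
}
```

The caller could then pick the entry whose `target` matches the instruction it cares about before handing the document to the Schematron step.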
I actually wrote a blog post about compiling xml.js, so I'd remember how to do it.
Hi, @hubgit ,
Where is this xmllint.js from? It is not from your fork here, branch dtd-validation, is it? I searched that fork for schemaFiles, and came up short.
There is a bug when I try to run it without any schema files, for those documents that don't have a doctype decl. There is a line, parts=schemaFile[0].split("/");, that fails.
I can work around it by passing in a dummy dtd: schemaFiles: [["dummy", ""]], but I'd like to know the origin of this xmllint.js, so we can work on it later if needed.
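The workaround could be wrapped in a small helper so callers don't have to remember it. This is only a sketch: the function name and the options-object shape are assumptions, and only the `[["dummy", ""]]` value comes from the comment above.

```javascript
// Hypothetical wrapper: supply the dummy DTD entry when no schema files
// are given, so that schemaFile[0].split("/") has something to operate on.
function withSchemaFallback(options) {
  const schemaFiles = options.schemaFiles;
  if (Array.isArray(schemaFiles) && schemaFiles.length > 0) {
    return options; // caller already provided schema files
  }
  // Copy the options rather than mutating the caller's object.
  return Object.assign({}, options, { schemaFiles: [['dummy', '']] });
}
```

A proper fix would presumably live inside xmllint.js itself, guarding the `split` call when the list is empty.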
Closing this one in favor of #25. @hubgit , reopen if you disagree.
Where is this xmllint.js from? It is not from your fork here, branch dtd-validation, is it? I searched that fork for schemaFiles, and came up short.
I think it was an earlier version of that fork, which I then must have force-pushed a new version to, with a cleaner (xmllint(args, files)) interface. I'll see if I can update your branch with the latest version - it shouldn't make any practical difference, other than the way it's called.
[this isn't necessarily ready to merge yet - it has some drawbacks and might need further work]
Added here:
- The DTD files (in the dtd directory).
- xmllint.js (~4MB), which is libxml2's xmllint ported to JS using Emscripten.
- Polyfills for the Promise and fetch APIs, for browsers that don't support them.
When an XML file is selected, it will first be passed through xmllint (equivalent to xmllint --noent --dtdvalid JATS-journalpublishing1.dtd example-file.xml) which validates the contents of the XML file against the DTD and replaces named entities.
Once the XML is validated, it continues on to the Schematron checks as before.
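The two-step flow described above might be sketched roughly as follows. `runXmllint` and `runSchematron` are hypothetical functions passed in by the caller, and how xmllint.js actually accepts its arguments is an assumption here. Only the command-line flags are taken from the description.

```javascript
// Build the argument list matching the command line quoted above.
function buildXmllintArgs(dtdName, xmlName) {
  return ['--noent', '--dtdvalid', dtdName, xmlName];
}

// Hypothetical pipeline: DTD validation first, then Schematron.
// runXmllint(args, xmlText) is assumed to return (or resolve to) the XML
// with entities replaced, and to throw or reject when validation fails.
function validateThenCheck(xmlText, runXmllint, runSchematron) {
  const args = buildXmllintArgs('JATS-journalpublishing1.dtd', 'example-file.xml');
  return Promise.resolve()
    .then(function () { return runXmllint(args, xmlText); })
    .then(function (expandedXml) { return runSchematron(expandedXml); });
}
```

In this sketch a failed DTD validation rejects the promise, so the Schematron step is skipped. That is one possible choice, not a settled decision.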
This fixes https://github.com/JATS4R/elements/issues/49 - but has a downside in that it will only validate XML against the JATS 1.0 DTD, and requires that the doctype URI at the start of the XML file is exactly "JATS-journalpublishing1.dtd" - any other form and it will fail to validate.
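To make the doctype restriction concrete, here is a minimal sketch of a pre-check that reads the system identifier from the DOCTYPE declaration and compares it with the one accepted form. Both function names are hypothetical, and the regular expression handles only the common `PUBLIC "..." "..."` and `SYSTEM "..."` layouts.

```javascript
// Hypothetical helper: return the system identifier from the DOCTYPE
// declaration, or null if no DOCTYPE with a system identifier is found.
function getDoctypeSystemId(xmlText) {
  const pattern = /<!DOCTYPE\s+[^\s>\[]+(?:\s+PUBLIC\s+(?:"[^"]*"|'[^']*')|\s+SYSTEM)\s+(?:"([^"]*)"|'([^']*)')/;
  const match = pattern.exec(xmlText);
  if (!match) return null;
  return match[1] !== undefined ? match[1] : match[2];
}

// Only the exact string from the description above is accepted.
function isSupportedDoctype(xmlText) {
  return getDoctypeSystemId(xmlText) === 'JATS-journalpublishing1.dtd';
}
```

A check like this could report an unsupported doctype up front, instead of letting validation fail with a less obvious error.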
I haven't yet thought further about how other DTDs could be accommodated, or whether to proceed to Schematron validation if this first step fails.