Closed sydb closed 2 years ago
This is my experience too, with version 1.7.8 and version 1.8.0 (I'm not sure how I got that one). I don't think rnv itself has changed, so I'm at a loss. Is it possible that something in make itself has changed, causing rnv to be called with less memory available to it?
I don’t think make
has anything to do with it, as I tested against a tiny RNC and XML file outside of the Make environment (just directly on the commandline).
Presumably it's the size of the schema rather than the file being validated that's the problem. It may be time to start throwing some stuff overboard.
I don’t think so, @hcayless. I got the error message when validating
<?xml version="1.0" encoding="UTF-8"?>
<test>13.2</test>
against
datatypes xs = "http://www.w3.org/2001/XMLSchema-datatypes"
element test { xs:double }
The two files total 145 bytes. (Or characters, not sure which.)
I do wonder if a recent update to expat
2.4.4 has anything to do with it. But as far as I can tell, the newest of the dozens of copies of expat on my system is 2.2.9. Still investigating (and not making much progress).
Looks like this is a recently introduced but now known bug. (See below.)
I do not want to just ditch the rnv
validation, though. I like the redundancy. So my suggestion is to disable rnv
validation in the build process by commenting it out for now, and testing it from time to time. (E.g., every time we update p5subset.) This is a devastating bug, so will probably be fixed soon.
@sydb rnv turns out to also use colon (':'
) for a separator:
# git grep ParserCreate | fgrep -v /Expat/
arx.c: expat=XML_ParserCreateNS(NULL,':');
tools/rvp.py:parser=xml.parsers.expat.ParserCreate('UTF-8',':') # last colon in the name
xcl.c: expat=XML_ParserCreateNS(NULL,':');
What is your source for rnv, how do you install it, which version(, where in the CI)?
The most recent version I can find that will build is dtolpin's fork of @hartwork's version:
https://github.com/dtolpin/RNV
I can't find a way to build the original on my system, and there are no instructions for doing that. On the dtolpin repo I can use the Makefile.gnu. @hartwork Will your expat 2.4.6 update roll out to Ubuntu automatically, or do we need to install it manually?
@hartwork —
I am not 100% sure, but most likely I downloaded the rnv-1.7.8.zip file from the davidashen FTP site, and then ran make -f Makefile.gnu
.
(I do not understand what colons have to do with it, but probably don’t need to. :-)
The most recent version I can find that will build is dtolpin's fork of @hartwork's version:
@martindholmes it's actually the other way around: He's the original author, and I helped with some build system fixes, and my "fork" is more recent but also ancient, 2015. His repo points to mine in the description. (Another fork at https://github.com/starseeker/librnv is more active but it's also affected by the colon issue, bundles vulnerable Expat 2.4.3, does not allow reporting issues, and has no e-mail address for contact.)
Potentially the best fix is to auto-convert the RNC syntax to (more widely supported) XML-RNG syntax and then use some other tool to validate against XML-RNG, e.g. xmllint of libxml2 or Trang, see https://relaxng.org/ for more. What do you think?
@hartwork Will your expat 2.4.6 update roll out to Ubuntu automatically, or do we need to install it manually?
That depends on your setup. In any case, 2.4.6 will not help with RNV, because the issue is in RNV.
(I do not understand what colons have to do with it, but probably don’t need to. :-)
Okay :man_shrugging:
@hartwork Does this mean that RNV can't actually be fixed to work with the updated libexpat? We do generate RNGs and do validation with other tools, but RNV has been a handy additional checking tool in our test suite. It seems a shame to lose it for the sake of a colon. :-)
@hartwork Does this mean that RNV can't actually be fixed to work with the updated libexpat? We do generate RNGs and do validation with other tools, but RNV has been a handy additional checking tool in our test suite. It seems a shame to lose it for the sake of a colon. :-)
@martindholmes it can likely be fixed, but the question is in which copy of RNV to adjust and use, what other places in RNV need adjustment to not break it when changing the separator, and if you want to use unmaintained or maintained software. I can fix this issue and add CI to my https://github.com/hartwork/rnv but it won't make it maintained.
So RNV itself will need to be modified? (I had kinda hoped that when Expat is fixed, RNV would magically work again. It (RNV 1.7.8) clearly uses whatever Expat is on the system, as an experiment I tried earlier today demonstrates:
rnv
on a Debian-based system — it worked finernv
command again (i.e., using same schema and XML instance) — out of memory errorSo RNV itself will need to be modified?
@sydb yes.
Sigh.
I would like to share the following news:
Have a nice weekend and a good start to the week.
Thank you @hartwork, this is very much appreciated! @sydb, as soon as I get a chance I'll build the https://github.com/hartwork/rnv/tree/namespace-separator branch and run our tests with it.
Thank you very much @hartwork!
Perhaps as early as tonight I can try to figure out how to build RNV from source your way and test. (I tried by just copying over the Makefile.gnu from 1.7.8, and that worked well enough to test that rnv
on this new branch worked on my tiny test file. Which, embarrassingly, turned out to be invalid.)
@martindholmes — Good idea. As mentioned above a I kludged a local rnv
, and it worked on one tiny test file. I have to leave the computer for an hour or three now, but will see if I can set my machine to run the TEI build using that new version before I leave …
BTW, I don’t really understand the YAML used to describe the build process (from the master branch). But it looks like
$ ./bootstrap
$ mkdir build
$ cd build
$ ../configure
$ cd -
$ make -C build
should do the trick (perhaps adding -j 7
to the make
command on my 8-core machine), but I have not actually tried it yet.
The final step gives me Error 127, but I was previously able to build 1.7.11 by puzzling through the Github Action, so I'll go back and try that again...
Got this, but sudo apt install asciidoc fixed it:
make -C build
So I have a build from the branch, but the P5 tests fail the first time rnv is invoked with:
xmllint --noent --dropdtd testplus.xml | rnv testplus.rnc testplus.rnc:4853:20: error: syntax error: ")" expected, "" found testplus.rnc:4858:20: error: syntax error: ")" expected, "" found testplus.rnc:4884:20: error: syntax error: ")" expected, "" found testplus.rnc:4889:20: error: syntax error: ")" expected, "" found testplus.rnc:4966:17: error: syntax error: "}" expected, "{" found testplus.rnc:4968:3: error: syntax error: end of file expected, "}" found testplus.rnc:20776:31: error: missing start testplus.rnc:20776:31: error: undefined reference to 'xsd:float' ...and many more undefined refs to xsd:something...
I'm guessing this is something to do with the colon in the datatype.
Might be due to the unorthodox way I built it, but the good news is that the new pipe-instead version of rnv
from @hartwork’s repo, namespace-separator branch seems to run perfectly well without that out of memory error. The bad news is it found all sorts of valid constructs inside some of our RNC files invalid. E.g.:
p5odds.rnc:3547:15: error: syntax error: "{" expected, "*" found
p5odds.rnc:3547:17: error: syntax error: "{" unexpected
p5odds.rnc:3594:1: error: syntax error: end of file expected, "}" found
p5odds.rnc:15159:1: error: missing start
p5odds.rnc:15159:1: error: undefined reference to 'xsd:NMTOKEN'
p5odds.rnc:15159:1: error: undefined reference to 'xsd:ID'
p5odds.rnc:15159:1: error: undefined reference to 'xsd:IDREF'
p5odds.rnc:15159:1: error: undefined reference to 'xsd:QName'
p5odds.rnc:15159:1: error: undefined reference to 'xsd:anyURI'
p5odds.rnc:15159:1: error: undefined reference to 'xsd:NCName'
p5odds.rnc:15159:1: error: undefined reference to 'xsd:string'
Yes, it looks like complaints are centered on places where there are colons. But there are lots of places that have colons before that a:*
on line 3547 that did not cause errors. (It is the first occurrence of “something-colon-asterisk”, though.)
Yes, that may be it. In both cases (testplus.rnc and p5odds.rnc) things blow up on first occurrence of “something-colon-asterisk”.
Gotta go now. Back in a bit.
I would like to bring this pull request to your attention: https://github.com/libexpat/libexpat/pull/577
Any news? (Many systems have the CVE-2022-25236 relaxation fix from Expat 2.4.7 applied by now.)
Should we be trying a fresh build, expecting that libexpat sources will have been updated and the problem will be fixed?
Maybe, I don't know your infrastructure.
Just triggered a build on our main Jenkins and hit the memory issue:
java -jar ../Utilities/lib/trang.jar testplus.rng testplus.rnc
xmllint --noent --dropdtd testplus.xml | rnv testplus.rnc
stdin:1:0: error: out of memory
The Jenkins is running jenkins/jenkins:lts
image
@peterstadler thanks for sharing that. Image jenkins/jenkins:lts
is Debian GNU/Linux 11 (bullseye)
under the hood and bullseye is one of the unpatched places as of today. I know that someone is at it so I think that will be taken care of in the next few days, and then jenkins/jenkins:lts
may fix itself, depending on Jenkins' docker image creation automation. Thanks for trying!
PS: @peterstadler I only now see that first trang is converting .rng
to .rnc
and then RNV is called above. And you even have xmllint already. xmllint has an argument --relaxng SCHEMA
. Was there a conscious decision against using xmllint for Relax NG validation?
PS: @peterstadler I only now see that first trang is converting
.rng
to.rnc
and then RNV is called above. And you even have xmllint already. xmllint has an argument--relaxng SCHEMA
. Was there a conscious decision against using xmllint for Relax NG validation?
Hmm, no idea … this code is more than 12 years old, see https://github.com/TEIC/TEI/commit/e0f5dd10e09995e3da945ea2229b3802a900f1f0 (and search for xmllint
). Maybe @sydb has an idea?
I know that someone is at it so I think that will be taken care of in the next few days, and then
jenkins/jenkins:lts
may fix itself, depending on Jenkins' docker image creation automation.
The fix seems to have hit bullseye-security
by now with Expat 2.2.10-2+deb11u3
:
https://tracker.debian.org/news/1310255/accepted-expat-2210-2deb11u3-source-into-stable-security-embargoed-stable-security/ . It is just image jenkins/jenkins:lts
of 4 days ago that would need a rebuild and upload now, if I'm not mistaken.
Stylesheets group urges for getting the tests back. The current plan is to comment out the rnv
part for now (or even put it into a conditional which checks the expat version) and put it back in when the fix is propagated to our Jenkins image.
@peterstadler Docker image jenkins/jenkins:lts
was last built and pushed 5 hours ago but they do not apply all updates appearently and so their Expat is still lacking the needed backport patches. For proof:
# docker pull jenkins/jenkins:lts && docker run --rm -it jenkins/jenkins:lts apt-cache policy libexpat1
lts: Pulling from jenkins/jenkins
Digest: sha256:65cd83179ec68b047ac83875024d1515263ee36362161fe5962f7944071a0f5d
Status: Image is up to date for jenkins/jenkins:lts
docker.io/jenkins/jenkins:lts
libexpat1:
Installed: 2.2.10-2+deb11u2
Candidate: 2.2.10-2+deb11u2
Version table:
*** 2.2.10-2+deb11u2 100
100 /var/lib/dpkg/status
I see these ways forward:
jenkins/jenkins:lts
in CI over herejenkins/jenkins:lts
producers upstream to make their build process apply updates and push a fresh built, after@peterstadler I believe what you just did at https://github.com/TEIC/Jenkins/commit/2cce31196ea427a6b73d2a802abe93394b112d85 worked, this looks promising:
# docker pull teic/jenkins:dev && docker run --rm -it teic/jenkins:dev apt-cache policy libexpat1
dev: Pulling from teic/jenkins
Digest: sha256:eaaeed7cdb0064d39a1a5292f6ce1d559a4686f05038320853b3064f7260488f
Status: Image is up to date for teic/jenkins:dev
docker.io/teic/jenkins:dev
libexpat1:
Installed: 2.2.10-2+deb11u3
Candidate: 2.2.10-2+deb11u3
Version table:
*** 2.2.10-2+deb11u3 100
100 /var/lib/dpkg/status
Yes, thanks @hartwork for all the support and research!
I updated our Jenkins image in TEIC/Jenkins@2cce311 to explicitly run a apt-get dist-upgrade
which brings in the patched libexpat. This updated image is already deployed and the tests seem to work again, see https://jenkins.tei-c.org/view/TEI%20dev/ 🚀
This makes no sense (to me) at all. Both in my local system (where I can test things) and on Jenkins2 (UVic, where I can’t) the build is failing because the
rnv
command is reporting “error: out of memory” both on p5.xml (which is quite huge) and on testplus.xml (which is < 300 lines long, although to be fair, the schema it is validating against is huge).So I tried creating a 1-line schema and a 1-line XML instance to validate against it. On my system
rnv
reported “error: out of memory” even for these tiny files!I am running version 1.7.8 of
rnv
on my system, Jenkins2 is running version 1.8.0.