rpoyner-tri opened this issue 2 years ago
See #16229 for an overview of parser improvements.
See also https://github.com/RobotLocomotion/drake/issues/16785#issuecomment-1098445866 for an implementation idea:
For URDF, separate from any other parsing, we could walk the entire tinyxml2 tree from the root on down and check every element name and attribute name against a position-agnostic allow-list (i.e., irrespective of its parent within the tree), producing an error for any completely-unknown name. That would catch typos where the name was wrong no matter where it might live in the tree, but it would not detect elements or attributes that were located in the wrong position within the tree. So, it is not very precise, but on the other hand it's only ~100 lines of code to implement.
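A minimal sketch of that allow-list walk, assuming tinyxml2 and an illustrative (not real) set of known names; an actual check would also want to report file/line context through the parser's diagnostic machinery:

```cpp
#include <set>
#include <string>
#include <vector>

#include <tinyxml2.h>

// Hypothetical allow-list; the real one would enumerate every element and
// attribute name in Drake's URDF subset.
const std::set<std::string> kKnownNames{
    "robot", "link", "joint", "inertial", "mass", "origin", "name", "value",
    "xyz", "rpy"};

// Recursively visits every element and attribute, collecting any name that is
// not on the allow-list, irrespective of where it appears in the tree.
void CheckNames(const tinyxml2::XMLElement* element,
                std::vector<std::string>* errors) {
  if (element == nullptr) { return; }
  if (kKnownNames.count(element->Name()) == 0) {
    errors->push_back("unknown element: " + std::string(element->Name()));
  }
  for (const tinyxml2::XMLAttribute* attr = element->FirstAttribute();
       attr != nullptr; attr = attr->Next()) {
    if (kKnownNames.count(attr->Name()) == 0) {
      errors->push_back("unknown attribute: " + std::string(attr->Name()));
    }
  }
  for (const tinyxml2::XMLElement* child = element->FirstChildElement();
       child != nullptr; child = child->NextSiblingElement()) {
    CheckNames(child, errors);
  }
}
```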
Or on another tack, the libsdformat syntax checker and low-level DOM is actually quite good (https://github.com/ignitionrobotics/sdformat/tree/sdf12/sdf/1.9). With maybe a few days of work, we could encode (Drake's subset of) the URDF specification into that format, and then use libsdformat to do the syntax checking. Possibly even OpenRobotics would be interested to collaborate on that.
See https://github.com/RobotLocomotion/drake/issues/18796#issue-1586876256 for another example of an uncaught typo.
Or on another tack, the libsdformat syntax checker and low-level DOM is actually quite good. With maybe a few days of work, we could encode (Drake's subset of) the URDF specification into that format, and then use libsdformat to do the syntax checking.
I did a prototype of this last night (branch). The sdf::Element API is fine, but the readXml function (and the helpers it calls, like readAttributes) are completely unsuitable for re-use. Instead, I needed to re-implement the "map tinyxml into Element tree" function by hand, which really reduces the value we're getting from using the library.
Another idea we've broached is using a RELAX NG schema to check the XML, but then parsing the same way we do now. I'd be curious to know if there are any currently-maintained RNG implementations in C++ that we could try.
Another idea would be to write a Serialize-NVP-based mapper from tinyxml nodes to C++ structs. That would have approximately the same benefits as the sdf::Element API (schema checking and string=>double parsing), but without tying ourselves to a library that comes with too much baggage.
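To make that concrete, here is a rough sketch assuming Drake's Serialize/DRAKE_NVP visitor pattern from drake/common/name_value.h; the XmlAttributeArchive class and the UrdfMass struct are hypothetical illustrations, not existing code, and only double-valued attributes are handled:

```cpp
#include <string>
#include <vector>

#include <tinyxml2.h>

#include "drake/common/name_value.h"

// Hypothetical archive that maps XML attributes onto struct fields via the
// Serialize/DRAKE_NVP pattern.
class XmlAttributeArchive {
 public:
  XmlAttributeArchive(const tinyxml2::XMLElement* element,
                      std::vector<std::string>* errors)
      : element_(element), errors_(errors) {}

  // Visits one name-value pair, reading the attribute of the same name.
  template <typename NVP>
  void Visit(const NVP& nvp) {
    const tinyxml2::XMLError result =
        element_->QueryDoubleAttribute(nvp.name(), nvp.value());
    if (result != tinyxml2::XML_SUCCESS) {
      errors_->push_back(std::string("bad or missing attribute: ") +
                         nvp.name());
    }
  }

 private:
  const tinyxml2::XMLElement* element_{};
  std::vector<std::string>* errors_{};
};

// Hypothetical struct for URDF's <mass value="..."/> element.
struct UrdfMass {
  template <typename Archive>
  void Serialize(Archive* a) {
    a->Visit(DRAKE_NVP(value));
  }
  double value{0.0};
};
```

Rejecting unknown attributes (the schema-checking half) would still take a second pass that compares the element's actual attributes against the set of names the archive visited.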
Ah, schema checkers. I had thoughts about this a year and a half ago, but they've mostly drained away. I will see what I can recover.
Thanks for doing the architecture spike on the sdformat library hypothesis.
It doesn't yet cover drake: custom tags, but it could pretty easily do so.

Things I'm happy about so far:

Things that need more study:
Sounds promising!
A quick look through the RNV project source code has me doubting whether it can be easily consumed as a library. It uses the C technique of file-static data -- effectively private global variables -- and it calls exit(1) if allocation fails. A possible alternative would be to invoke the rnv binary executable as a child process.
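For illustration, the child-process route might look roughly like this; the command line assumes rnv's usual grammar-then-document argument order, and the helper name and error plumbing are placeholders:

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Runs `rnv <schema.rnc> <model.urdf>` as a child process and records a
// failure when validation does not succeed. rnv prints its specific
// complaints to stderr, so a real implementation would capture that output
// (e.g., via popen) rather than discarding it.
bool ValidateWithRnv(const std::string& schema_path,
                     const std::string& urdf_path,
                     std::vector<std::string>* errors) {
  const std::string command = "rnv '" + schema_path + "' '" + urdf_path + "'";
  const int exit_status = std::system(command.c_str());
  if (exit_status != 0) {
    errors->push_back("rnv reported that " + urdf_path +
                      " does not match the schema " + schema_path);
    return false;
  }
  return true;
}
```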
For runtime validation by our Parser, it sounds like RNV might not be the best fit. I like the RNG compact file for the offline schema, though.
It appears that there are tools to convert RNG to a DTD. As I understand it, in general not all RNG schemas are expressible as DTDs, but I think for the simple things we have in URDF they would be? We could keep the RNG in git, and have Bazel convert it to a DTD. I presume that there are a lot more C++-compatible libraries to check the DTD, and we could lean on one of those instead.
RNC/RNG can also be converted to XSD, for which we could also certainly find validators in existing libraries. (XSD ==> RNG is apparently not something that is done, owing to the complexity of the XSD spec.)
It may turn out that the translation tools (Bazel build time only, as I read the above proposal) are implemented in Java(?!?!). I suspect we don't care if we can get system/Brew packages for them, but it's a detail to watch out for.
To the extent any schema converters are not as portable as we might wish, we could do the standard trick of committing the generated XSD (or etc) to our git tree, and then having a linter test that checks that the generated file is up-to-date. That way, only developers are exposed to the conversion tool, instead of end users.
Probably then some combo of jing (command-line validator), trang (schema format converter), and validation library code from libxml2. Brew offers a jing-trang package. Ubuntu has jing and trang as packages.
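For the libxml2 piece, a minimal sketch of in-process RELAX NG validation might look like the following; note that libxml2 consumes the XML (.rng) syntax, so trang would first convert a compact .rnc schema, and a production version would route libxml2's error callbacks into the parser diagnostics instead of the default stderr output. File names are placeholders and null-pointer checks are omitted for brevity:

```cpp
#include <string>

#include <libxml/parser.h>
#include <libxml/relaxng.h>

// Validates an XML document against a RELAX NG (.rng) schema using libxml2.
// Returns true when the document matches the schema.
bool ValidateRelaxNg(const std::string& schema_path,
                     const std::string& document_path) {
  xmlRelaxNGParserCtxtPtr parser_ctxt =
      xmlRelaxNGNewParserCtxt(schema_path.c_str());
  xmlRelaxNGPtr schema = xmlRelaxNGParse(parser_ctxt);
  xmlRelaxNGValidCtxtPtr valid_ctxt = xmlRelaxNGNewValidCtxt(schema);
  xmlDocPtr doc = xmlReadFile(document_path.c_str(), nullptr, 0);
  // Zero means valid; positive means invalid; negative means internal error.
  const int result = xmlRelaxNGValidateDoc(valid_ctxt, doc);
  xmlFreeDoc(doc);
  xmlRelaxNGFreeValidCtxt(valid_ctxt);
  xmlRelaxNGFree(schema);
  xmlRelaxNGFreeParserCtxt(parser_ctxt);
  return result == 0;
}
```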
Anything we use for linting would be OK to come from a package manager, if we like. Anything in libdrake.so we'll be rebuilding from source (vendored) no matter what.
Reflecting on the vendored-source bit, I wonder which we'd rather consume: libxml++/libxml2 stack (maybe big, but probably somewhat well-behaved), or rnv (tiny, but needing a variety of patches).
This is a reduced-scope version of #12610. For URDF, we do not yet have a mechanism that can reasonably complain about completely unknown wrong input data. The current implementation relies on tinyxml2 to build a "DOM" tree, which the "parser" then effectively performs bespoke queries against. Since tinyxml2 does not even guarantee well-formedness, it is possible to have files full of stray garbage (#16516), near-miss formulations of known tags, etc.
The victory condition of this ticket is to produce a parser that can complain about unknown/malformed elements in a controllable, comprehensible, actionable way.
It may be a lot of work.
Possible paths include: