RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.23k stars 1.25k forks source link

URDF parser: emit diagnostics for typos, malformed documents, etc. #16997

Open rpoyner-tri opened 2 years ago

rpoyner-tri commented 2 years ago

This is a reduced-scope version of #12610. For URDF, we do not yet have a mechanism that can reasonably complain about completely unknown wrong input data. The current implementation relies on tinyxml2 to build a "DOM" tree, which the "parser" then effectively performs bespoke queries against. Since tinyxml2 does not even guarantee well-formedness, it is possible to have files full of stray garbage (#16516), near-miss formulations of known tags, etc.

The victory condition of this ticket is to produce a parser that can complain about unknown/malformed elements in a controllable, comprehensible, actionable way.

It may be a lot of work.

Possible paths include:

rpoyner-tri commented 2 years ago

See #16229 for an overview of parser improvements.

jwnimmer-tri commented 2 years ago

See also https://github.com/RobotLocomotion/drake/issues/16785#issuecomment-1098445866 for an implementation idea:

For URDF, separate from any other parsing, we could walk the entire tinyxml2 tree from the root on down and check every element name and attribute name against a position-agnostic allow-list (i.e., irrespective of its parent within the tree), producing an error for any completely-unknown name. That would catch typos where the name was wrong no matter where it might live in the tree, but it would not detect elements or attributes that were located in the wrong position within the tree. So, it is not very precise, but on the other hand it's only ~100 lines of code to implement.

Or on another tack, the libsdformat syntax checker and low-level DOM is actually quite good (https://github.com/ignitionrobotics/sdformat/tree/sdf12/sdf/1.9). With maybe a few days of work, we could encode (Drake's subset of) the URDF specification into that format, and then use libsdformat to do the syntax checking. Possibly even OpenRobotics would be interested to collaborate on that.

jwnimmer-tri commented 1 year ago

See https://github.com/RobotLocomotion/drake/issues/18796#issue-1586876256 for another example of an uncaught typo.

jwnimmer-tri commented 1 year ago

Or on another tack, the libsdformat syntax checker and low-level DOM is actually quite good. With maybe a few days of work, we could encode (Drake's subset of) the URDF specification into that format, and then use libsdformat to do the syntax checking.

I did a prototype of this last night (branch).

The sdf::Element API is fine, but the readXml function (and the helpers it calls, like readAttributes) are completely unsuitable for re-use. Instead, I needed to re-implement the "map tinyxml into Element tree" function by hand, which really reduces the value we're getting from using the library.

Another idea we've broached was using a Relax NG schema to check the XML, but then parse the same as we do now. I'd be curious to know if there are any currently-maintained RNG implementations in C++ that we could try.

Another idea would be to write a Serialize-NVP-based mapper from tinyxml nodes to C++ structs. That would have approximately the same benefits at the sdf::Element API (schema checking and string=>double parsing), but without tying ourselves to a library that comes with too much baggage.

rpoyner-tri commented 1 year ago

Ah, schema checkers. I had thoughts about this a year and a half ago, but they've mostly drained away. I will see what I can recover.

Thanks for doing the architecture spike on the sdformat library hypothesis.

rpoyner-tri commented 11 months ago

20193 has the beginnings of a Relax NG compact syntax (RNC) schema. It doesn't yet fully check the drake: custom tags, but it could pretty easily do so.

Things I'm happy about so far:

Things that need more study:

jwnimmer-tri commented 11 months ago

Sounds promising!

rpoyner-tri commented 11 months ago

A quick look through the RNV project source code has me doubting whether it can be easily consumed as a library.

A possible alternative would be to invoke the rnv binary executable as a child process.

jwnimmer-tri commented 11 months ago

For runtime validation by our Parser it sounds like RNV might not be the best fit. I like the RNG compact file for the offline schema, though.

It appears that there are tools to convert RNG to a DTD. As I understand it, in general not all RNG schema are expressible as DTD but I think for the simple things we have in URDF it would be? We could keep the RNG in git, and have Bazel convert it to a DTD. I presume that there are a lot more C++-compatible libraries to check the DTD, and we could lean on one of those instead.

rpoyner-tri commented 11 months ago

RNC/RNG can also be converted to XSD, for which we could also certainly find validators in existing libraries. (XSD ==> RNG is apparently not something that is done, owing to the complexity of the XSD spec.)

rpoyner-tri commented 11 months ago

It may turn out that the translation tools (bazel build time only, as I read the above proposal) might be implemented in Java(?!?!). I suspect we don't care if we can get system/Brew packages for them, but it's a detail to watch out for.

jwnimmer-tri commented 11 months ago

To the extent any schema converters are not as portable as we might wish, we could do the standard trick of committing the generated XSD (or etc) to our git tree, and then having a linter test that checks that the generated file is up-to-date. That way, only developers are exposed to the conversion tool, instead of end users.

rpoyner-tri commented 11 months ago

Probably then some combo of jing (command-line validator), trang (schema format converter), and validation library code from libxml2.

Brew offers a jing-trang package. Ubuntu has jing and trang as packages.

jwnimmer-tri commented 11 months ago

Anything we use for linting would be OK to come from a package manager, if we like. Anything in libdrake.so we'll be rebuilding from source (vendored) no matter what.

rpoyner-tri commented 11 months ago

Reflecting on the vendored-source bit, I wonder which we'd rather consume: libxml++/libxml2 stack (maybe big, but probably somewhat well-behaved), or rnv (tiny, but needing a variety of patches).