The problem

Currently, parsing URDF and SDF files takes a cherry-picking approach. We look for specific pieces of supported information (tags, attributes, etc.) Furthermore, the validation of the supported data is sporadic. This leads to the following issues:

The user may have silently ignored tags that they aren't aware are ignored. This could lead to meaningless edits.
The user encounters a runtime error during simulation, but isn't aware that it is directly related to their input files.
It's harder to understand Drake's relationship with URDF and SDF at any given commit because the same file might produce different outcomes and nothing would indicate that.

The proposal

We should validate the entire input file. Every piece of information should be one of:

Known and supported
Known and not supported
Known and deprecated (possibly with a known mapping to "known and supported")
Unknown

There should be run-time configurable parsing modes which dictates the behavior when anything other than "known and supported" is encountered, including, but not limited to:

Throw an exception on first infraction.
Throw an exception after collecting and reporting on all infractions
Merely document infractions
Ignore infractions.

Design questions

Where does the parsing "report" go?
- In the case of "documenting" parsing issues, there's a question of making it convenient. The user should have ready access to it such that it is useful as a debug tool. But when things are working it shouldn't clutter up their "workspace" (e.g., console).
What changes need to happen in sdformat to facilitate this?
- Being able to report line numbers for "errors".
- Eliminating default values because it interferes with more general definitions of defaults.
- Easing the API for accessing tags that are not strictly part of the SDF spec. I.e., sdformat knows the SDF spec and will validate all of that data. But, at our request, it currently passes through any unrecognized sub-trees. Those ad hoc trees are how we inject arbitrary Drake features. But they will also be the means by which other, unsupported and unrecognized tags may come in and we'll have to recognize them and report them. The API currently is targeted to looking for a specific tag in a specific location, not for introspection.

SeanCurtis-TRI commented 4 years ago

cc @jwnimmer-tri

jwnimmer-tri commented 4 years ago

There should be run-time configurable parsing modes ...

IMO all we need is a good default (probably just drake::logging), and an API-based way for a user to implement any of other the other options on their own (e.g., a callback settable on the Parser object such as std::function<void, const drake::multibody::ParseError&> on_error).

I don't think we need to provide sugar for any other error handling mechanisms, error consolidation, exceptions, etc. -- they will always be very application-specific.

We should also tie libsdformat's error reporting into this same channel (since we are wrapping it) per #9087.

EricCousineau-TRI commented 4 years ago

FTR Sean, @azeey, @scpeters, and myself also discussed this via video chat after the Slack convo, but only for SDFormat (and the libsdformat parser).

jwnimmer-tri commented 2 years ago

@rpoyner-tri I'll add this one onto the project board as well.

rpoyner-tri commented 2 years ago

How shall we subdivide this issue? We could certainly split by file type. Beyond that, are there distinct subtasks, other than just grinding through the implementation, piece by piece?

jwnimmer-tri commented 2 years ago

It depends a little bit on the implementation approach.

For SDFormat, it probably makes sense to split on "Known and not supported" vs "Unknown" since with the intermediate DOM those are going to be quite different.

For URDF, I'm not sure there's any difference.

rpoyner-tri commented 2 years ago

Sounds right. I'm happy to defer splitting this one for a bit, since I'll claim neither of us is attacking it directly yet. If you have reason to split sooner, feel free.

jwnimmer-tri commented 2 years ago

Hmm, @rpoyner-tri is this just a duplicate of #16229 (and the 5 sub-issues it was split into)? If yes, please close it. If no, please clarify what we're missing beyond those 5 sub-issues.

rpoyner-tri commented 2 years ago

For URDF, this is the only ticket that claims to care about emitting diagnostics for straight-up typos/bungling, (e.g. <transQQQmission>). Given where we are, it may make sense to table that goal, since it will require a pretty comprehensive restructuring, and tinyxml2 maybe doesn't help as much as we might like.

jwnimmer-tri commented 2 years ago

In that case, could I ask you to write up that detail into a (new, 6th) sub-issue, and then we close this issue as a duplicate of those 6 more detailed ones? As it stands, this issue scope is unclear and not actionable.

We can put that new issue onto our 80% backlog for now, and see if there's room for it.

rpoyner-tri commented 2 years ago

Closing this in favor of #16229 and #16997.

RobotLocomotion / drake

[parsing] Full input validation #12610

The problem

The proposal

Design questions