RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.16k stars 1.24k forks source link

Meldis auto-reloading misses changes to glTF files #21194

Open SeanCurtis-TRI opened 3 months ago

SeanCurtis-TRI commented 3 months ago

What happened?

Meldis's _GeometryFileHasher provides meldis the ability to recognize if a model has changed on disk since it was last loaded and intelligently reload (by considering the set of files that define the model). However, that logic is heavily obj-centric. Changes to glTF data on disk do not trigger a model reload.

Now that Meshcat (the Drake class, not the javascript library) has the intelligence to run down all linked assets, that should be exploited in _meldis.py so that the right set of assets are included in the hashing.

Version

No response

What operating system are you using?

No response

What installation option are you using?

No response

Relevant log output

No response

jwnimmer-tri commented 3 months ago

Per slack, we might attack this in two stages:

(1) Fix the _GeometryFileHasher to know how to chase gltf uris.

(2) Refactor Meldis to lean on Meshcat::SetObject to report back exactly which files were relevant, so we can delete the bespoke Python logic.

SeanCurtis-TRI commented 2 months ago

Stage one is in flight with #21297.

I think stage two requires a bit more discussion.

It's certainly reasonable to modify the return value of void Meshcat::SetObject(string_view, const Shape&, const Rgba&) to return a vector of paths. But what's unclear to me is how that benefits meldis.

Meldis would only find out what files were necessary after dispatching objects to meldis. Sure, it can save those paths and create a hash on what was already set into Meshcat. However, what good does that do when it gets the next load message? Knowing what's currently in meshcat doesn't help in interpreting the incoming message. If the only way to do that is set things into meshcat and ask what got set, hasn't it already done the work?

I can imagine the hasher creating its own Meshcat instance with a functionally non-communicative port so it can blindly register things and harvest the result. After all, it would really only ever need to register meshes, not everything in the message.

Alternatively, we can try to refactor the code in Meshcat so that the logic for assembling the family of files associated with a mesh is independent of what is done with that information? And then that core functionality can be used by a meshcat instance to "peek" but not "act".

Can you elaborate on what you had in mind?

jwnimmer-tri commented 2 months ago

If the new load message is the same as the old load message, and all of the files used in the prior load message still exist and are unchanged, then we know for certain that no reload is necessary. So all we need to do is keep track of the prior load message's overall list of files and their hashes (or maybe one overall hash), and that's all we need.

SeanCurtis-TRI commented 2 months ago

File existence should not be the arbiter of whether the load message matches. I can remove a reference to an image without deleting the image. Furthermore, I can add an image to a model and it won't be captured by what I previously loaded.

I'm leaning toward simply creating an API that takes a Mesh and returns a list of file info (at its simplest, just paths). That would also simplify what Meshcat does to handle objs. Ultimately, it would localize functionality that is currently distributed across Meshcat and FileStorage and make it accessible for anyone who is curious for whatever reason.

jwnimmer-tri commented 2 months ago

If you remove the reference to an image (or add one), the checksum of the gltf (or whatever) that mentions that image will change.

jwnimmer-tri commented 2 months ago

I am pretty convinced that the right Meldis approach is "Meshcat tells us what files it touched" since that will very clearly be correct and never diverge as the code chnages, instead of having two different control paths where we hope that Mesh.tell_me_files() matches what Meshcat did. For all we know, Meshcat might eventually do things that go beyond what Mesh.tell_me_files() knows about, or ignore e.g. images that are for gltf extensionsUsed that are always ignored on the JS side that Mesh.tell_me_files() would need to return in general.

SeanCurtis-TRI commented 2 months ago

Touche. I keep forgetting hashing the original file.

So, to summarize:

On the subject of "subtracting bytes from a hash value":

One possible solution is to replace a single aggregate hash value with a set of hash_values (on per file). Instead of comparing aggregate hash values, we compare sets of values. This would handle both cases where the set of files have changed (new files now being used and old files no longer used but not deleted).

jwnimmer-tri commented 2 months ago

The flavor I imagined was that Meshcat.SetObject would return[1] the list of all relevant files, including the Mesh/Convex filename. I imagine that simplifies everything -- Meldis would not need to do anything with the load message's filename other than pass it to SetObject and then remember the list of files that comes back. We could promise the Mesh.filename() was first in the returned list, in case that helps anything. I'm not wedded to that though -- if for some reason treating the Mesh.filename() differently made the code cleaner then I'd be fine with that.

I'm open to the idea of multiple hash values, but my guess is that a single overall hash still ends up simpler. The "did any of these set of hashes change" versus "did the combined hash over this set of hashes change" seem equivalent. Again, though, I'm willing to trust however the simplest code comes out, whatever that looks like.

[1] ... or output argument, or return struct with named fields for future expansion, or ... etc

SeanCurtis-TRI commented 2 months ago

I've simplified from what I previously wrote and will be submitting a PR soon. Ultimately, I'm trying to minimize the amount of hashing I'm doing. The goal is two fold:

One resolution, obviously, is to remove the second requirement. But that just seems inelegant to me. But we can move this conversation over to a concrete PR in a bit.