StormSurgeLive / asgs

The Automated Solution Generation System (ASGS) provides software infrastructure for automating coastal ocean modelling for real time decision support, and provides a variety of standalone command line tools for pre- and post-processing. Visit us at https://discord.gg/jFbacxrUf9
https://tools.adcirc.live
GNU General Public License v3.0

create fingerprint for each mesh as a consistent, unique ID #765

Open jasonfleming opened 2 years ago

jasonfleming commented 2 years ago

If a mesh is modified in a subtle way (e.g., a bathymetry change on a small number of nodes), the change may be difficult to detect. Each mesh version should have a unique name and version number as well as a 256-bit hash. The hash should be stored in the run.properties file as well as in the netCDF global attributes to indicate the specific mesh data in the file.
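For illustration, a minimal Python sketch of the plain file-hash idea, using SHA-256 as one possible 256-bit hash. The `key : value` layout in `property_line` and any property key passed to it are assumptions for the example, not the confirmed run.properties convention; writing the netCDF global attribute is left out because it depends on the netCDF library in use.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the raw bytes of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large mesh files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def property_line(key, value):
    """Format one property entry.

    Assumption: a "key : value" layout; the real file convention may differ.
    """
    return f"{key} : {value}"
```

Because this hashes the raw bytes, any change to the file, including a stray trailing newline, produces a different digest.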

Required behavior (added by @wwlwpd):

wwlwpd commented 2 years ago

This is a good idea. I am wondering if there is a way to normalize the actual content in a repeatable way, so that the normalized content is what actually gets hashed rather than the "file". Benign modifications to the mesh (e.g., errant newlines at the beginning or end of the file) should result in the same "hash", because the hash is based on the data and not on any accidental property of the file.

jasonfleming commented 2 years ago

Yes, hashing the file is the quick and dirty approach. To improve on it, I wonder if we could use Nate's mesh reader from his ourPerl repo to read in the mesh, strip any leading/trailing whitespace or comments from each field, join it all into a single scalar string, and then hash that. It seems the hash would then change only if the actual content changed.
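The strip, join, and hash steps described above could be sketched roughly as follows. This is Python rather than Perl and does not use the reader mentioned in the thread. The function names are hypothetical, and treating `!` as the comment character is an assumption; a real mesh title line could legitimately contain that character and would need special handling.

```python
import hashlib


def normalize_mesh_text(text, comment_char="!"):
    """Reduce mesh text to its whitespace-separated fields, one record per line."""
    records = []
    for line in text.splitlines():
        # Drop anything after the (assumed) comment character.
        line = line.split(comment_char, 1)[0]
        # split() removes leading/trailing whitespace and collapses inner runs.
        fields = line.split()
        if fields:  # skip blank or comment-only lines
            records.append(" ".join(fields))
    return "\n".join(records)


def mesh_fingerprint(text):
    """SHA-256 hex digest of the normalized mesh text."""
    normalized = normalize_mesh_text(text)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

With this sketch, extra blank lines, trailing spaces, and trailing comments do not change the fingerprint, but a purely cosmetic change in number formatting (for example `5.0` versus `5.00`) still does, since the fields are compared as text.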

wwlwpd commented 2 years ago

Yeah, you'd need a reader and consistent ordering of the internal hash itself. The file hash approach might work most of the time, and I don't think we're on any exotic platforms with a different endianness, but to play it safe we should just use a well-defined "read" routine that ensures consistent ordering of the data. I'll take a look at Nate's reader. Some Perl that reads in a mesh is not complicated, and I think I have something somewhere myself, though it's not "consistent" either, or suitable for this consistent hashing purpose without modification. Once we have a way to "hash" (or fingerprint) the meshes, we can add the info to mesh_defaults, or have an actual YAML-based DB with the info; I've been wanting to do something like that for a while. The hash just needs to be defined right up front so that it is not something that can change when the wind blows.
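One way to picture the "consistent ordering" and endianness concerns is the sketch below. It assumes the mesh has already been parsed into two dictionaries, nodes as `id -> (x, y, depth)` and triangular elements as `id -> (n1, n2, n3)`; that in-memory layout and the function name are hypothetical, and boundary information is ignored. Records are visited in sorted ID order and packed with an explicit little-endian layout, so the digest does not depend on dictionary insertion order or on the platform's native byte order.

```python
import hashlib
import struct


def canonical_mesh_digest(nodes, elements):
    """SHA-256 over a fixed-order, fixed-byte-order encoding of the mesh data."""
    digest = hashlib.sha256()
    # Record counts first, as little-endian 64-bit integers.
    digest.update(struct.pack("<qq", len(nodes), len(elements)))
    for node_id in sorted(nodes):
        x, y, depth = nodes[node_id]
        # One 64-bit integer ID followed by three IEEE 754 doubles.
        digest.update(struct.pack("<q3d", node_id, x, y, depth))
    for elem_id in sorted(elements):
        n1, n2, n3 = elements[elem_id]
        # Element ID plus its three node IDs, all 64-bit integers.
        digest.update(struct.pack("<4q", elem_id, n1, n2, n3))
    return digest.hexdigest()
```

Since the values are hashed as parsed numbers rather than as text, `5.0` and `5.00` in the source file would yield the same digest, while any change to a coordinate, depth, or connectivity entry would not.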