SainsburyWellcomeCentre / aeon_mecha

Project Aeon's main library for interfacing with acquired data. Contains modules for raw data file io, data querying, data processing, data qc, database ingestion, and building computational data pipelines.
BSD 3-Clause "New" or "Revised" License

Avoid storing pose reader metadata in remote storage #418

Closed glopesdev closed 1 month ago

glopesdev commented 1 month ago

While creating a portable local dataset to demonstrate the low-level API, it became clear that the current way of retrieving pose identity and model metadata is not sustainable: even accessing the pose data depends on a specific location in CEPH.

While it is possible to override this path, the whole construction is problematic for sharing and copying datasets, since the dataset is no longer self-contained and depends on files stored elsewhere. We should aim for datasets to be fully portable, at least in the sense that all data required by readers is accessible at load time.

We probably don't want or need to copy the entire model .pb file into the epoch folder, given its size and redundancy, but copying the model's confmap_config.json into the epoch folder as a singleton file would make our lives much easier.
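As a rough illustration of the copy step, here is a minimal sketch using only the standard library. The function name `copy_model_config` and its arguments are hypothetical and not part of aeon_mecha; it only assumes that the model folder contains a `confmap_config.json`.

```python
import shutil
from pathlib import Path


def copy_model_config(model_dir, epoch_dir, config_name="confmap_config.json"):
    """Copy a model config file into an epoch folder (hypothetical helper).

    The file is copied only once: if it already exists in the epoch
    folder it is left untouched, so it behaves like a singleton file.
    """
    src = Path(model_dir) / config_name
    dst = Path(epoch_dir) / config_name
    if not dst.exists():
        # Create the target folder if needed, then copy with file metadata.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    return dst
```

Only the small JSON config is copied here; the large .pb model file stays where it is.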

One solution would be to store the model metadata as a sub-folder inside the data folder, allowing the following structure:

📦CameraTop
 ┣ 📂topdown-multianimal-id-133
 ┃ ┣ 📜centroid_config.json
 ┃ ┣ 📜confmap_config.json
 ┃ ┗ 📜info.json
 ┣ 📜CameraTop_200_2024-03-02T12-00-00.bin
 ┣ 📜CameraTop_201_2024-03-02T12-00-00.bin
 ┣ 📜CameraTop_2024-03-02T12-00-00.avi
 ┣ 📜CameraTop_2024-03-02T12-00-00.csv
 ┗ 📜CameraTop_topdown-multianimal-id-133_2024-03-02T12-00-00.bin

In this approach, only the model name is saved as a suffix in the tracking file, referring to the local folder. This could even be made backwards-compatible by falling back to scanning local paths in case the CEPH folder is not present.
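To make the fallback idea concrete, here is a minimal sketch of how a reader might resolve the model metadata folder. It is not the current aeon_mecha implementation. It assumes tracking files are named `<device>_<model>_<timestamp>.bin` and that a valid model folder contains `confmap_config.json`; the names `parse_model_name`, `resolve_model_dir` and `remote_root` are hypothetical.

```python
from pathlib import Path


def parse_model_name(tracking_file, device):
    """Extract the model name from '<device>_<model>_<timestamp>.bin' (assumed pattern)."""
    stem = Path(tracking_file).stem
    prefix = f"{device}_"
    if not stem.startswith(prefix):
        raise ValueError(f"{stem!r} does not start with {prefix!r}")
    # Split on the last underscore so model names may contain underscores.
    model, _, _timestamp = stem[len(prefix):].rpartition("_")
    if not model:
        raise ValueError(f"no model name found in {stem!r}")
    return model


def resolve_model_dir(tracking_file, device, remote_root=None):
    """Return the first folder holding the model config, trying remote then local."""
    tracking_file = Path(tracking_file)
    model = parse_model_name(tracking_file, device)
    candidates = []
    if remote_root is not None:
        # Existing behaviour: look in the remote (e.g. CEPH) model location first.
        candidates.append(Path(remote_root) / model)
    # Fallback: a folder named after the model, next to the tracking file.
    candidates.append(tracking_file.parent / model)
    for candidate in candidates:
        if (candidate / "confmap_config.json").is_file():
            return candidate
    raise FileNotFoundError(f"no metadata folder found for model {model!r}")
```

With this ordering, existing datasets that still rely on the remote folder keep working, while a copied dataset resolves the metadata from its own folder.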

Alternatively, if a sub-folder proves problematic, we could also store just the confmap_config.json with a model name prefix, e.g. topdown-multianimal-id-133_confmap_config.json.
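For the flat-file variant, discovering which models have metadata available locally could look roughly like the sketch below. The helper `list_local_models` is hypothetical, and it assumes the `<model>_confmap_config.json` naming proposed above.

```python
from pathlib import Path


def list_local_models(folder, suffix="_confmap_config.json"):
    """Map model name -> config path for every '<model><suffix>' file in folder."""
    models = {}
    for path in sorted(Path(folder).glob(f"*{suffix}")):
        # Strip the fixed suffix to recover the model name prefix.
        models[path.name[: -len(suffix)]] = path
    return models
```

A reader could then look up the model name parsed from the tracking file in this mapping instead of opening a sub-folder.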

One outstanding issue here is how to link back to the model's provenance, but this could be stored in the Metadata.yml or in a local singleton metadata file inside the model metadata folder itself.