cta-observatory / ctapipe

Low-level data processing pipeline software for CTAO or similar arrays of Imaging Atmospheric Cherenkov Telescopes
https://ctapipe.readthedocs.org
BSD 3-Clause "New" or "Revised" License

`Provenance().add_input_file()` should take product id #2571

Closed maxnoe closed 1 month ago

maxnoe commented 3 months ago

Please describe the use case that requires this feature.

At the moment, the provenance system only stores the path of an input file.

However, for CTAO data products used as input, the most important thing to store would actually be the product id of the reference metadata, not just the path.

Describe the solution you'd like

Optionally also store the product UUID in provenance.

kosack commented 3 months ago

Would be useful to also add a nice helper function to extract the product_id from existing files. In fact that could just be automatic: add_input_file(path) could call get_product_id(path), and if it fails, emit a warning and continue as we do now without recording it. That way it's more foolproof that the real product_id gets propagated.

The case where some external files, like those from current SimTelArray, do not have a product_id should be handled gracefully. Of course, in future productions we could simply add one to their metadata.
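A minimal sketch of that fallback behaviour, to make the idea concrete. Everything here is an assumption for illustration: `get_product_id`, `MissingProductIdError`, the in-memory lookup table, and the free-standing `add_input_file` function are hypothetical stand-ins, not ctapipe's actual `Provenance` API. A real helper would read the id from the reference metadata stored in the file itself.

```python
import warnings

# Hypothetical stand-in for reference metadata stored inside real files.
# The id below is a made-up placeholder, not a real product id.
_KNOWN_PRODUCT_IDS = {
    "events.dl1.h5": "3f0c5a4e-0000-4000-8000-000000000001",
}


class MissingProductIdError(KeyError):
    """Raised when a file carries no product id (hypothetical)."""


def get_product_id(path):
    """Return the product id for ``path`` (hypothetical helper)."""
    try:
        return _KNOWN_PRODUCT_IDS[str(path)]
    except KeyError:
        raise MissingProductIdError(str(path)) from None


def add_input_file(inputs, path, role=None):
    """Record an input file, storing its product id when one is available."""
    try:
        product_id = get_product_id(path)
    except MissingProductIdError:
        # Files without reference metadata (e.g. external simulation files)
        # are still recorded by path; we only warn that the id is missing.
        warnings.warn(f"No product id found in input file: {path}")
        product_id = None
    inputs.append({"url": str(path), "role": role, "product_id": product_id})


inputs = []
add_input_file(inputs, "events.dl1.h5", role="dl1 input")

# Capture the warning so the missing-id case can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    add_input_file(inputs, "gamma.simtel.gz", role="simtel input")
```

With this shape, callers keep passing only the path, and the product id is propagated whenever the file provides one.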