fermitools / declad

BSD 3-Clause "New" or "Revised" License

Possibly support metadata dumpers... #17

Closed: marcmengel closed this issue 3 months ago

marcmengel commented 4 months ago

Currently, Declad expects files to be dropped off in pairs: a data file and a corresponding metadata file, with the correspondence defined by the "meta_suffix" setting in the config file.
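A minimal sketch of that pairing, assuming the metadata file is named after the data file with meta_suffix appended (for example a.root and a.root.json). Declad's real matching rules may differ, and pair_files is a hypothetical name used only for illustration.

```python
def pair_files(filenames, meta_suffix=".json"):
    """Split a directory listing into (data -> metadata) pairs plus the data
    files that are still waiting for their metadata.

    Assumption (hypothetical): a metadata file is named <data name> + meta_suffix.
    """
    names = set(filenames)
    pairs = {}
    missing_meta = set()
    for name in names:
        if name.endswith(meta_suffix):
            continue  # metadata files are matched via their data file
        meta = name + meta_suffix
        if meta in names:
            pairs[name] = meta
        else:
            missing_meta.add(name)
    return pairs, missing_meta


pairs, missing = pair_files(["a.root", "a.root.json", "b.root"])
print(pairs)    # {'a.root': 'a.root.json'}
print(missing)  # {'b.root'}
```

Files that land in missing_meta are exactly the ones a metadata extractor would need to handle.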

Many experiments use the Art framework, which embeds most of a file's metadata within the file itself. In those cases, one can run a metadata-extractor program to pull the metadata out of the file and print it in the JSON format that Declad accepts.
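One way the per-file step could look is sketched below: run an extractor, check that its stdout parses as JSON, and write the result where Declad would look for it. The extractor command, the naming rule (data file name plus meta_suffix) and the function name extract_metadata are assumptions for illustration. The output is written to a hidden temporary file and then renamed into place, on the further assumption that the scanner skips dot-files, so a half-written metadata file is never picked up.

```python
import json
import os
import subprocess
import sys
import tempfile


def extract_metadata(data_path, extractor_cmd, meta_suffix=".json"):
    """Run an extractor on data_path and write its JSON output beside it.

    Assumptions (hypothetical, not Declad's API): extractor_cmd is a list such
    as ["my_extractor"], the data file path is appended as its last argument,
    and the extractor prints one JSON object on stdout.
    """
    result = subprocess.run(
        extractor_cmd + [data_path], capture_output=True, text=True, check=True
    )
    metadata = json.loads(result.stdout)  # raises ValueError on invalid JSON
    meta_path = data_path + meta_suffix
    # Write to a hidden temp file in the same directory, then rename it into
    # place, so the scanner (assumed to skip dot-files) never sees a partial file.
    fd, tmp_path = tempfile.mkstemp(
        prefix=".", suffix=".tmp", dir=os.path.dirname(meta_path) or "."
    )
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(metadata, f, indent=2)
        os.replace(tmp_path, meta_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return meta_path


if __name__ == "__main__":
    # Stand-in extractor for demonstration: a Python one-liner printing JSON.
    fake_extractor = [
        sys.executable, "-c",
        "import json, sys; print(json.dumps({'file_name': sys.argv[1]}))",
    ]
    with tempfile.TemporaryDirectory() as d:
        data = os.path.join(d, "f.root")
        open(data, "w").close()
        print(extract_metadata(data, fake_extractor))
```

Using check=True means a failing extractor raises CalledProcessError and no metadata file is written, so the data file simply stays unpaired.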

Previous tools such as Fermi-FTS could be configured to run a metadata extractor; Declad currently cannot. A simple alternative: if the "scanner" objects see data files without corresponding metadata files for more than one scan period, they could run a command specified in the configuration, passing it the list of those files. That command would perform the metadata extraction and write the metadata files Declad needs to process them. If the command ran as a separate process, with at most N such processes outstanding, this would barely change the current Declad code while providing features similar to the old Fermi-FTS.
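The "at most N outstanding processes" idea can be sketched as a small non-blocking launcher that the scanner would call once per pass. ExtractorLauncher, its command and its max_outstanding limit are hypothetical names, not existing Declad code or config options; the sketch also remembers which files are already in flight so the same batch is not handed off twice.

```python
import subprocess
import sys


class ExtractorLauncher:
    """Run a configured command on batches of files, with at most
    max_outstanding such commands running at once.

    Hypothetical sketch: the command (a site script that extracts metadata and
    writes the metadata files) and max_outstanding are assumptions, not
    existing Declad config options.
    """

    def __init__(self, command, max_outstanding=2):
        self.command = list(command)
        self.max_outstanding = max_outstanding
        self.running = []  # (Popen, frozenset of files) for unfinished runs

    def reap(self):
        """Drop runs that have exited; return how many are still running."""
        self.running = [(p, b) for p, b in self.running if p.poll() is None]
        return len(self.running)

    def in_flight(self):
        """Files already handed to a command that has not finished yet."""
        files = set()
        for _, batch in self.running:
            files |= batch
        return files

    def launch(self, files):
        """Start the command on files not already in flight, if under the limit.

        Returns True if a process was started. A False return is harmless:
        files still lacking metadata will be offered again on a later pass.
        """
        if self.reap() >= self.max_outstanding:
            return False
        batch = frozenset(files) - self.in_flight()
        if not batch:
            return False
        proc = subprocess.Popen(self.command + sorted(batch))
        self.running.append((proc, batch))
        return True


if __name__ == "__main__":
    # Stand-in command that just sleeps; a real one would write metadata files.
    launcher = ExtractorLauncher([sys.executable, "-c", "import time; time.sleep(1)"], 1)
    print(launcher.launch({"a.root", "b.root"}))  # starts a process
    print(launcher.launch({"c.root"}))            # refused while the first still runs
```

Because launch never waits on a child, the scanner loop keeps its normal timing; the poll() in reap() collects finished processes without blocking.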

The list of files for this pass is available at https://github.com/fermitools/declad/blob/main/declad/local_scanner.py#L116 as set(data_desc.keys()) - metadata_files, i.e. files we have seen but whose metadata we have not. If we kept that set from the previous pass, its intersection with the current one would give the files that have been missing metadata for more than one pass; those are the files to hand off to the metadata extractor.
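The two-pass rule can be sketched with plain set operations. MissingMetadataTracker is a hypothetical name, and the sketch assumes, as the expression above implies, that metadata_files holds the names of data files whose metadata has already been seen.

```python
class MissingMetadataTracker:
    """Remember which data files lacked metadata on the previous scan pass.

    Hypothetical sketch: a file is reported only if it was missing metadata
    on two consecutive passes.
    """

    def __init__(self):
        self.previous = set()

    def update(self, data_files, metadata_files):
        # Files seen this pass whose metadata has not appeared yet, mirroring
        # set(data_desc.keys()) - metadata_files in local_scanner.py.
        missing_now = set(data_files) - set(metadata_files)
        # Missing on both this pass and the last one: stop waiting for them.
        stale = missing_now & self.previous
        self.previous = missing_now
        return stale


tracker = MissingMetadataTracker()
print(tracker.update({"a", "b"}, {"a"}))       # set()
print(tracker.update({"a", "b", "c"}, {"a"}))  # {'b'}
```

On the second pass "c" is new, so it gets one more scan period for its metadata to arrive before it would be handed to the extractor.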

This has been requested by Andrew Norman.

marcmengel commented 3 months ago

This is now working in #23.