The current Python implementation is fine, if messy. But for format registries, it essentially implements the same thing as Siegfried's roy tool. Rather than keeping this separate tool updated, it could be merged into roy, with roy perhaps modified so that it can output the full normalized registry contents as YAML/JSON. This might be quite a lot of work, though, and would need to be written in Go rather than Python, so it is probably a long-term goal.
Some of the steps appear to be:
[ ] add an option to roy inspect so it emits the whole normalised dataset as YAML or similar.
[ ] add support for all known format registries to roy (FFW, GitHub Linguist, TRiD, ???).
[ ] modify the wikidata.sig build so the Archivematica extensions can be omitted (like -pronom).
[ ] modify the digipres.org and sentinel systems to run roy to gather the data and aggregate that instead.
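As a rough illustration of the final aggregation step, here is a minimal Python sketch. It assumes roy could be made to emit one JSON record per format; the record shape (`registry`, `id`, `name`, `extensions`), the sample data, and the `aggregate_by_extension` helper are all hypothetical and do not reflect roy's actual output.

```python
import json
from collections import defaultdict

# Hypothetical normalised records; placeholder values, not real registry data
# and not roy's actual output format.
SAMPLE = """[
  {"registry": "registry-a", "id": "a/1", "name": "Example Format One", "extensions": ["pdf"]},
  {"registry": "registry-b", "id": "b/7", "name": "Example Format One", "extensions": [".PDF"]},
  {"registry": "registry-a", "id": "a/2", "name": "Example Format Two", "extensions": ["png"]}
]"""


def aggregate_by_extension(records):
    """Group (registry, id) pairs by normalised file extension."""
    index = defaultdict(set)
    for rec in records:
        for ext in rec.get("extensions", []):
            # Normalise case and strip a leading dot so ".PDF" and "pdf" match.
            key = ext.lower().lstrip(".")
            index[key].add((rec["registry"], rec["id"]))
    # Sort for stable, diff-friendly output.
    return {ext: sorted(ids) for ext, ids in sorted(index.items())}


if __name__ == "__main__":
    combined = aggregate_by_extension(json.loads(SAMPLE))
    print(json.dumps(combined, indent=2))
```

Keeping the aggregation keyed on a normalised extension would let digipres.org compare registries without depending on how each one spells its extensions.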