ross-spencer opened this issue 3 years ago
I wrote a schema for file formats that have a file format identification pattern. It reflects the qualifiers that the Wikidata property description says to expect.
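As a rough illustration of what such a schema checks, the Python sketch below mimics a shape: a statement's value must be a hex byte string, and each expected qualifier must occur within a given cardinality. The qualifier names (`offset`, `relative to`), the hex-only value rule, and the `Statement`/`conforms` helpers are all hypothetical. They are not the actual Wikidata property constraints or a ShEx implementation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional

# Accepts only plain hex byte strings; real patterns may use wildcards or
# other syntax, which this sketch ignores (an assumption).
HEX_BYTES = re.compile(r"(?:[0-9A-Fa-f]{2})+")

@dataclass
class Statement:
    value: str
    qualifiers: dict[str, list[str]] = field(default_factory=dict)

# Hypothetical shape: qualifier name -> (min, max) occurrences; max None = unbounded.
PATTERN_SHAPE: dict[str, tuple[int, Optional[int]]] = {
    "offset": (1, 1),
    "relative to": (0, 1),
}

def conforms(stmt: Statement, shape: dict[str, tuple[int, Optional[int]]]) -> list[str]:
    """Return a list of problems; an empty list means the statement conforms."""
    problems = []
    if not HEX_BYTES.fullmatch(stmt.value):
        problems.append(f"value {stmt.value!r} is not a hex byte string")
    for name, (lo, hi) in shape.items():
        count = len(stmt.qualifiers.get(name, []))
        if count < lo or (hi is not None and count > hi):
            upper = "*" if hi is None else hi
            problems.append(f"qualifier {name!r} occurs {count} times, expected {lo}..{upper}")
    # Treated like a ShEx CLOSED shape: qualifiers not named in the shape are reported.
    for name in stmt.qualifiers:
        if name not in shape:
            problems.append(f"unexpected qualifier {name!r}")
    return problems

print(conforms(Statement("89504E470D0A1A0A", {"offset": ["0"]}), PATTERN_SHAPE))  # []
print(conforms(Statement("89504E47"), PATTERN_SHAPE))
```

A check like this only validates one statement at a time; it says nothing about which statements belong together.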
This is great Kat, thank you! I have a related question to follow up with at the weekend (no rush though). Just got to make it to the end of the working week to put it together, nearly there! :slightly_smiling_face:
Hi @emulatingkat, I was wondering about this. Below is a mock-up of a Wikidata screen showing two format identification pattern statements. Is it possible to write a Wikidata entry like this? And could ShEx be used to model something like this for Wikidata records?
It is based on https://www.wikidata.org/w/index.php?title=Q27229608&oldid=784082439
It is one of the clearest examples of a single statement trying to bring together too much information without providing any way to interpret it. Unlike the live record, I'd love to be able to use two separate statements, each describing a different identification pattern for Siegfried (or any format identification tool) to work with.
It would help us delineate sets of sequences more clearly where a format has multiple sets of signatures, e.g. PNG 1.1 here, or where sequences are drawn from different sources. For example, Kessler's table and PRONOM might have two different values that don't belong in the same statement and should be interpreted as separate signatures, each containing 1..* sequences.
Having sets like this would be pretty powerful, though there may be a better solution. Any pointers appreciated. When I started looking at container signatures in Wikidata, I wondered whether statements within statements would be possible.
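A minimal sketch of the data model described above, assuming sequences anchored at the beginning of the file only: a format has several signatures, each signature has a source and 1..* sequences, and a signature matches only if all of its sequences match. The `Sequence`/`Signature` classes and the source labels are hypothetical; the byte values are the standard 8-byte PNG file signature and the `IHDR` chunk type, which sits at offset 12 in a PNG file.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass(frozen=True)
class Sequence:
    offset: int      # bytes from the beginning of the file
    hex_bytes: str   # e.g. "89504E47"

    def matches(self, data: bytes) -> bool:
        expected = bytes.fromhex(self.hex_bytes)
        return data[self.offset:self.offset + len(expected)] == expected

@dataclass(frozen=True)
class Signature:
    source: str                       # provenance label (hypothetical values below)
    sequences: tuple[Sequence, ...]   # 1..* sequences, all of which must match

    def matches(self, data: bytes) -> bool:
        return len(self.sequences) > 0 and all(s.matches(data) for s in self.sequences)

def matching_signatures(signatures: list[Signature], data: bytes) -> list[str]:
    """Return the source of every signature whose sequences all match."""
    return [sig.source for sig in signatures if sig.matches(data)]

signatures = [
    Signature("source A (hypothetical)",
              (Sequence(0, "89504E470D0A1A0A"), Sequence(12, "49484452"))),
    Signature("source B (hypothetical)", (Sequence(0, "89504E47"),)),
]

full_png_header = bytes.fromhex("89504E470D0A1A0A0000000D") + b"IHDR"
truncated = bytes.fromhex("89504E47")
print(matching_signatures(signatures, full_png_header))  # both sources
print(matching_signatures(signatures, truncated))        # ['source B (hypothetical)']
```

Keeping the signatures separate is the point: the truncated input is still recognised by `source B`, whereas merging every sequence into one statement that must match as a whole would reject it.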
Thanks for this example. Your mockup makes the complexity clear.
All statements using the same property are displayed in the same statement block in Wikidata. So, as far as I know, the mockup you shared isn't possible in Wikidata.
We could propose a new property for something like a "compound file format identification pattern" or "multi-part format identification pattern" to express what you describe.
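If some grouping mechanism existed, whether a new compound property or a grouping qualifier, a consumer such as Siegfried could rebuild separate signatures from flat statements. The sketch below assumes a hypothetical `group` value on each statement; it is not an existing Wikidata qualifier or property. Statements without a group are treated as single-sequence signatures, which is only one guess a consumer might make.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Optional

def group_into_signatures(
    statements: Iterable[tuple[str, int, Optional[str]]],
) -> list[list[tuple[str, int]]]:
    """Rebuild signatures from flat (pattern, offset, group) statements.

    `group` is hypothetical: statements sharing a group form one signature.
    Statements with group=None each become a single-sequence signature.
    """
    grouped: dict[str, list[tuple[str, int]]] = defaultdict(list)
    ungrouped: list[list[tuple[str, int]]] = []
    for pattern, offset, group in statements:
        if group is None:
            ungrouped.append([(pattern, offset)])
        else:
            grouped[group].append((pattern, offset))
    # Grouped signatures first (in first-seen order), then the ungrouped ones.
    return list(grouped.values()) + ungrouped

statements = [
    ("89504E470D0A1A0A", 0, "a"),
    ("49484452", 12, "a"),
    ("89504E47", 0, "b"),
]
print(group_into_signatures(statements))
# [[('89504E470D0A1A0A', 0), ('49484452', 12)], [('89504E47', 0)]]
```

Without the `group` value, all three statements would arrive as one undifferentiated list, which is the ambiguity the current data model leaves us with.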
Thanks for highlighting this gap in the data model.
Description of problem
We need to encode a standard signature as a ShEx (Shape Expressions) schema, and then start a discussion with the Wikidata community about what that means for existing records.
Related issues
A ShEx schema would partly help make https://github.com/ross-spencer/WikiDP-Issues/issues/7 more accurate, but it wouldn't improve the overall quality of the signature: we would still have trouble interpreting it as two separate signatures for the same format.
Others