Description of problem
A handful of records have an offset value of the form `X±Y`, which seems to denote a maximum offset. I believe this corresponds to a variable position offset in PRONOM, so the value can be anywhere in the first or last range of 72 bytes.

While this seems like a reasonable shortcut, the value isn't being decoded, so we only get one value in the SPARQL result. That is, if we expect `72±72`, we receive `72` in the SPARQL result or the WQS UI.

Another consideration: if this is data to encode, we have to parse it further to decide what type of field we're looking at. If it contains ±, it is a maximum offset and not a regular offset. Because we still receive "some value" (i.e. `72`), we also can't tell that there is a problem with the data. I don't think we can know there is a problem with the data without also knowing the Wikidata record we are looking at.

So I am wondering, 1. whether this is first an issue that should be logged with Wikibase, which seems sensible, and 2. whether this is another piece of work for me to look at, to bring the concept of maximum offsets into Wikidata in association with the ShEx work needed.

NB. ± is a plus/minus symbol. Compared with PRONOM, `72±72` is actually trying to denote a maximum offset of 144. I can work that out from the string, but I wouldn't otherwise have known how to use it.
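To illustrate the extra parsing a consumer would need if the full `X±Y` string were returned, here is a minimal sketch in Python. It assumes, based only on the `72±72` to 144 reading above, that the maximum offset is X + Y. The names `Offset` and `parse_offset` are hypothetical and are not part of Wikibase, PRONOM, or any existing tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape of the value: an integer, optionally followed by "±" and a
# second integer. This pattern is an illustration, not a Wikibase format spec.
_OFFSET_RE = re.compile(r"^\s*([0-9]+)\s*(?:±\s*([0-9]+))?\s*$")


class Offset(NamedTuple):
    offset: int                # the X part, which is all SPARQL currently returns
    max_offset: Optional[int]  # X + Y when "±Y" is present, otherwise None


def parse_offset(raw: str) -> Offset:
    """Split an offset string into a regular offset and, if present, a maximum offset."""
    match = _OFFSET_RE.match(raw)
    if match is None:
        raise ValueError(f"unrecognised offset value: {raw!r}")
    base = int(match.group(1))
    tolerance = match.group(2)
    if tolerance is None:
        # No "±": treat the value as a regular offset.
        return Offset(base, None)
    # Assumption: "X±Y" denotes a maximum offset of X + Y (72±72 -> 144).
    return Offset(base, base + int(tolerance))
```

With this sketch, `parse_offset("72±72")` yields an offset of 72 with a maximum offset of 144, while `parse_offset("72")` yields a regular offset with no maximum. Given only the `72` that SPARQL returns today, the two cases cannot be told apart, which is the problem described above.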
Other examples
These all seem to be PDF variants
Notes on auditing
I accessed the PRONOM reports and output a rudimentary subset of the XML:
And then cross-referenced as many of those as possible with: