Open stephenhart8 opened 4 years ago
I am not sure what you mean by unknown and absent data. Do we mean differentiating between information that we know exists but is absent from the dataset (i.e. the cataloguer looked for the information but could not find it, but knows it is available somewhere) and information we know is not available anywhere (i.e. the cataloguer looked in other datasets as well as offline documentation and established that the information is not available)?
If so, I think it might be useful: if Museum A has looked for a piece of information exhaustively, another institution might trust their work and prefer to focus on tracking other data that would be as impactful but more readily available. I think it would also be useful in the context of crowdsourcing, as it would flag the data that would be most interesting to investigate.
It's very difficult to represent in an open world. This was recently discussed at the CRM SIG and Linked Conservation Data workshops. It's hard to represent this kind of data in present-day information systems.
It seems that we would be relying on the source museum/institution to note whether the information is unknown or absent, yes? I don't know what kind of barrier to participation this would introduce if we were to add this kind of review as a requirement. Do you have a particular example or use case in mind that we could use to illustrate the problem and conceptually test out solutions?
In LOD you usually don't state "I don't know", because such statements are non-monotonic in the face of the OWA (open-world assumption), and because what we don't know is infinite :-).
So I would not go for some generic "missing value" pattern. Wikidata has novalue and unknown, but their use is a bit controversial: https://phabricator.wikimedia.org/T239414
If you know someone died but not when, make a Death event without TimeSpan.
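To make that concrete, here is a rough sketch in plain Python (no RDF library) of what "a Death event without TimeSpan" could look like as triples. The class and property names follow CIDOC-CRM (E21 Person, E69 Death, P100 was death of, P4 has time-span). The `example.org` identifiers, the namespace constant, and the helper function names are assumptions made up for illustration, not part of any existing dataset or model.

```python
# Illustrative only: the example.org identifiers and helper names are hypothetical.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"


def death_without_time_span(person_id, death_id):
    """Return triples asserting that a person died, with no date attached."""
    person = EX + person_id
    death = EX + death_id
    return [
        (person, RDF_TYPE, CRM + "E21_Person"),
        (death, RDF_TYPE, CRM + "E69_Death"),
        (death, CRM + "P100_was_death_of", person),
        # Deliberately no (death, P4_has_time-span, ...) triple:
        # the date is simply not stated.
    ]


def is_known_dead(triples, person):
    """True if some event is recorded as the death of this person."""
    return any(p == CRM + "P100_was_death_of" and o == person
               for _, p, o in triples)


def death_is_dated(triples, person):
    """True if a death event of this person carries a time-span."""
    deaths = {s for s, p, o in triples
              if p == CRM + "P100_was_death_of" and o == person}
    return any(s in deaths and p == CRM + "P4_has_time-span"
               for s, p, _ in triples)


triples = death_without_time_span("person/1", "death/1")
print(is_known_dead(triples, EX + "person/1"))   # True
print(death_is_dated(triples, EX + "person/1"))  # False
```

The point of the pattern is that the graph says "this person died" without saying anything about when, so no "unknown" placeholder value is needed.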
I came across an example of this issue while working on the MAC Artistes dataset: for "Levy, Albert" (a French photographer), the museum left his Death Place blank but entered "inconnue" ("unknown") for his Death Date. He was born in 1864, so we know for certain that he has passed away by now.
I've read some interesting research by Antike Fundmünzen in Europa about managing uncertain data, published at the CAA:
They propose multiple solutions, including:
E13 Attribute Assignment
I will add those references to Zotero.
@stephenhart8 thanks for the third reference!
I think you should make a separate issue on uncertainty and Attribution Qualifiers. See
Although the open-world argument is strong, in the case of authority data the distinction between unknown and absent information can make sense.
In museum datasets, some information is simply missing even though we know it exists somewhere, like someone's birth date. But sometimes the information does not exist at all, like the death date of someone who is still alive.
The problem is that in our model there is no way to distinguish these two kinds of missing information. If someone's death date is missing from our dataset, it is not possible to know whether we just haven't fully documented the record or whether this information does not exist.
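As a rough illustration of the distinction, and not a proposal for the actual model, a record field could carry an explicit status next to its value. Everything in this sketch (the `Status` values, the `Field` class, the `worth_researching` helper) is hypothetical and only meant to show the two kinds of missing information side by side.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    RECORDED = "recorded"              # a value is present in the record
    NOT_DOCUMENTED = "not documented"  # nothing recorded; the value may exist somewhere
    DOES_NOT_EXIST = "does not exist"  # e.g. death date of a living person


@dataclass(frozen=True)
class Field:
    status: Status
    value: Optional[str] = None

    def __post_init__(self):
        # A value must be given for RECORDED, and only for RECORDED.
        has_value = self.value is not None
        if has_value != (self.status is Status.RECORDED):
            raise ValueError("value is required for RECORDED and only for RECORDED")


def worth_researching(field):
    """Only undocumented fields are candidates for further investigation."""
    return field.status is Status.NOT_DOCUMENTED


birth_date = Field(Status.RECORDED, "1864")
death_date_incomplete = Field(Status.NOT_DOCUMENTED)
death_date_living = Field(Status.DOES_NOT_EXIST)

print(worth_researching(death_date_incomplete))  # True
print(worth_researching(death_date_living))      # False
```

With something like this, an empty death date on an incomplete record and an empty death date for a living person would no longer look the same to a consumer of the data. How such a status would map onto the open-world concerns raised above is left open here.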