Open mjy opened 3 years ago
Hm. See if this paper helps with documenting (unambiguously) what is meant by "unknown." Note that #DiSSCo folks are thinking hard about this and want to standardize use of "unknown" across their network if possible. See
Quentin Groom, Mathias Dillen, Helen Hardy, Sarah Phillips, Luc Willemse, Zhengzhe Wu, Improved standardization of transcribed digital specimen data, Database, Volume 2019, 2019, baz129, https://doi.org/10.1093/database/baz129
Table 2 from their paper (regarding Unknown and incomplete data):

Missing data terms | Definition | Example |
---|---|---|
unknown | The information is not digitally available. | Empty value in a digital record of unknown provenance |
unknown:undigitized | The information is not digitally available. No attempt has been made to digitize it. | Empty value in a skeletal record to which data still need to be added from the label |
unknown:missing | The information is not digitally available. It appeared to be absent during digitization. | A value of S.D. used by transcription platforms to indicate the absence of a date value |
unknown:indecipherable | The information is not digitally available. It appeared to be present during digitization, but failed to be captured. | An indication made by a transcriber that they failed to transcribe the information |
known:withheld | The information is digitally available, but it has been withheld by the provider. | A georeferenced record for which coordinate data are available but withheld for conservation considerations |
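
To make the vocabulary above easier to test against real records, here is a minimal sketch in Python that encodes the five terms from Table 2. The class name `MissingDataTerm`, its properties, and the helper `parse_term` are hypothetical names chosen for illustration; they do not come from the paper or from any existing library. The `digitally_available` property only restates the table's definitions, where `known:withheld` is the one term described as digitally available.

```python
from enum import Enum
from typing import Optional


class MissingDataTerm(Enum):
    """The five terms from Table 2, stored as their literal strings."""

    UNKNOWN = "unknown"
    UNKNOWN_UNDIGITIZED = "unknown:undigitized"
    UNKNOWN_MISSING = "unknown:missing"
    UNKNOWN_INDECIPHERABLE = "unknown:indecipherable"
    KNOWN_WITHHELD = "known:withheld"

    @property
    def status(self) -> str:
        # The part before the colon: "unknown" or "known".
        return self.value.split(":", 1)[0]

    @property
    def qualifier(self) -> Optional[str]:
        # The part after the colon, or None for the bare "unknown" term.
        parts = self.value.split(":", 1)
        return parts[1] if len(parts) == 2 else None

    @property
    def digitally_available(self) -> bool:
        # Per the table, only "known:*" terms are digitally available.
        return self.status == "known"


def parse_term(raw: str) -> Optional[MissingDataTerm]:
    """Return the matching term, or None if raw is not in the table.

    Trimming and lowercasing the input is an assumption of this sketch,
    not something the paper specifies.
    """
    try:
        return MissingDataTerm(raw.strip().lower())
    except ValueError:
        return None
```

With this sketch, `parse_term("unknowable")` returns `None`, since the table has no such term.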
Thanks. All of these are valid assertions, none of these are the assertion of "unknowable" :)
So, a good one for them to try and add!
> All of these are valid assertions, none of these are the assertion of "unknowable" :)
Hm. unknown:indecipherable might be why something is "unknowable."
Not the same, I think. That is, the data are present, but computers can't infer on them.
I find this somewhat telling. Rather than starting with what curators might tell us and trying to get that into the standard, this seems to start with a digital product and its nature. I.e., the most basic assertion a curator on the ground needs is "I cannot do more with this because the physical thing is destroyed". Everything else for them is a "bonus".
At present the semantics are to assign a Confidence with a definition along the lines of "I am confident and assert that this attribute on this instance of this class is unknowable". Specific confidence levels that extend this concept to add "why?" are possible, for example:
It is perhaps best to use the fewest possible reasons as to why something is unknowable, as it is highly doubtful that curating to a finer granularity will actually result in meaningful broader data integration etc. The principle is: minimize the amount of downstream re-interpretation you are forcing people to do. Downstream consumers of your assertions (e.g. scientists doing science with your data) are going to operate on a few boolean decisions as to whether or not the data are useful for their needs.
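
As a rough illustration of that principle, the sketch below models an "unknowable" assertion on one attribute of one record, with a deliberately tiny set of reasons, and reduces it to the single boolean a downstream consumer would act on. Everything here is hypothetical: the names `UnknowableReason`, `UnknowableAssertion`, and `is_usable`, the record layout, and the two reasons (one taken from the "physical thing is destroyed" case mentioned earlier in this thread) are assumptions for illustration, not an existing data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class UnknowableReason(Enum):
    # Hypothetical, intentionally short list of reasons.
    SPECIMEN_DESTROYED = "specimen destroyed"
    UNSPECIFIED = "unspecified"


@dataclass(frozen=True)
class UnknowableAssertion:
    """'This attribute on this record is unknowable', with an optional why."""

    record_id: str
    attribute: str
    reason: UnknowableReason = UnknowableReason.UNSPECIFIED


def is_usable(record: dict, attribute: str,
              assertions: List[UnknowableAssertion]) -> bool:
    """Boolean decision: the attribute is not asserted unknowable and has a value.

    The reason is ignored on purpose; consumers only need yes or no.
    """
    for a in assertions:
        if a.record_id == record["id"] and a.attribute == attribute:
            return False
    return record.get(attribute) not in (None, "")


# Usage with made-up records.
records = [
    {"id": "r1", "collection_date": "1901-05-02"},
    {"id": "r2", "collection_date": ""},
]
assertions = [
    UnknowableAssertion("r2", "collection_date",
                        UnknowableReason.SPECIMEN_DESTROYED),
]
usable_ids = [r["id"] for r in records
              if is_usable(r, "collection_date", assertions)]
```

The reason is stored for curators but never consulted by `is_usable`, which is one way to keep finer-grained curation from leaking into downstream decisions.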