Open jhpoelen opened 1 year ago
Accordingly, I've manually annotated a Wikimedia Commons entry
with its associated checksums in SHA-1, SHA-256 and MD5 form.
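For reference, the three checksums can be computed locally before annotating an entry. This is a minimal sketch using Python's standard `hashlib`; the function name `checksums` is hypothetical and not part of any Wikimedia tooling.

```python
import hashlib


def checksums(content: bytes) -> dict[str, str]:
    """Return hex digests of `content` for SHA-1, SHA-256 and MD5."""
    return {
        "sha1": hashlib.sha1(content).hexdigest(),
        "sha256": hashlib.sha256(content).hexdigest(),
        "md5": hashlib.md5(content).hexdigest(),
    }


if __name__ == "__main__":
    # In practice, `content` would be the bytes of the downloaded image file.
    for name, digest in checksums(b"abc").items():
        print(name, digest)
```

The digests are computed over the raw bytes of the file, so a re-encoded or resized copy of the same picture will yield different values.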
A sample query:
SELECT ?item ?image WHERE {
?item wdt:P4092 "85379b346e61c06033a12720155f3bf13d2c6f5946625600f34edace55cb159d693a15aefab9e15691ff2402887985d559951327974206ccf85495e27b9ee56d";
wdt:P18|wdt:P117 ?image .
}
LIMIT 10
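The query above can be parameterized by checksum. The sketch below only builds the query text; the helper name `checksum_query` is hypothetical, and the hex-only validation is an assumption added to avoid injecting arbitrary text into the query.

```python
import re


def checksum_query(checksum: str, limit: int = 10) -> str:
    """Build the sample SPARQL query for a given hex checksum (P4092)."""
    # Assumption: checksums are lowercase hex strings, as in the sample query.
    if not re.fullmatch(r"[0-9a-f]+", checksum):
        raise ValueError("checksum must be a lowercase hex string")
    return (
        "SELECT ?item ?image WHERE {\n"
        f'  ?item wdt:P4092 "{checksum}";\n'
        "        wdt:P18|wdt:P117 ?image .\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(checksum_query("a9993e364706816aba3e25717850c26c9cd0d89d"))
```

The resulting text could then be submitted to a SPARQL endpoint that knows the `wdt:` prefix; which endpoint covers Commons entities is left open here, for the reasons given in the note that follows.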
Note that structured queries against objects in Wikimedia Commons are still under development. See, for instance, https://diff.wikimedia.org/2020/10/29/sparql-in-the-shadow-of-structured-data-on-commons/ and the page it references, https://commons.wikimedia.org/wiki/Commons:Structured_data .
Also, note that annotating checksum properties (see https://www.wikidata.org/wiki/Property:P4092 ) on image properties in Wikidata objects doesn't seem to come naturally, because qualifiers on qualifiers appear to be too much nesting for the Wikidata model.
For instance, adding a checksum (or content hash) for an image that supports a physical interaction ( https://www.wikidata.org/wiki/Q2747101#P129 ) for a specific taxon https://www.wikidata.org/wiki/Q2747101 appears to be tricky with existing UI editing tools. E.g., it is currently hard to add a "determined by" qualifier (SHA-1 algorithm) to the checksum qualifier for the image related to the physical interaction property.
It appears that the Wikimedia Commons entities are a more natural fit . . . and some patience is needed before being able to access this structured Commons data, for the reasons stated earlier.
So, as far as I can tell, querying Wikimedia Commons images by their checksums is possible, and a dedicated service / data product would have to be created to help answer questions like:
What are the checksums (or content hashes) associated with this Wikimedia Commons entity?
and
Please provide content associated with this content id (or checksum) if you have it. Otherwise, say "mweh, don't have it."
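To make the two questions concrete, here is a minimal in-memory sketch of such a service. The class `ContentIndex`, its method names, and the entity id used in the example are all hypothetical; a real data product would need persistent storage and a way to ingest Commons files.

```python
import hashlib
from typing import Dict, Union

NOT_FOUND = "mweh, don't have it."
ALGORITHMS = ("sha1", "sha256", "md5")


class ContentIndex:
    """Toy index answering both questions for content added to it."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, bytes] = {}              # hex digest -> content
        self._by_entity: Dict[str, Dict[str, str]] = {}   # entity id -> digests

    def add(self, entity_id: str, content: bytes) -> Dict[str, str]:
        """Register content under an entity id and index it by every digest."""
        digests = {a: hashlib.new(a, content).hexdigest() for a in ALGORITHMS}
        self._by_entity[entity_id] = digests
        for digest in digests.values():
            self._by_hash[digest] = content
        return digests

    def checksums_for(self, entity_id: str) -> Dict[str, str]:
        """Question 1: which checksums are associated with this entity?"""
        return dict(self._by_entity.get(entity_id, {}))

    def content_for(self, checksum: str) -> Union[bytes, str]:
        """Question 2: return the content for a checksum, or say we lack it."""
        return self._by_hash.get(checksum.lower(), NOT_FOUND)


if __name__ == "__main__":
    index = ContentIndex()
    index.add("File:Example.jpg", b"abc")  # hypothetical entity id and bytes
    print(index.checksums_for("File:Example.jpg"))
    print(index.content_for("0000"))
```

Indexing the same content under several hash algorithms lets a client ask with whichever checksum it happens to hold.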
Internally, Wikimedia Commons uses SHA-1 hashes to alert users when duplicate digital data is already available on Commons.
However, as far as I can tell, these SHA-1 hashes are not yet exposed via structured data by default.
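Outside of structured data, the MediaWiki action API appears to offer a lookup by SHA-1 through the `allimages` list and its `aisha1` parameter. The sketch below only builds the request URL and makes no network call; the helper name `sha1_lookup_url` is hypothetical, and whether the response covers the use case discussed here is an assumption that would need checking against the live API.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def sha1_lookup_url(sha1_hex: str) -> str:
    """Build an action API URL listing Commons files with this SHA-1 hash."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1_hex,  # hex-encoded SHA-1 of the file content
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(sha1_lookup_url("a9993e364706816aba3e25717850c26c9cd0d89d"))
```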
Methods also already exist to annotate digital content with its checksums.
For example, see https://www.wikidata.org/wiki/Q34852 , where https://www.wikidata.org/wiki/Property:P4092 is used to document the SHA-2 hash 8de979cbb1db728ef99debac8a516405a2088e4fa2816fda2769856a54029bcd49913a45494ce1cae4096413c49ae7da36f7bc2d20899fb216195b9eb365e55c associated with digital content.