SpeciesFileGroup / taxonworks

Workbench for biodiversity informatics.
http://taxonworks.org

Permanent links to animal images in vivo #1863

Open adrik29 opened 3 years ago

adrik29 commented 3 years ago

More and more zoologists, naturalists, photographers and hobbyists are uploading high-quality images of animals in vivo to websites such as iNat, Flickr, Natura-whatever. In general, all my requests to reuse those images, for example in my Opiliones-Wikia, are granted without any problem. People are very often happy to see their work used elsewhere, as long as due credit is given. I think those unvouchered images of specimens are also useful in TW, in addition to our ordinary images of museum specimens from the literature.

Now my question:

Do we have a preferred place to host animal images, whose URLs we can then point to from TW? Or is it preferable for us to upload a copy of those images to TW rather than linking?

mjy commented 3 years ago

In mx we recommended and used Morphbank, linking to it virtually as you envision. We don't recommend that anymore and won't be supporting that linkage in TaxonWorks. It is unfortunate that there is no global raw-image silo, i.e. a place for us to push images and ONLY the image, and know that the URL/URI to the image will last "forever". That silo will never exist, largely because people suck and would fill it with images that are a) not biological, and b) horrible in ways you might imagine.

Right now I suspect Figshare, Zenodo, Wiki(media?) might be the top candidates, but frankly we need to a) determine whether we can use their APIs to reference individual images and b) then build virtual image subclasses.

I'm going to assign @debpaul this issue to have her find and identify APIs that could be used in general (see also iNaturalist, Flickr, other photo sharing sites). Once we have those figured out we will add an issue to subclass images (if it doesn't already exist here).

@debpaul For an organization to be a candidate it must minimally a) provide a permanent URL via some API to the original image that does not require authentication, major bonus for b) thumbs or resizing options via the URL. We might also note the attribution metadata coming via the API etc.

Ping also @jhpoelen re permanency/reuse of image ideas.

jhpoelen commented 3 years ago

@mjy Thanks for including me in this discussion.

As hinted in the comments above by the claim "people suck", the idea of a "permanent" URL (aka PURL) is largely aspirational, and measurably so (Elliott et al. 2020). URLs are not designed to be permanently associated with specific content. The same applies to DOIs.

However, schemes exist to uniquely identify content (e.g., images) using content-based identifiers. Carl Boettiger recently gave a webinar on the topic: https://www.dataone.org/webinars/iterative-forecasts/ .

In summary, the idea is to first identify the content using a content-based identifier, and then register the (not so permanent) locations where the image exists.

At https://jhpoelen.nl/bees you can find examples of this using specimen images.

For instance, the image of the impaled bee (see attached) associated with media record https://jhpoelen.nl/bees/0b5b4499-1055-4adc-9017-be1dc5914c13 is identified by hash://sha256/8d49bd24f6ba300b4de44fd218b53294f4cc0106cd9631018ef819b38345c75d . This unique identifier can then be used to find locations at which the image is available. Using https://hash-archive.org/, a registry of content-based ids, you can find all the locations that are known to have served that exact image (see https://hash-archive.org/sources/hash://sha256/8d49bd24f6ba300b4de44fd218b53294f4cc0106cd9631018ef819b38345c75d or attached screenshot).

In this example, the image is hosted at Harvard, iDigBio, jhpoelen.nl/bees, Software Heritage, and . . . now by GitHub via the bee image attached to this issue. With copies floating around all over the place and with one or more registries keeping track of them, it is more likely that you'll find at least one location that still hosts the image. And . . . once you receive the image from that location, you can verify its authenticity by calculating its content-based identifier independently.
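As a rough illustration of that verification step, here is a minimal Node.js sketch. It assumes the identifier is simply the lowercase hex SHA-256 digest of the raw bytes prefixed with hash://sha256/, which matches the identifiers shown in this thread. The function names `contentId` and `verifyContent` are made up for this example and are not part of TaxonWorks or hash-archive.org.

```javascript
const { createHash } = require("node:crypto");

// Build a hash://sha256/<hex> identifier from raw bytes (Buffer or string).
// Hypothetical helper, named for illustration only.
function contentId(bytes) {
  const hex = createHash("sha256").update(bytes).digest("hex");
  return `hash://sha256/${hex}`;
}

// Return true when the received bytes hash to the identifier we asked for.
function verifyContent(bytes, expectedId) {
  return contentId(bytes) === expectedId.trim().toLowerCase();
}
```

In practice the bytes would come from whichever location happened to answer, for example `verifyContent(fs.readFileSync("bee.jpg"), id)` for a local copy. A mismatch means that location no longer serves the same image.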

I imagine that TaxonWorks can quite easily adopt these content-based identifiers in addition to the existing (aspirationally) permanent URLs. Also, TaxonWorks is ideally suited to provide a content-registry like hash-archive.org or build their own "where's my content" or "where did my content come from?" service. Great for tracking usage too . . .

[Attached image: bee]

[Attached image: Screenshot from 2020-10-30 08-52-20]

mjy commented 3 years ago

@jhpoelen thanks so much. So I think we could enumerate the practical steps to work this out? For example:

jhpoelen commented 3 years ago

Hey Matt -

Perhaps the API of hash-archive.org would provide a good starting point.

$ curl "https://hash-archive.org/api/sources/hash://sha256/05b41ea4707614b20c1f55b47979989ee10ed62dd17322afab1451750df8119e"
[
    {
        "url": "https://archive.softwareheritage.org/api/1/content/sha256:05b41ea4707614b20c1f55b47979989ee10ed62dd17322afab1451750df8119e/raw/",
        "timestamp": 1600796387,
        "status": 200,
        "type": "application/octet-stream",
        "length": 148950,
        "hashes": [
            "md5-gJ3+b/E4oafUbLVqv9Tr1g==",
            "sha1-hI/McOvVuFrfOUlFFVXfls+6Nj0=",
            "sha256-BbQepHB2FLIMH1W0eXmYnuEO1i3RcyKvqxRRdQ34EZ4=",
            "sha384-SW7y3hs2meaK3CbXurLKLDLN6N99wSyVoDPuO2daOzdL41cBPZ9RLgYMlyHMI5J8",
            "sha512-YtFNdrgPXQZYTGn/ZyvuLNyF0V8RlgAe+ksK/4H0BQsfb8edYhHaIfoeku9XiMSudFysLVaYtS0v9QoXcQF1"
        ]
    },
    {
        "url": "https://jhpoelen.nl/bees/data/05/b4/05b41ea4707614b20c1f55b47979989ee10ed62dd17322afab1451750df8119e",
        "timestamp": 1600295176,
        "status": 200,
        "type": "application/octet-stream",
        "length": 148950,
        "hashes": [
            "md5-gJ3+b/E4oafUbLVqv9Tr1g==",
            "sha1-hI/McOvVuFrfOUlFFVXfls+6Nj0=",
            "sha256-BbQepHB2FLIMH1W0eXmYnuEO1i3RcyKvqxRRdQ34EZ4=",
            "sha384-SW7y3hs2meaK3CbXurLKLDLN6N99wSyVoDPuO2daOzdL41cBPZ9RLgYMlyHMI5J8",
            "sha512-YtFNdrgPXQZYTGn/ZyvuLNyF0V8RlgAe+ksK/4H0BQsfb8edYhHaIfoeku9XiMSudFysLVaYtS0v9QoXcQF1"
        ]
    },
    {
        "url": "https://archive.softwareheritage.org/api/1/content/sha256:05b41ea4707614b20c1f55b47979989ee10ed62dd17322afab1451750df8119e/raw/",
        "timestamp": 1600233960,
        "status": 200,
        "type": "application/octet-stream",
        "length": 148950,
        "hashes": [
            "md5-gJ3+b/E4oafUbLVqv9Tr1g==",
            "sha1-hI/McOvVuFrfOUlFFVXfls+6Nj0=",
            "sha256-BbQepHB2FLIMH1W0eXmYnuEO1i3RcyKvqxRRdQ34EZ4=",
            "sha384-SW7y3hs2meaK3CbXurLKLDLN6N99wSyVoDPuO2daOzdL41cBPZ9RLgYMlyHMI5J8",
            "sha512-YtFNdrgPXQZYTGn/ZyvuLNyF0V8RlgAe+ksK/4H0BQsfb8edYhHaIfoeku9XiMSudFysLVaYtS0v9QoXcQF1"
        ]
    },
    {
        "url": "http://mczbase.mcz.harvard.edu/specimen_images/entomology/large/MCZ-ENT00017219_Spinoliella_puellae_win.jpg",
        "timestamp": 1600214919,
        "status": 200,
        "type": "image/jpeg",
        "length": 148950,
        "hashes": [
            "md5-gJ3+b/E4oafUbLVqv9Tr1g==",
            "sha1-hI/McOvVuFrfOUlFFVXfls+6Nj0=",
            "sha256-BbQepHB2FLIMH1W0eXmYnuEO1i3RcyKvqxRRdQ34EZ4=",
            "sha384-SW7y3hs2meaK3CbXurLKLDLN6N99wSyVoDPuO2daOzdL41cBPZ9RLgYMlyHMI5J8",
            "sha512-YtFNdrgPXQZYTGn/ZyvuLNyF0V8RlgAe+ksK/4H0BQsfb8edYhHaIfoeku9XiMSudFysLVaYtS0v9QoXcQF1"
        ]
    },
    {
        "url": "https://api.idigbio.org/v2/media/6e0122c9-28b7-4bd2-9a0c-46bb3346c713?size=fullsize",
        "timestamp": 1600214841,
        "status": 200,
        "type": "image/jpeg",
        "length": 148950,
        "hashes": [
            "md5-gJ3+b/E4oafUbLVqv9Tr1g==",
            "sha1-hI/McOvVuFrfOUlFFVXfls+6Nj0=",
            "sha256-BbQepHB2FLIMH1W0eXmYnuEO1i3RcyKvqxRRdQ34EZ4=",
            "sha384-SW7y3hs2meaK3CbXurLKLDLN6N99wSyVoDPuO2daOzdL41cBPZ9RLgYMlyHMI5J8",
            "sha512-YtFNdrgPXQZYTGn/ZyvuLNyF0V8RlgAe+ksK/4H0BQsfb8edYhHaIfoeku9XiMSudFysLVaYtS0v9QoXcQF1"
        ]
    },
    {
        "url": "https://jhpoelen.nl/bees/data/05/b4/05b41ea4707614b20c1f55b47979989ee10ed62dd17322afab1451750df8119e",
        "timestamp": 1600118110,
        "status": 200,
        "type": "application/octet-stream",
        "length": 148950,
        "hashes": [
            "md5-gJ3+b/E4oafUbLVqv9Tr1g==",
            "sha1-hI/McOvVuFrfOUlFFVXfls+6Nj0=",
            "sha256-BbQepHB2FLIMH1W0eXmYnuEO1i3RcyKvqxRRdQ34EZ4=",
            "sha384-SW7y3hs2meaK3CbXurLKLDLN6N99wSyVoDPuO2daOzdL41cBPZ9RLgYMlyHMI5J8",
            "sha512-YtFNdrgPXQZYTGn/ZyvuLNyF0V8RlgAe+ksK/4H0BQsfb8edYhHaIfoeku9XiMSudFysLVaYtS0v9QoXcQF1"
        ]
    },
    {
        "url": "https://archive.softwareheritage.org/api/1/content/sha256:05b41ea4707614b20c1f55b47979989ee10ed62dd17322afab1451750df8119e/raw/",
        "timestamp": 1600004896,
        "status": 200,
        "type": "application/octet-stream",
        "length": 148950,
        "hashes": [
            "md5-gJ3+b/E4oafUbLVqv9Tr1g==",
            "sha1-hI/McOvVuFrfOUlFFVXfls+6Nj0=",
            "sha256-BbQepHB2FLIMH1W0eXmYnuEO1i3RcyKvqxRRdQ34EZ4=",
            "sha384-SW7y3hs2meaK3CbXurLKLDLN6N99wSyVoDPuO2daOzdL41cBPZ9RLgYMlyHMI5J8",
            "sha512-YtFNdrgPXQZYTGn/ZyvuLNyF0V8RlgAe+ksK/4H0BQsfb8edYhHaIfoeku9XiMSudFysLVaYtS0v9QoXcQF1"
        ]
    }
]
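Note that the entries in "hashes" are written as algorithm-base64, while the identifier in the request uses hex. A small sketch of converting between the two, assuming standard base64 and the algorithm-value layout seen in the response above (`toHashUri` is a hypothetical name, not part of the hash-archive.org API):

```javascript
// Convert an entry such as "sha256-BbQe...=" into "hash://sha256/<hex>".
// Standard base64 never contains "-", so the first dash is the separator.
function toHashUri(entry) {
  const dash = entry.indexOf("-");
  if (dash < 0) {
    throw new Error(`unexpected hash entry: ${entry}`);
  }
  const algorithm = entry.slice(0, dash);
  const hex = Buffer.from(entry.slice(dash + 1), "base64").toString("hex");
  return `hash://${algorithm}/${hex}`;
}
```

Applied to the sha256 entry of any record above, this should give back the hash://sha256/05b41ea4... identifier that was used in the request.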

A JavaScript widget could expose a function, very much like the contentid R package by @cboettig:

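// resolve() is a hypothetical helper that maps a content id to a URL currently serving it.
// The variable is not called "location", because a global "location" in a browser
// is window.location, and assigning to it would navigate the page.
var imageUrl = resolve("hash://sha256/05b41ea4707614b20c1f");
var img = document.createElement("img");
img.setAttribute("src", imageUrl);
// append img element somewhere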

Some smart internal logic would pick the most suitable location.
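One possible version of that logic, as a sketch only: deduplicate the records by URL, keep the most recent observation of each, drop anything that did not return 200 the last time it was checked, and then prefer https and an image content type. The scoring weights and the name `pickLocation` are assumptions for illustration; the field names (url, timestamp, status, type) are taken from the response above.

```javascript
// Pick one URL from a hash-archive style response (array of source records).
function pickLocation(sources) {
  // Keep only the most recent observation for each URL.
  const latestByUrl = new Map();
  for (const source of sources) {
    const seen = latestByUrl.get(source.url);
    if (!seen || source.timestamp > seen.timestamp) {
      latestByUrl.set(source.url, source);
    }
  }
  // Illustrative preference: https first, then an image content type.
  const score = (s) =>
    (s.url.startsWith("https://") ? 2 : 0) +
    ((s.type || "").startsWith("image/") ? 1 : 0);
  const ranked = [...latestByUrl.values()]
    .filter((s) => s.status === 200)
    .sort((a, b) => score(b) - score(a) || b.timestamp - a.timestamp);
  return ranked.length > 0 ? ranked[0].url : null;
}
```

With the response shown earlier, this scoring would favour the api.idigbio.org entry, since it is the only one that is both https and served as image/jpeg. Whatever URL is chosen, the bytes should still be checked against the content id after download.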

This is the most basic solution that I can come up with now.

Curious to hear your thoughts, -jorrit

mjy commented 3 years ago

Thanks @jhpoelen, will do some experimenting given this, very helpful.