OpenHistoricalMap / issues

File your issues here, regardless of repo until we get all our repos squared away; we don't want to miss anything.
Creative Commons Zero v1.0 Universal

Inspector should display Wikimedia Commons images #581

Open 1ec5 opened 10 months ago

1ec5 commented 10 months ago

The wikimedia_commons key can be set to either a category name (prefixed with Category:) or an image file name (File:). If it’s a file name, the inspector should display the image, just as it does when image is present on the selected feature.
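As a rough illustration of the distinction described above, here is a minimal Python sketch (the inspector itself is JavaScript) that classifies a wikimedia_commons value by its prefix. The names `CommonsRef` and `parse_wikimedia_commons` are hypothetical, and treating the prefix as case-insensitive is an assumption, not something the tagging scheme is known to require.

```python
from typing import NamedTuple, Optional


class CommonsRef(NamedTuple):
    kind: str  # "file" or "category"
    name: str  # page name without the namespace prefix


def parse_wikimedia_commons(value: str) -> Optional[CommonsRef]:
    """Classify a wikimedia_commons tag value by its namespace prefix.

    Returns None for anything that is not a File: or Category: value,
    for example a raw URL that was put in the tag by mistake.
    """
    prefix, sep, name = value.strip().partition(":")
    # MediaWiki treats underscores and spaces in titles as equivalent.
    name = name.strip().replace("_", " ")
    if not sep or not name:
        return None
    # Assumption: accept the prefix in any letter case.
    kind = {"file": "file", "category": "category"}.get(prefix.strip().lower())
    return CommonsRef(kind, name) if kind else None
```

Only values classified as "file" would be candidates for display as an image; a "category" value could at most be shown as a link.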

Examples:

jeffreyameyer commented 10 months ago

@danrademacher - I know we've been loath to expand the inspector too much in the past, but this seems like something we could plug into what we've built so far, no?

1ec5 commented 10 months ago

This would be a more durable alternative to tagging image with a raw Commons image URL, such as on Black Rock City 2008. The image URL can go away if someone uploads a new revision of the file – that revision will wind up at a slightly different URL. It’s safer to consult the Wikimedia Commons API for the canonical file URL.
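To make the difference between a raw URL and a file name concrete, here is a hedged Python sketch that recovers a File: title from a raw upload.wikimedia.org URL, which is the kind of value that could then be moved into wikimedia_commons. The function name is hypothetical, the hash directories in the usage note are made up, and the sketch only covers the upload.wikimedia.org layout for current files and thumbnails, not description-page URLs or archived revisions.

```python
import re
from typing import Optional
from urllib.parse import unquote, urlsplit

# Path layout of current files served from upload.wikimedia.org:
#   /wikipedia/commons/a/ab/Name.jpg
#   /wikipedia/commons/thumb/a/ab/Name.jpg/320px-Name.jpg
_UPLOAD_PATH = re.compile(
    r"^/wikipedia/commons/(?:thumb/)?[0-9a-f]/[0-9a-f]{2}/([^/]+)"
)


def commons_title_from_url(url: str) -> Optional[str]:
    """Return "File:<name>" for a raw Commons upload URL, else None."""
    parts = urlsplit(url)
    if parts.hostname != "upload.wikimedia.org":
        return None
    match = _UPLOAD_PATH.match(parts.path)
    if match is None:
        return None
    # Undo percent-encoding and the underscore form of spaces.
    return "File:" + unquote(match.group(1)).replace("_", " ")
```

For instance, a URL ending in /wikipedia/commons/a/ab/Some_File.jpg would map to File:Some File.jpg, while a URL on any other host returns None and would be left alone.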

Here’s where we currently read image and the various undocumented image:* tags:

https://github.com/OpenHistoricalMap/ohm-inspector/blob/94e546bc8ad4b05a09ad052cffff0cec1a8c8217/openhistoricalmap-inspector.js#L117

We’re already using the MediaWiki API to get excerpts about Wikipedia articles. Wikimedia Commons supports the MediaWiki API; we could use it to get the same metadata about an image that Wikipedia displays in a lightbox when you click to preview it. For example, given the tag wikimedia_commons=File:Bread of Life 2331 Jackson Ave New Orleans.jpg, we’d make a request for this query (open in API playground) to get:

The API supports querying for metadata about multiple images at once.
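A sketch of what such a request could look like, in Python for brevity. It is not necessarily the exact query linked above: it assumes the standard `action=query` / `prop=imageinfo` module with `iiprop=url|extmetadata`, and the function names, the default thumbnail width, and the sample response in the usage note are illustrative assumptions.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_imageinfo_url(titles: Iterable[str], thumb_width: int = 320) -> str:
    """Build one imageinfo query covering one or more File: titles."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",  # "pages" comes back as a list
        "origin": "*",  # anonymous cross-origin request from a browser
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",
        "iiurlwidth": str(thumb_width),  # also ask for a scaled thumbnail URL
        "titles": "|".join(titles),  # multiple titles are pipe-separated
    }
    return COMMONS_API + "?" + urlencode(params)


def image_urls(response: dict) -> Dict[str, Optional[str]]:
    """Map each returned page title to its file URL (None if missing)."""
    result = {}
    for page in response.get("query", {}).get("pages", []):
        info = page.get("imageinfo") or [{}]
        result[page["title"]] = info[0].get("url")
    return result
```

Passing several titles to `build_imageinfo_url` yields a single request, and `image_urls` then returns one entry per title, with None for any file that no longer exists on Commons.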

There’s also a way to get the data in more structured form if we’re interested.

danrademacher commented 10 months ago

Looking back, I think we just coded in those image URLs based on what we had in the tags on the sample data, rather than with a more long-term view of how we'd deal with link rot over time.

Getting everything we need from a single tag, wikimedia_commons=File:Bread of Life 2331 Jackson Ave New Orleans.jpg, is a heck of a lot better than having multiple interdependent tags for each image.

@jeffreyameyer I seem to recall that some of your example image tags used non-Wikimedia URLs as src values for the image itself (e.g., an image from a blog). With this new approach, we'd settle on all inspector images coming from Wikimedia Commons, correct?

I think that's a good idea since link rot (not to mention license issues) will be an even bigger issue if folks can continue to grab image URLs from anywhere on the internet.

1ec5 commented 10 months ago

Automatically grabbing and displaying image URLs from anywhere on the Internet will inevitably lead to broken images, inappropriate images replacing the ones the mapper intended, privacy-invasive trackers on squatted sites, and even perhaps a security hole, since we apparently aren’t checking file types or even file extensions. At the very least, there should be a domain whitelist. It’s fine if someone needs to tag an image on a domain that isn’t whitelisted, for the purpose of citing sources, but we don’t have to showcase it in the inspector if so.

danrademacher commented 10 months ago

I agree we should be at least checking to be sure that putative image URLs return images. Posted that as #583 to address quickly.
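One way to check that a response really is an image, independent of the file extension or the declared Content-Type, is to look at the first bytes of the body. The following Python sketch is only an illustration of that idea and not what #583 specifies; the function name and the choice of formats are assumptions.

```python
from typing import Optional

# Leading byte signatures of a few common raster formats.
_SIGNATURES = (
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def sniff_image_type(head: bytes) -> Optional[str]:
    """Guess an image MIME type from the first bytes of a response body.

    Returns None when the bytes match none of the known signatures,
    for example when a dead link now serves an HTML error page.
    """
    for signature, mime in _SIGNATURES:
        if head.startswith(signature):
            return mime
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    return None
```

Only the first dozen or so bytes are needed, so a caller could fetch a small prefix of the file rather than the whole image.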

I would prefer to just say all Inspector images come from Wikicommons and move on from there, assuming then that if folks post inappropriate content to Wikicommons and cite it at OHM then at least we have multiple angles where community correction is likely.

Here's an example of a way with photos from 2 different sources: https://staging.openhistoricalmap.org/way/198291609#map=20/47.59899/-122.33458&layers=O&date=1923-01-01&daterange=1923-01-01,2023-12-31

https://pcad.lib.washington.edu/ https://cdm16118.contentdm.oclc.org/

These seem like they're likely well-known in Seattle but little known beyond that, though oclc.org has a bigger footprint as a library software provider.

What mechanism might we use to maintain a safe list of domains eligible for inclusion in the inspector?

1ec5 commented 10 months ago

“What mechanism might we use to maintain a safe list of domains eligible for inclusion in the inspector?”

There’s a long tail of image values, but at a glance, even whitelisting a single domain – collections.mcny.org – would cover a third of the occurrences of this key, and wikipedia.org/wikimedia.org would cover another third. That’s small enough for a simple regular expression. But I think this should be tracked in #583 or in a separate issue (#585). It’s independent of honoring the wikimedia_commons key, which is good for consistency with OSM too.
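For a sense of scale, a regular expression covering just the domains named above could look like the following Python sketch. The pattern, the function name, and the decision to accept subdomains of wikipedia.org and wikimedia.org over http or https are all assumptions for illustration; the actual list belongs in the whitelist discussion.

```python
import re
from urllib.parse import urlsplit

# Hosts named in this thread; subdomains of the two Wikimedia domains
# are accepted so that upload.wikimedia.org and en.wikipedia.org match.
_ALLOWED_HOST = re.compile(
    r"collections\.mcny\.org|(?:[a-z0-9-]+\.)*(?:wikipedia|wikimedia)\.org"
)


def is_allowed_image_url(url: str) -> bool:
    """Return True if the URL is http(s) and its host is on the list."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # Match the whole hostname, so "wikimedia.org.evil.example" fails.
    return _ALLOWED_HOST.fullmatch(parts.hostname or "") is not None
```

Matching against the parsed hostname rather than the whole URL string avoids being fooled by an allowed domain that appears in a path or as a prefix of a longer hostname.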

jeffreyameyer commented 10 months ago

Minh:

Automatically grabbing and displaying image URLs from anywhere on the Internet will inevitably lead to broken images, inappropriate images replacing the ones the mapper intended, privacy-invasive trackers on squatted sites, and even perhaps a security hole, since we apparently aren’t checking file types or even file extensions. At the very least, there should be a domain whitelist. It’s fine if someone needs to tag an image on a domain that isn’t whitelisted, for the purpose of citing sources, but we don’t have to showcase it in the inspector if so.

Dan:

I would prefer to just say all Inspector images come from Wikicommons and move on from there, assuming then that if folks post inappropriate content to Wikicommons and cite it at OHM then at least we have multiple angles where community correction is likely.

Photo linking is something we have but haven't pushed too far on, and I think the points about dead links and malicious files are worth planning around.

And, having first-rate support for Wikimedia Commons photo integration makes complete sense.

However, I think there are way too many other sources of historical photos for us to want to limit it to just Wikimedia Commons. I think the depth, quality, and, most importantly, recency of photos there are too limiting. And, it's sort of counter to the whole GLAM / Linked Data gestalt.

For example, here are some photo repositories from up in Seattle that would be fantastic to have people using:

Museum of History and Industry - hosted at UW
Seattle Public Library
King County Historical Photos
Seattle Then & Now
Ballard Historical Society
BA-KGROUND - another hobbyist site

There are other sites hosting pre-geolocated pictures we could leverage: From friend of OHM, Dan Vanderkam:

Then, of course, there's the Library of Congress & things like its Historic American Buildings Survey.

And... there are cases of unusual photographs, not posted on a hosting site, not found with reverse image search, that are still of interest. See the picture for Segedunum at the end of Hadrian's Wall. (url shortened... https://bit.ly/pl_89288... probably a very bad thing, but intended to save bits on an Internet Archive URL).

We could go on here, but I think it's clear that the vast majority of interesting historical photos available on the Internet are outside of Wikimedia, even though that's contrary to what taginfo tells us right now. Hard-limiting image tagging to just what's on Wikimedia really restricts the creativity of mappers who want to put in the work. It will also limit our ability to demo stuff to potential picture partners. And... it's solving a problem I'm not sure we have quite yet.

1ec5 commented 10 months ago

Let’s please keep this ticket on topic, about showing Commons images. Any discussion about not showing an image should go in #583 or in a separate issue (#585).

1ec5 commented 10 months ago

Commons has a system of keeping track of images used on OSM and the OSM Wiki so that administrators know to keep a redirect around when renaming an image and avoid breaking images on OSM without a good reason. If we migrate Commons-hosted images to the wikimedia_commons tag, this system can easily be extended to OHM.

1ec5 commented 10 months ago

I spun out #585 to discuss the whitelist proposal specifically.