Create production-ready ELI file

OpenHistoricalMap / issues

File your issues here, regardless of repo until we get all our repos squared away; we don't want to miss anything.

Creative Commons Zero v1.0 Universal

19 stars 1 forks source link

Create production-ready ELI file #296

Open jeffreyameyer opened 3 years ago

jeffreyameyer commented 3 years ago

What's your idea for a cool feature that would help you use OHM better.

In order to make it easier to trace old maps, it'd be great if OHM's iD layer selector pointed at a list of historical x/y/z servers.

We have such an ELI set up (https://openhistoricalmap.github.io/ohm-editor-layer-index/imagery.xml), but we do not a) have OHM iD pointing at it, and b) the list isn't ready to be pointed at.

To make (b) happen, we need to merge: a) the existing OHM ELI b) appropriate layers from the standard OSM ELI c) any other cool x/y/z servers

So... here are the instructions for completing this task: https://gist.github.com/Chaser324/ce0505fbed06b947d962

Current workarounds Using the "custom" layer option in iD, which can be a pain and only allows use of 1 historical layer at a time

andrewharvey commented 3 years ago

I setup https://github.com/OpenHistoricalMap/ohm-editor-layer-index so we can store imagery layers which aren't useful or usable in OSM so can't go directly in OSM's editor-layer-index.

As for how we should deal with layers in OSM's editor-layer-index but still useful and usable (wrt licensing) by OHM, we could either:

Attempt to migrate layers across from OSM's ELI to OHM's ELI directly in this git repo.
Setup the build process to pull and merge OSM's layers into our output file which iD and JOSM consume, but don't commit those sources here in the source code.

I think (2) is best, that way we can focus on maintaining only OHM specifc layers here and the rest just come in automatically without cluttering our repo. We might need to maintain a whitelist/blacklist of layers we don't bring across (eg Bing?).

jeffreyameyer commented 3 years ago

Interesting... I was starting out with (1), thinking that we could just provide some common worldwide-coverage reference layers (Bing : ) - which we have permission to use), OSM, etc., and then to retain only historical for local layers. We could always add specific global/regional reference layers as requested.

That said (2) seems pretty appealing, as long as there's some way of prioritizing historical layers higher in the resulting iD exposure / sort order. Is that possible? Would the instructions for that build process live in this repo?

andrewharvey commented 3 years ago

(1) would certainly make it harder to see what layers differ between the two indexes and is a lot of work to maintain. Every single update to OSM's index you need to replicate manually here. Not something I'd be volunteering to do.

(2) we need to think about what should be prioritized, because in OSM the default layer shown at a location is based on the one marked "best" which is usually down to higher resolution and more current. In OHM I don't really see one particular imagery or historical map being "best" because usually you'd flip through different times when you're mapping.

If we implemented (2) then yes I would see us adding some code that when this index is built, it fetches OSM's imagery.json output and merges it.

jeffreyameyer commented 3 years ago

I'd actually see (1) as a 1-time sync to get the obvious candidates / usual suspects, and then only do individual adds against OHM. Of course, that's because I'm not imaging many high-value layers getting added to OSM in the future, or that there would be relatively few of them.

Agreed on not wanting to assess a best old map, but I do think there's a lot of value in having a "best" local current ground truth, as old maps are... well... inaccurate. A typical workflow is to switch back and forth rapidly between the old map and a current overhead imagery layer to get a sense of where the old map was "talking about" if not representing with high spatial fidelity.

andrewharvey commented 3 years ago

If we are just talking about a handful of global layers then I think that's okay with option 1, they don't change much.

If we did the full OSM index, as a one time sync it would get out of date, endpoints change etc so dumping it in without the maintenance is not an option in my opinion. Either we decide to maintain it, or we go with option 2 and have OSM layers sync through automatically.

So I think option 2 is the best way forward, the downside is it needs some code to merge OSMs index into ours and some thought about which layers from OSM we can't use for OHM.

jeffreyameyer commented 3 years ago

I agree about the downside... based on experience with other OHM appropriations of OSM tools, we've (I'll let @danrademacher, @geohacker, and @geohacker weigh in here...), options that involve the lightest (or no) code modifications have ended up being the easiest to maintain (sorry for the totally obvious claim, but dev resources for OHM are spread very thin).

I think if we did option 1 with a very healthy pruning approach - e.g. maybe stripping out everything but:

Handful of global layers (e.g., Bing, OSM, a few O[x]Ms, at least 1 topo/terrain)
Any historic layers (e.g., USGS, the AU layers you provided, NLS, etc.) already in there
Any '"best": true' layers

If the endpoints changed, etc... that seems like the type of minor maintenance that less-qualified types [e.g., me] could help with, as those are just config file changes, no?

We could then build up from there manually and separately from OSM.

danrademacher commented 3 years ago

I'm finally catching up here.

Regarding Option 2 and the idea of merging in upstream OSM layers as they change, or are added/deleted:

I could see the value of this, given that there are often super-local sources and who's to say they wouldn't be useful for OHM? Like this Oakland, MI, source just PRed over in the main repo. For ground-truthing historical layers, such layers could be handy.
BUT there are 895 files of type geojson in the sources directory of the main OSM ELI repo. Would each of those need to be reviewed for license compatibility and only then become part of some build process?

Perhaps the safe-listing for license could itself be automated? That seems tough. Of the 895 geojson files, 729 have license_url attributes that link to some 258 unique license sources. So that would be a lot to sort through. I wondered about permission_osm -- Maybe implicit OSM is more likely to fine for OHM as well? 194 have the permission_osm attribute, and of those 22 are implicit and 172 are explicit, so I think one would still have to review 258 licenses to determine compatibility.

Given all that, this feels like it's either manually lining up licenses to allow for an automated merge (with manual requirement to periodically review licenses for new layers), or taking Option 1 approach and building up from scratch for a narrower set of layers from OSM to meet core mapping needs, and then focusing on historic layers from there.

andrewharvey commented 3 years ago

Perhaps the safe-listing for license could itself be automated? That seems tough. Of the 895 geojson files, 729 have license_url attributes that link to some 258 unique license sources. So that would be a lot to sort through. I wondered about permission_osm -- Maybe implicit OSM is more likely to fine for OHM as well? 194 have the permission_osm attribute, and of those 22 are implicit and 172 are explicit, so I think one would still have to review 258 licenses to determine compatibility.

OSM's ELI is a bit of a mess in regards to how it tracks licensing, the license_url field is a mishmash of values, what it really should have is fields for,

source license as SPDX identifier https://spdx.org/licenses/
url of where the publisher lists their copyright statement
a link a special waiver just for OSM (if needed)

That would make it much easier for OHM to sort through, since almost anything with an SPDX identifier would probably be okay, and we could filter out those that don't have an open source license but instead rely on a waiver.

We could propose this for OSM ELI to make work here easier.

Given all that, this feels like it's either manually lining up licenses to allow for an automated merge (with manual requirement to periodically review licenses for new layers), or taking Option 1 approach and building up from scratch for a narrower set of layers from OSM to meet core mapping needs, and then focusing on historic layers from there.

We can. I would prefer to avoid having each and every OSM layer being duplicated manually, so we'd need some inclusion criteria to limit to only historical + some global defaults.

jeffreyameyer commented 3 years ago

We can. I would prefer to avoid having each and every OSM layer being duplicated manually, so we'd need some inclusion criteria to limit to only historical + some global defaults.

I actually think we'd have a limited # of layers to duplicate and then wait for users to update / request.

We discussed this issue at a team meeting & I think there's a strong desire to do the 1-time fork and then maintain separate layers approach. I have a feeling we'll have our hands full with maintaining historical sources without adding on the OSM sync. I know you've been encouraging other approaches, @andrewharvey, so I hope this isn't a bad path to pursue. It's certainly one we're (naïvely) aware of the pain this might incur.

On a separate note, ref license statements, privacy policies, and lack thereof, would it make sense to modify the overall schema to include a flag like '"privacy_policy_exists": "true", "false" so that the checkers don't error when one isn't listed, but it's not listed b/c it doesn't exist? (I can make the change & am willing to update / fix said large volume of ELI json files...)

andrewharvey commented 3 years ago

would it make sense to modify the overall schema to include a flag like '"privacy_policy_exists": "true", "false" so that the checkers don't error when one isn't listed, but it's not listed b/c it doesn't exist?

We should just define privacy_policy: false means there is no privacy policy, otherwise the value should be a URL. Then you can distinguish this without a new field. I think we should try to upstream this so we don't end up with too much divergence.

We discussed this issue at a team meeting & I think there's a strong desire to do the 1-time fork and then maintain separate layers approach. I have a feeling we'll have our hands full with maintaining historical sources without adding on the OSM sync. I know you've been encouraging other approaches, @andrewharvey, so I hope this isn't a bad path to pursue. It's certainly one we're (naïvely) aware of the pain this might incur.

That's fine, happy for you to take whichever direction you need to make this work better for you. Feel free to request a review from myself on any PRs if you want a second set of eyes.