OpenHistoricalMap / issues

File your issues here, regardless of repo until we get all our repos squared away; we don't want to miss anything.
Creative Commons Zero v1.0 Universal

Tagging tag changes over time #488

Closed Dimitar5555 closed 1 year ago

Dimitar5555 commented 1 year ago

Currently, adding different names to a simple polygon requires multiple relations with overlapping tags, which looks ugly in the editor and is hard to edit and maintain. For one such example, see: https://openhistoricalmap.org/way/198851511. Adding different tags to linear features requires creating a dozen duplicate ways, which is even harder to maintain, or using relations, which can get messy quite quickly.

I've suggested before to use a format like this: key=start_date_1;end_date_1;value_1^start_date_2;end_date_2;value_2

If one of the dates is unknown, it can be left blank, but the semicolon is still required so that it is clear which date is missing: key=start_date_1;;value or key=;end_date_1;value

The idea is to make it as easy as possible for data consumers (and editors) to get the required data. Running value.split("\\^"); will give all values with their start/end dates (the caret has to be escaped because Java's split takes a regular expression). Running .split(";", 3) on each of those will return an array which would look like this: [start_date, end_date, value]. In cases where the start or the end date is missing, the array would have an empty string in its place. *
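As a rough illustration of the parsing described above, here is a minimal Java sketch. The class and method names (TimedValues, parse) are hypothetical and not part of any OpenHistoricalMap tool; the input format is the one proposed in this comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the proposed format; names are illustrative only.
final class TimedValues {

    /** Returns one {start_date, end_date, value} array per entry; "" marks an unknown date. */
    static List<String[]> parse(String tagValue) {
        List<String[]> entries = new ArrayList<>();
        // String.split takes a regular expression, so the caret is escaped.
        for (String part : tagValue.split("\\^")) {
            // Limit 3: at most three pieces; the last piece keeps any further semicolons.
            String[] fields = part.split(";", 3);
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected start;end;value but got: " + part);
            }
            entries.add(fields);
        }
        return entries;
    }

    public static void main(String[] args) {
        for (String[] e : parse("1785;1790;Old Name^1790;;New Name")) {
            System.out.println("start=" + e[0] + " end=" + e[1] + " value=" + e[2]);
        }
    }
}
```

An entry without both semicolons is rejected here rather than guessed at; a more lenient consumer might skip it instead.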

Note: there must be at least one "special" symbol which can't be used in any value.

Positive sides:

Negative sides:

I would love to hear what you think about such a schema and about any possible issues that I've missed.

*The code examples assume that the parser is written in Java. The method/function may have a different name in other languages and may behave differently. It is possible to write a simple method/function that does the same in almost all (if not all) programming languages.

1ec5 commented 1 year ago

> I've suggested before to use a format like this:

Previously in https://github.com/OpenHistoricalMap/issues/issues/284#issuecomment-1024885329, for those who weren’t following that discussion.

Off the top of my head, here are some other downsides with embedding temporal changes inline inside a tag value:

There may be other issues that wouldn’t be apparent unless we start implementing an approach along these lines. But I think the prospect of having to rewrite large parts of iD undermines the justification for this proposed change, which seems to be focused on the difficulty of selecting overlapping objects. I think it would be much more straightforward to improve the usability of overlapping objects within iD and JOSM. That would benefit OSM as well as OpenHistoricalMap.

Dimitar5555 commented 1 year ago

> A feature needs to be duplicated anyways if the geometry ever changes at all. Inexperienced mappers may find it counterintuitive that a feature needs to be duplicated due to some changes but not due to other changes.

By duplication I meant using the same nodes (i.e. having two or more lines which share the same nodes).

> We’d need to rewrite much of iD’s UI to accommodate changes over time in any field that isn’t a freeform text field. This is even before considering preset-level changes, like place=village becoming place=town and eventually place=city.

In theory it could be done by creating a new field type (although it may be harder to code than expected).

> The = filter in OverpassQL would become much less useful. It would only remain useful for checking whether an attribute has remained constant throughout a feature’s lifetime. (With multiple features, a chronology relation can help answer this question.)

There is the ~ (regular expression) filter for that purpose. Further filtering will be required if the data user wants data from a specific period.
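The "further filtering" step could look roughly like the Java sketch below, which picks the value valid in a given year. It assumes that dates are plain years such as 1850 and that the end date is exclusive; neither assumption is specified in this thread, and the names ValueAtYear and valueAt are hypothetical.

```java
import java.util.Optional;

// Hypothetical consumer-side filter; not an existing OHM or Overpass API.
final class ValueAtYear {

    /** Returns the value whose [start, end) range contains the given year, if any. */
    static Optional<String> valueAt(String tagValue, int year) {
        for (String part : tagValue.split("\\^")) {
            String[] f = part.split(";", 3);
            if (f.length != 3) {
                continue; // skip malformed entries in this sketch
            }
            // Assumption: dates are plain integer years; an empty date means "unknown/open".
            boolean started = f[0].isEmpty() || Integer.parseInt(f[0]) <= year;
            // Assumption: the end date is exclusive.
            boolean notEnded = f[1].isEmpty() || year < Integer.parseInt(f[1]);
            if (started && notEnded) {
                return Optional.of(f[2]);
            }
        }
        return Optional.empty();
    }
}
```

For example, with place=1785;1790;village^1790;1850;town^1850;;city, a lookup for 1800 would yield town. Full ISO 8601 dates would need a real date comparison instead of Integer.parseInt.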

> The proposed syntax makes it much easier for a tag value to exceed 255 characters.

It's already noted in the first comment. A possible workaround would be to use key_1, key_2 etc. If the value of key is 255 characters long, the software should look for such a continuation key. It's not the cleanest solution and it will have a few problems (like how to decide when to start using a new key), but it should work unless you are looking for a specific value which is in key_1.
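A minimal Java sketch of this overflow idea follows, assuming the simplest possible rule: cut the serialized value every 255 characters and continue in key_1, key_2 and so on. The class OverflowKeys and its methods are hypothetical, and the cut points are a design choice that this thread leaves open.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the key_1, key_2 workaround; not an existing convention.
final class OverflowKeys {
    static final int MAX = 255; // tag value length limit discussed above

    /** Splits a long value across key, key_1, key_2, ... in chunks of at most MAX characters. */
    static Map<String, String> write(String key, String value) {
        Map<String, String> tags = new LinkedHashMap<>();
        for (int i = 0, n = 0; i < value.length(); i += MAX, n++) {
            String k = (n == 0) ? key : key + "_" + n;
            // Caveat: substring counts UTF-16 units and may cut an entry in the middle.
            tags.put(k, value.substring(i, Math.min(value.length(), i + MAX)));
        }
        return tags;
    }

    /** Reassembles the value: a chunk of exactly MAX characters means "look for the next key". */
    static String read(Map<String, String> tags, String key) {
        StringBuilder sb = new StringBuilder();
        String k = key;
        int n = 0;
        while (tags.containsKey(k)) {
            String chunk = tags.get(k);
            sb.append(chunk);
            if (chunk.length() < MAX) {
                break; // a short chunk is the last one
            }
            n++;
            k = key + "_" + n;
        }
        return sb.toString();
    }
}
```

Cutting at a fixed length keeps the reader simple but means a single dated value can straddle two keys, which is one form of the search problem mentioned above.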

Another workaround is to have all values in the main tag and have separate tags for the dates (this also partly solves the previous issue). For example:
name=name1;name2;name3;name4
name:start_dates=1785;1790;1850;1944
name:end_dates=1790;1850;1944;
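Reading this parallel-list variant could be sketched in Java as below. The helper name ParallelDateTags is hypothetical; the key suffixes :start_dates and :end_dates are the ones from the example above. The three lists are assumed to have the same number of items.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical reader for the name / name:start_dates / name:end_dates variant.
final class ParallelDateTags {

    /** Returns one {start_date, end_date, value} array per position in the three lists. */
    static List<String[]> zip(Map<String, String> tags, String key) {
        // Limit -1 keeps trailing empty strings, so "1790;1850;1944;" yields four items.
        String[] values = tags.getOrDefault(key, "").split(";", -1);
        String[] starts = tags.getOrDefault(key + ":start_dates", "").split(";", -1);
        String[] ends = tags.getOrDefault(key + ":end_dates", "").split(";", -1);
        if (values.length != starts.length || values.length != ends.length) {
            throw new IllegalArgumentException("Value and date lists differ in length for " + key);
        }
        List<String[]> entries = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            entries.add(new String[] {starts[i], ends[i], values[i]});
        }
        return entries;
    }
}
```

Unlike the single-tag format, this variant splits the values themselves on semicolons, so a value that contains a semicolon would need some form of escaping.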

> The proposed parsing rule needs to account for individual values that contain semicolons. There’s an obscure ;; syntax for escaping semicolons in multivalue lists.

For context, it is possible to split a string on a specified character with a limit on the number of resulting strings (or on the number of splits; the specifics depend on the language). That way, an entry is split only twice, and everything after the second semicolon (the value of the key) stays as it is, no matter how many semicolons it contains, which makes escaping semicolons redundant.
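To make the point concrete, here is a small Java sketch with a value that itself contains semicolons. The class name and the sample value are made up for illustration. In Java the second argument of split is a limit on the number of resulting strings; Python's str.split takes a maximum number of splits instead (so the equivalent call uses 2), while JavaScript's split limit simply discards everything past the third piece, so it needs a different approach there.

```java
import java.util.Arrays;

// Illustrative only; the entry below is an invented example.
final class SplitLimitDemo {

    /** Splits one start;end;value entry; semicolons inside the value are left untouched. */
    static String[] splitEntry(String entry) {
        return entry.split(";", 3);
    }

    public static void main(String[] args) {
        String[] fields = splitEntry("1785;1790;Smith; Jones; and Co.");
        System.out.println(Arrays.toString(fields));
        // prints: [1785, 1790, Smith; Jones; and Co.]
    }
}
```

The caret used to separate entries gets no such protection, which is why at least one symbol still has to be reserved.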

> There may be other issues that wouldn’t be apparent unless we start implementing an approach along these lines. But I think the prospect of having to rewrite large parts of iD undermines the justification for this proposed change, which seems to be focused on the difficulty of selecting overlapping objects. I think it would be much more straightforward to improve the usability of overlapping objects within iD and JOSM. That would benefit OSM as well as OpenHistoricalMap.

There are always underwater stones which you can't see but you will definitely hit while swimming. The goal of this issue is to create a reasonable solution before the database gets a few million dates, a few hundred duplicated ways and becomes hard to migrate to a new format.

1ec5 commented 1 year ago

> > We’d need to rewrite much of iD’s UI to accommodate changes over time in any field that isn’t a freeform text field. This is even before considering preset-level changes, like place=village becoming place=town and eventually place=city.

> In theory it could be done by creating a new field type (although it may be harder to code than expected).

Realistically speaking, iD isn’t going to be able to support this format for the foreseeable future, unless someone steps up to implement it. I’ve implemented several complex field types myself, but looking at the history of issues like openstreetmap/iD#974 and openstreetmap/iD#6168, I’m not optimistic about being able to write a time-qualified, multivalue field variation of every existing field type. I think there would be a similar level of effort even with the older, simpler proposal for putting date ranges in subkeys.

batpad commented 1 year ago

cc @rwelty1889 since he has done a LOT of thinking about this problem. I think the best issue for this discussion is still likely https://github.com/OpenHistoricalMap/issues/issues/284 - if there are no objections, I would like to close this issue in favour of that to keep discussions around this topic in one place.

Broadly, I do agree that we should find a better solution than "redraw the feature" for every tag change. But there are many complexities to deal with, from iD to the vector tile renderer to the frontend data filtering logic. I think we've made good progress in thinking about mapping these with relations over in #284 and we should continue discussion there.

Dimitar5555 commented 1 year ago

Closing this issue in favour of #284.