facebookmicrosites / Open-Mapping-At-Facebook

Documentation for Open Mapping At Facebook
MIT License

add the source tag to changesets, not every object #15

Open camelCaseSucks opened 4 years ago

camelCaseSucks commented 4 years ago

There are over 1.6 million buildings tagged with source=microsoft/BuildingFootprints. This pollutes the database with information that is completely useless, since anyone with satellite imagery can see there are buildings. The changeset can and should be tagged with the source, but not every object added from the dataset.
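For readers less familiar with OSM uploads, the two placements being debated can be sketched with Python's standard-library ElementTree. This is a minimal sketch that assumes the XML shape of OSM API 0.6 changeset and way elements. The helper names, comment text, and IDs are hypothetical. A real upload would also go through the API's changeset create and upload calls, which are not shown.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

SOURCE = "microsoft/BuildingFootprints"


def changeset_xml(comment: str, source: str) -> ET.Element:
    """Build an <osm><changeset> body that carries the source tag once (hypothetical helper)."""
    osm = ET.Element("osm")
    changeset = ET.SubElement(osm, "changeset")
    ET.SubElement(changeset, "tag", k="comment", v=comment)
    ET.SubElement(changeset, "tag", k="source", v=source)
    return osm


def building_way_xml(way_id: int, node_refs: List[int],
                     element_source: Optional[str] = None) -> ET.Element:
    """Build a closed building way (hypothetical helper).

    Leaving element_source as None matches the proposal in this issue
    (no per-object source tag); passing a value matches current practice.
    """
    way = ET.Element("way", id=str(way_id))
    for ref in node_refs + node_refs[:1]:  # repeat the first node to close the ring
        ET.SubElement(way, "nd", ref=str(ref))
    ET.SubElement(way, "tag", k="building", v="yes")
    if element_source is not None:
        ET.SubElement(way, "tag", k="source", v=element_source)
    return way


if __name__ == "__main__":
    # Placeholder negative IDs, the usual convention for objects not yet uploaded.
    print(ET.tostring(changeset_xml("Add buildings", SOURCE), encoding="unicode"))
    print(ET.tostring(building_way_xml(-1, [-10, -11, -12, -13]), encoding="unicode"))
```

Under the proposal, the provenance string is stored once per changeset instead of once per building.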

jeffdefacto commented 4 years ago

Hi @camelCaseSucks

Tagging has been an internal debate for some time, and we go back and forth on whether the tags should be included. @gaoxm recently gave a thorough explanation of our current reasoning for including the tags on all elements in the Slack channel; I've copied it below.

"Currently we keep the source tag on individual elements for two main reasons: (1) Historically, when we first started AI-assisted mapping for roads in Thailand, some OSM editors from the Thailand community asked us to add the source tag to every road that was originally generated by AI, so that they could track data quality at a finer granularity. For example, suppose that within one changeset an editor from the FB mapping team did three things at once: (i) added an AI-generated road (with manual fixes on top); (ii) connected an AI-generated road to a pre-existing road, which means the pre-existing road was edited; and (iii) edited another pre-existing road that was originally generated by AI but added to OSM in an earlier changeset. In this case, someone from the community would still be able to tell that the roads in (i) and (iii) were originally created by AI, but the one in (ii) was not.

(2) A single changeset committed through RapiD could contain AI-generated data from different sources. For example, it may currently contain roads from FB and buildings from MS. In the future, we may have multiple sources for the same type of data, e.g. some buildings from FB and some from MS in one changeset. In that case, if the community reports issues on individual elements, as in https://github.com/facebookincubator/RapiD/issues/131, we'll be able to trace which original dataset may be buggy and correct that dataset at scale. If the source tag is only on the changeset, this tracking capability is lost.

That said, we are constantly listening to the community's feedback and re-evaluating whether we should change this at some point. If people become less concerned about the two points above, we may well decide to drop the element-level source tags, to reduce metadata that brings more confusion than value. So thanks again for continuing to share your thoughts and feedback; we appreciate all of it!"
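The two reasons in the quote can be illustrated with a minimal Python sketch. It models elements and changesets as plain data classes, which is a simplification and not the OSM data model or any RapiD API. The class and function names, and every source value other than microsoft/BuildingFootprints, are hypothetical. With only a changeset-level tag, every element in the changeset reports the same provenance, so the pre-existing road in case (ii) cannot be told apart from the AI-derived roads in (i) and (iii). With per-element tags, reported problem elements can be counted per originating dataset.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Element:
    """Hypothetical, simplified OSM element: an ID plus key/value tags."""
    osm_id: int
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Changeset:
    """Hypothetical, simplified changeset: its own tags and the elements it touched."""
    tags: Dict[str, str]
    elements: List[Element]


def element_source(el: Element) -> Optional[str]:
    """Reason (1): provenance read from the element's own source tag, if present."""
    return el.tags.get("source")


def changeset_source(cs: Changeset, el: Element) -> Optional[str]:
    """Provenance when only the changeset is tagged.

    `el` is deliberately ignored: every element in the changeset gets the
    same answer, so case (ii) cannot be told apart from (i) and (iii).
    """
    return cs.tags.get("source")


def suspect_datasets(reported: List[Element]) -> Counter:
    """Reason (2): count reported problem elements per originating dataset."""
    return Counter(element_source(el) or "untagged" for el in reported)


if __name__ == "__main__":
    AI_ROADS = "fb-ai-roads"  # hypothetical source value
    road_i = Element(1, {"highway": "residential", "source": AI_ROADS})  # (i) new AI road
    road_ii = Element(2, {"highway": "primary"})  # (ii) pre-existing road, only connected
    road_iii = Element(3, {"highway": "track", "source": AI_ROADS})  # (iii) earlier AI road
    cs = Changeset({"source": AI_ROADS}, [road_i, road_ii, road_iii])

    for el in cs.elements:
        print(el.osm_id, element_source(el), changeset_source(cs, el))

    reported = [
        Element(10, {"building": "yes", "source": "microsoft/BuildingFootprints"}),
        Element(11, {"building": "yes", "source": "microsoft/BuildingFootprints"}),
        Element(12, {"building": "yes", "source": "fb-buildings"}),  # hypothetical
    ]
    print(suspect_datasets(reported).most_common())
```

The sketch also shows the cost the issue raises: per-element provenance means one extra source tag on every object, which is how the count reaches 1.6 million buildings, whereas changeset-level provenance needs one tag per upload.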