facebook / Rapid

The OpenStreetMap editor driven by open data, AI, and supercharged features
https://rapideditor.org
ISC License

Duplicate data from MS Buildings layer in some places #873

Closed Bonkles closed 1 year ago

Bonkles commented 1 year ago

Description

As reported by user zbrown on the OSM US Slack, it looks like we're getting duplicated buildings in some areas in Rapid.

This behavior occurs in both Rapid v1 and the Rapid v2 beta.

We will need to break the investigation into several parts:

  1. Are we receiving duplicate way IDs from the back-end, or are we receiving different way IDs with the same node data?
  2. Are we introducing the duplicates in the MapWithAI back-end service, or are we receiving the duplicates from the Microsoft back-end?

Version

1.1.x, 2.0.0.b0

What browser are you seeing the problem on? What version are you running?

Chrome 107

The OS you're using

MacOS

Steps to reproduce

Navigate to the spot indicated in the URL. You can observe that each Microsoft building can be added twice, instead of once.

The browser URL at the time you encountered the bug

https://mapwith.ai/rapid#background=Bing&datasets=fbRoads,msBuildings&disable_features=boundaries&map=20.64/45.08908/-93.49389

The auto-detected useragent string for your browser (leave blank if you're manually filling this form out)

No response

tsmock commented 1 year ago

I'll go ahead and add what I put in chat a few weeks ago:

How are you guys handling duplicate data? Specifically, with https://mapwith.ai/maps/ml_roads?bbox=-95.7946849,39.030297,-95.7886398,39.0346355&result_type=road_building_vector_xml&conflate_with_osm=true&theme=ml_road_vector&collaborator=josm&token=ASb3N5o9HbX8QWn8G_NtHIRQaYv3nuG2r7_f3vnGld3KhZNCxg57IsaQyssIaEw5rfRNsPpMwg4TsnrSJtIJms5m&hash=ASawRla3rBcwEjY4HIY I am getting duplicated data.

Look for 39.033121373812 (node id=-4222309733446286915). There are two duplicate entries for it, and more importantly, there are two buildings with the same node ref list but with different IDs.

Playing around with https://mapwith.ai/rapid#map=19.93/39.03307/-95.79104 shows that there is (minimally) a UI issue, and it is possible to add both of the buildings.

So, to answer the questions in the ticket:

  1. Yes, we are receiving duplicate data from the backend
  2. I don't know
Bonkles commented 1 year ago

Okay, in that case I will dig in on #2.

tsmock commented 1 year ago

I'm going to guess that we are receiving the duplicates from the Microsoft back-end, since we have two ways with separate IDs but the same node IDs.

Bonkles commented 1 year ago

The plot thickens.

It appears as though we're getting duplicate building responses from the MS API.

For example, I narrowed the reported area down to a very small WKT polygon that includes a single building.

That building can be seen in Rapid at: https://mapwith.ai/rapid#background=Bing&datasets=fbRoads,msBuildings&disable_features=boundaries&id=n122364296&map=20.95/39.03318/-95.79104

API Call:

GET https://buildingfootprintsapi.azurewebsites.net/api/BuildingFootprints?searchAreaWkt=POLYGON((-95.79145279552904 39.03393739690898,-95.79155120823998 39.033058893163684,-95.79056509173824 39.03296520328675,-95.7906243020261 39.03388511232822,-95.79145279552904 39.03393739690898))

Response:

HTTP/1.1 200 OK
Connection: close
Content-Type: application/json; charset=us-ascii
Date: Thu, 30 Mar 2023 14:47:56 GMT
Server: Microsoft-IIS/10.0
Access-Control-Expose-Headers: Request-Context
Cache-Control: no-cache
Content-Encoding: gzip
Expires: -1
Pragma: no-cache
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-AspNet-Version: 4.0.30319
Request-Context: appId=cid-v1:008eba93-0f1f-4350-94f9-6625987555dd
X-Powered-By: ASP.NET

[
  "POLYGON ((-95.791202610959857 39.033121373811596, -95.790961706892517 39.033112886954363, -95.790959450837533 39.0331515873981, -95.790900562990387 39.033149516028921, -95.790897290340112 39.033205655147391, -95.790946699846032 39.033207393116292, -95.790942086789414 39.033286403400091, -95.791069347929309 39.033290879778512, -95.791075408392814 39.0331869185382, -95.791198537527762 39.0331912495805, -95.791202610959857 39.033121373811596))",
  "POLYGON ((-95.791202610959857 39.033121373811596, -95.790961706892517 39.033112886954363, -95.790959450837533 39.0331515873981, -95.790900562990387 39.033149516028921, -95.790897290340112 39.033205655147391, -95.790946699846032 39.033207393116292, -95.790942086789414 39.033286403400083, -95.791069347929309 39.033290879778512, -95.791075408392814 39.0331869185382, -95.791198537527762 39.0331912495805, -95.791202610959857 39.033121373811596))"
]

So it would seem we are being fed two buildings for the price of one here. I will take this up with Microsoft and see what they say.
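Worth noting: the two rings in the response above are not byte-identical -- one latitude differs in its final digits (...400091 vs ...400083), so a plain string comparison of the WKT would miss the duplicate. A minimal sketch of a tolerance-based comparison instead (class and method names are hypothetical, and the rings in `main` are abridged to three vertices for brevity):

```java
import java.util.Arrays;

public class WktRingCompare {

    // Parse the coordinate list out of a "POLYGON ((x y, x y, ...))" string
    // into a flat array of doubles. Assumes a single outer ring, no holes.
    static double[] parseRing(String wkt) {
        String body = wkt.substring(wkt.indexOf("((") + 2, wkt.indexOf("))"));
        return Arrays.stream(body.trim().split("[ ,]+"))
                     .mapToDouble(Double::parseDouble)
                     .toArray();
    }

    // Two rings count as duplicates if every coordinate agrees to within
    // eps degrees (1e-9 degrees is well under a millimetre on the ground).
    static boolean sameRing(double[] a, double[] b, double eps) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (Math.abs(a[i] - b[i]) > eps) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Abridged versions of the two rings from the API response; the only
        // difference is the final digits of one latitude.
        String r1 = "POLYGON ((-95.791202610959857 39.033121373811596, -95.790942086789414 39.033286403400091, -95.791202610959857 39.033121373811596))";
        String r2 = "POLYGON ((-95.791202610959857 39.033121373811596, -95.790942086789414 39.033286403400083, -95.791202610959857 39.033121373811596))";

        System.out.println("string equal: " + r1.equals(r2));                                  // false
        System.out.println("within 1e-9:  " + sameRing(parseRing(r1), parseRing(r2), 1e-9));   // true
    }
}
```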

Bonkles commented 1 year ago

Idea for a patch on our (client) end of things: is it sufficient to simply discard duplicate buildings by examining their start nodes and, if equal, deduplicating? I will include this idea in an email to the MS API maintainers.
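A client-side version of that idea could key each incoming footprint on its first vertex, rounded to OSM's 7-decimal-place coordinate precision, and keep only the first footprint seen per key. A rough sketch (all names here are hypothetical, not actual Rapid code):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FootprintDeduper {

    // Key a WKT polygon by its first "lon lat" pair, rounded to 7 decimal
    // places (the precision OSM stores), so near-identical duplicates collide.
    static String startNodeKey(String wkt) {
        String body = wkt.substring(wkt.indexOf("((") + 2);
        String[] first = body.split(",")[0].trim().split("\\s+");
        double lon = Double.parseDouble(first[0]);
        double lat = Double.parseDouble(first[1]);
        return String.format(Locale.ROOT, "%.7f/%.7f", lon, lat);
    }

    // Keep the first footprint seen for each start-node key, drop the rest.
    static List<String> dedupe(List<String> footprints) {
        Map<String, String> seen = new LinkedHashMap<>();
        for (String wkt : footprints) {
            seen.putIfAbsent(startNodeKey(wkt), wkt);
        }
        return new ArrayList<>(seen.values());
    }
}
```

Keying only on the start node is cheap, but it could in principle merge two distinct buildings that happen to start at the same vertex; comparing the full (sorted) vertex list is safer.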

Bonkles commented 1 year ago

Email sent, awaiting response.

tsmock commented 1 year ago

Technically very doable -- I still have a cleanup method in the MWAI plugin which I could repurpose for this.

In JOSM, what I would probably do is something like the following:

if (node.isReferredByWays(2)) {
    Map<Way, long[]> wayMap = new HashMap<>(2); // Profile -- this might need to have a different initial size
    for (Way parent : Utils.filteredCollection(node.getReferrers(), Way.class)) {
        // Sort the node IDs so equal geometries compare equal regardless of
        // ring direction or start node.
        long[] ids = parent.getNodeIds().stream().mapToLong(l -> l).sorted().toArray();
        wayMap.put(parent, ids);
    }
    Set<Way> toDelete = new HashSet<>();
    for (Map.Entry<Way, long[]> current : wayMap.entrySet()) {
        if (toDelete.contains(current.getKey())) {
            continue; // already marked as a duplicate of an earlier way
        }
        for (Map.Entry<Way, long[]> toCompare : wayMap.entrySet()) {
            if (current == toCompare || toDelete.contains(toCompare.getKey())) {
                continue;
            }
            if (Arrays.equals(current.getValue(), toCompare.getValue())) {
                toDelete.add(toCompare.getKey());
            }
        }
    }
    // Perform deletion
}

We might want to test that the tags (keys) are the same as well, but that might not be necessary for the MS building data -- it might be more of a problem if we have duplicates from Esri.
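Extending the check to tags is small on top of the node-ID comparison; with JOSM primitives the tag map would come from `getKeys()`, but the idea in plain Java is just (a hedged sketch, not actual plugin code):

```java
import java.util.Arrays;
import java.util.Map;

public class DuplicateWayCheck {

    // Two ways are duplicates only if they cover the same nodes *and*
    // carry the same tags; the node-ID arrays must already be sorted.
    static boolean isDuplicate(long[] sortedNodeIdsA, Map<String, String> tagsA,
                               long[] sortedNodeIdsB, Map<String, String> tagsB) {
        return Arrays.equals(sortedNodeIdsA, sortedNodeIdsB) && tagsA.equals(tagsB);
    }
}
```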

Bonkles commented 1 year ago

MS has confirmed receipt of my issue, they are investigating.