OpenHistoricalMap / issues


HTTP 502 error loading data in iD in San José #588

Closed: 1ec5 closed this issue 1 year ago

1ec5 commented 1 year ago

If you open iD to a z20 bbox that encompasses this post office in San José, California, iD kicks off an API request for all the relevant data in this bbox:

https://www.openhistoricalmap.org/api/0.6/map.json?bbox=-121.89331054687499,37.33522435930639,-121.8878173828125,37.339591851359174

This request is consistently failing with an HTTP 502 error. Loading this URL directly in the browser returns this error:

Incomplete response received from application

It is kind of possible to edit in this area, but iD refuses to let you perform basic tasks such as squaring a building. These operations trigger a warning that not enough of the feature has been downloaded – even though it’s a brand-new feature that hasn’t been saved yet. For example, the only way to delete a new node is to undo; the delete operation is disabled.

batpad commented 1 year ago

@Rub21 could you test this? Perhaps we need to increase the resources per replica for the API.

1ec5 commented 1 year ago

> It is kind of possible to edit in this area, but iD refuses to let you perform basic tasks such as squaring a building.

What’s more, it’s only possible to add new features. Once you save a feature, it’s impossible to edit or delete it, because iD is unable to load it.

1ec5 commented 1 year ago

As a limited workaround, I’ve been manually entering the element ID of the node, way, or relation that I want to edit. It feels sort of like editing in level0 (#472). Unfortunately it would be a significant barrier to entry for less experienced mappers in my area.

Rub21 commented 1 year ago

I have never encountered this type of issue before. However, it appears that the error may be related to the volume of data. What's puzzling is that the problem persists even when dealing with a very small area.

[Image: First Download]

The GIF demonstrates that data can be downloaded for a relatively large area during the first attempt. However, in subsequent tries, data retrieval fails even when the area in question is smaller.

[Image: Second Download]

[Image: OpenHistoricalMap API Request]

I've tried increasing the resources on the website, but the issue persists. After reviewing the changesets in the area, I came across a suspicious one: OpenHistoricalMap Changeset 82103.

[Image: Suspicious Changeset]

There is a significant volume of data, and according to the comments, it seems to have been uploaded from a shapefile. Uploading data to the API can be done in chunks, so large datasets can be added incrementally. Retrieval doesn't work the same way, though: a map call has to return everything in the bbox in a single response.

Current API settings:

```yaml
# Maximum number of nodes that will be returned by the API in a map request
max_number_of_nodes: 50000
# Maximum number of nodes that can be in a way (checked on save)
max_number_of_way_nodes: 2000
# Maximum number of members that can be in a relation (checked on save)
max_number_of_relation_members: 32000
```

For local testing, I increased these values to:

```yaml
# Maximum number of nodes that will be returned by the API in a map request
max_number_of_nodes: 20000000
# Maximum number of nodes that can be in a way (checked on save)
max_number_of_way_nodes: 800000
# Maximum number of members that can be in a relation (checked on save)
max_number_of_relation_members: 12200000
```

As a result, the memory usage of the production database increased to 8.8 GB, but the request still has not completed.

[Image: Memory Usage]

Additionally, the replication files show a lot of changes recorded on the dates of July 20-21, 2023.

[Image: Replication Files] [Image: More Replication Files]

These changes occur in the same area of San Jose.

[Image: San Jose Area]

@batpad @1ec5 I believe the only viable solution is to increase the various settings values, deploy these changes to production, and then proceed with cleaning the datasets. Do you have any alternative suggestions?

1ec5 commented 1 year ago

If max_number_of_nodes affects the number of nodes that can be returned in one API response, it might be the reason why the original request I posted is failing. That z20 bbox contains only 493 nodes, according to this Overpass query. However, the map.json endpoint also recursively returns any relation that contains these nodes.

The various San José place POIs within the bbox will end up including almost all of the relations in this chronology relation, such as this node that belongs to 698 boundary relations. Each of those relations in turn contains many member ways that are also included, bringing the total node count to 58,824, which exceeds the current max_number_of_nodes setting.

This dataset was imported in codeforsanjose/OSM-SouthBay#28. I think the modeling was as correct as we could come up with. It would be unfortunate if we had to simplify the dataset for performance reasons. The only reasonable simplification I can think of would be to dissociate the label nodes from the boundary relations. While incorrect from a data modeling standpoint, it would prevent the API from including all the boundary relations. Selfishly, I’d prefer it if we could permanently increase the node limit to something that can accommodate the San José boundary data, but I realize it increases the risk of service disruption.
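
For a sense of scale, a much more modest increase than the values used for local testing would already cover this bbox; the exact figure below is purely illustrative, not a concrete proposal:

```yaml
# Illustrative value only: comfortably above the ~58,824 nodes returned for
# this bbox, and far below the 20,000,000 used in local testing.
max_number_of_nodes: 100000
```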

When we worked on the import, we noticed that OHM’s instance of the OSM API had a very hard time accepting more than a few new relations at a time – far fewer than the osm.org instance. I figured it was probably bottlenecked on the relation constraint validation that the API does at upload time, but fortunately the local community doesn’t expect to import something quite that intense again soon. We certainly won’t need a limit of 20,000,000 nodes in the near future.

Edit: Actually map.json does recursively include all the members of all the relations.

mmd-osm commented 1 year ago

https://lists.openstreetmap.org/pipermail/rails-dev/2023-August/026977.html is the reason why uploads and downloads are much slower compared to osm.org.

batpad commented 1 year ago

@mmd-osm thank you.

We do need to prioritize moving to cgimap - @Rub21 @geohacker let's catch up on what this would involve - I imagine it shouldn't be too crazy.

mmd-osm commented 1 year ago

Since osm-seed includes CGImap, you're well familiar with the details already. Maybe double check that you're on the current 0.8.8 release, and that the Apache rewriting rules cover all required endpoints. The OSM chef repo is usually a good starting point to see how we use CGImap. Please let me know in case you're facing some issues here.

Rub21 commented 1 year ago

@1ec5 @batpad The issue has been fixed; San José is loading in iD. The latest version of CGImap is running in the same container as the Rails port.

[Image: San José loading in iD]

The URL https://www.openhistoricalmap.org/api/0.6/map.json?bbox=-121.89331054687499,37.33522435930639,-121.8878173828125,37.339591851359174 now downloads correctly.

1ec5 commented 1 year ago

Yessss! Thank you, this is so fast!

mmd-osm commented 1 year ago

We're collecting CGImap customer quotes over at the Wiki: https://wiki.openstreetmap.org/wiki/CGImap#Customer_quotes

Feel free to add something, if you like ... :)

batpad commented 1 year ago

@mmd-osm thank you again for chiming in here :-)

I have not edited the OSM wiki in a very very long time. @1ec5 if you feel up to it, I think it would be great to add a quote there about how CGIMap has helped HUGELY improve performance issues on Open Historical Map, or I can get to it when I have a bit more bandwidth :-)

@Rub21 - having CGIMap running on the same container is a great first step. Eventually, I'd really like to explore running CGIMap in its own container, so we can scale the CGIMap process separately from the web containers. To do that we'd have to, roughly:

1. Run a separate container that only runs CGIMap - I'm not sure if we'd expose lighttpd directly, or run it behind nginx or Apache as a reverse proxy.

2. Move the URL routing up to the Kubernetes Ingress layer: the /api/0.6 URLs that we currently rewrite with Apache RewriteRules would instead be routed at the Ingress layer to the appropriate container - web or CGIMap - based on the URL (a rough sketch follows below).
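
Very roughly, and purely as a sketch, the Ingress side of step 2 could look something like the following. The service names, ports, and path split are placeholders, not anything we actually run, and in reality only a subset of /api/0.6 endpoints is served by CGImap, so the real rules would need to be finer-grained:

```yaml
# Hypothetical sketch: send API read traffic to a dedicated CGImap service
# and leave everything else on the Rails (web) service.
# Service names, ports, and paths are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openhistoricalmap-api
spec:
  rules:
    - host: www.openhistoricalmap.org
      http:
        paths:
          # e.g. the map call that was returning HTTP 502
          - path: /api/0.6/map
            pathType: ImplementationSpecific
            backend:
              service:
                name: cgimap   # hypothetical CGImap-only container/service
                port:
                  number: 8000
          # everything else continues to hit the Rails port
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web      # existing Rails container/service
                port:
                  number: 80
```

The tricky part would be enumerating exactly which /api/0.6 endpoints should go to CGImap versus the Rails app, which is what the Apache RewriteRules currently encode.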

Practically, I don't think this is hugely important to do, so let's not prioritize, but architecturally, it seems like it would be neat to have CGIMap as a separate container rather than proxying to a process on the same container. @Rub21 - let's discuss and if this seems useful, we can create a separate ticket for that.

Thanks again @Rub21 for the amazing work here, and of course thank you to everyone involved in CGIMap :-)

1ec5 commented 1 year ago

> We're collecting CGImap customer quotes over at the Wiki: https://wiki.openstreetmap.org/wiki/CGImap#Customer_quotes
>
> Feel free to add something, if you like ... :)

I’ll add it here so I can cite myself. 😛

CGImap has dramatically improved my experience as a contributor to OpenHistoricalMap. Before, I could barely upload boundary relations without JOSM timing out and retrying, and using iD sometimes felt like reverting to a dialup connection. Now I can upload and download without going out for a walk while I wait.