earthobservations / luftdatenpumpe

Acquire and process live and historical air quality data effortlessly. Filter by station-id, sensor-id and sensor-type, apply reverse geocoding, store into time-series and RDBMS databases, publish to MQTT, output as JSON, or visualize in Grafana. Data sources: Sensor.Community (luftdaten.info), IRCELINE, and OpenAQ.
https://luftdatenpumpe.readthedocs.io/
GNU Affero General Public License v3.0

Ambiguity with locations in Italy #15

Closed by amotl 4 years ago

amotl commented 4 years ago

Luftdatenpumpe currently uses the Nominatim geocoder to resolve geolocations to appropriate names through a process called reverse geocoding.
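
For readers unfamiliar with reverse geocoding, here is a minimal sketch of how such a request to Nominatim can be built. It assumes the public Nominatim endpoint and its documented `lat`, `lon`, `format` and `addressdetails` parameters. The function name `build_reverse_url` and the coordinates are made up for illustration, and this is not how Luftdatenpumpe itself talks to Nominatim.

```python
from urllib.parse import urlencode

# Assumption: the public Nominatim endpoint. A deployment may use its own instance.
NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"


def build_reverse_url(lat, lon, base=NOMINATIM_REVERSE):
    """Build a reverse geocoding URL for the given coordinates (hypothetical helper)."""
    params = {
        "lat": lat,
        "lon": lon,
        "format": "jsonv2",     # ask for a JSON response
        "addressdetails": 1,    # include the address breakdown (state, county, city, ...)
    }
    return base + "?" + urlencode(params)


# Illustrative coordinates only, not taken from this thread.
print(build_reverse_url(44.70, 10.63))
```

The `address` object in the response is where fields like `state`, `county` and `city` come from.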

However, users from Italy are observing problems with the outcome of Nominatim in this context [1]. They are reporting things like

Let's see what comes out of this and how we can mitigate these problems in order to improve the situation.

[1] https://community.panodata.org/t/access-to-influxdb-or-grafana-instance/39/13

amotl commented 4 years ago

We tried to investigate the issue by invoking

luftdatenpumpe stations --network=ldi --country=IT --reverse-geocode --progress | jq '[ .[].location.address ]'

and received results for state == "Emilia-Romagna" like

  {
    "country_code": "IT",
    "country": "Italia",
    "state": "Emilia-Romagna",
    "county": "Parma",
    "postcode": "43036",
    "town": "Fidenza",
    "road": "Via Marco Polo",
    "house_number": "14",
    "city": "Fidenza"
  }

which look good.

On the other hand, we receive a couple of results where there might be an ambiguity regarding "county": "Reggio nell'Emilia" and "city": "Reggio nell'Emilia":

  {
    "country_code": "IT",
    "country": "Italia",
    "state": "Emilia-Romagna",
    "county": "Reggio nell'Emilia",
    "postcode": "42121",
    "city": "Reggio nell'Emilia",
    "suburb": "San Pietro esterna",
    "road": "Piazza Guglielmo Marconi",
    "house_number": "11",
    "neighbourhood": "Porta San Pietro"
  }

This might be the reason for the complaints. However, we are not exactly sure about the issue yet.
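
To find all records of this kind, one could flag addresses where county and city carry the same name. The following is a minimal sketch, using the two address records shown above as input. The helper name `county_equals_city` is hypothetical, and whether this condition is really what the users are complaining about is still an open question.

```python
def county_equals_city(address):
    """Return True when the county and city fields carry the same name."""
    county = address.get("county")
    return county is not None and county == address.get("city")


# The two address records quoted above, trimmed to the relevant fields.
addresses = [
    {"state": "Emilia-Romagna", "county": "Parma",
     "postcode": "43036", "town": "Fidenza", "city": "Fidenza"},
    {"state": "Emilia-Romagna", "county": "Reggio nell'Emilia",
     "postcode": "42121", "city": "Reggio nell'Emilia"},
]

ambiguous = [a for a in addresses if county_equals_city(a)]
for a in ambiguous:
    print(a["city"], a["postcode"])
```

Only the second record is flagged. The first one has identical town and city, but a different county.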

amotl commented 4 years ago

To get the sorted list of all reverse geocoded location names within state == "Emilia-Romagna", this invocation might help:

luftdatenpumpe stations --network=ldi --country=IT --reverse-geocode --progress | \
    jq '[ map(select(.location.address.state == "Emilia-Romagna")) | .[].name ] | sort'

A list of unique city names within Emilia-Romagna can be generated using

luftdatenpumpe stations --network=ldi --country=IT --reverse-geocode --progress | \
    jq '[ map(select(.location.address.state == "Emilia-Romagna")) | .[].location.address.city ] | unique'
[
  "Alseno",
  "Bagnolo in Piano",
  "Bentivoglio",
  "Bologna",
  "Calderara di Reno",
  "Campogalliano",
  "Casalecchio di Reno",
  "Correggio",
  "Cortemaggiore",
  "Fidenza",
  "Fornovo di Taro",
  "Modena",
  "Parma",
  "Piacenza",
  "Quattro Castella",
  "Reggio nell'Emilia",
  "Sala Baganza",
  "Salsomaggiore Terme",
  "San Giovanni in Persiceto",
  "Sorbolo Mezzani",
  "Varano de' Melegari",
  "Varsi",
  "Vigolzone",
  "Zola Predosa"
]
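
For those who prefer Python over jq, the same filtering can be sketched as follows. It assumes the station list has already been loaded from the JSON output, with the `location.address` structure implied by the jq expressions above. The function name `unique_cities` is hypothetical, and the Lombardia record is made up to demonstrate the filter.

```python
def unique_cities(stations, state):
    """Return the sorted, de-duplicated city names of all stations within a state."""
    cities = set()
    for station in stations:
        address = station.get("location", {}).get("address", {})
        # Unlike the jq expression, records without a city are skipped
        # instead of contributing a null entry.
        if address.get("state") == state and address.get("city"):
            cities.add(address["city"])
    return sorted(cities)


stations = [
    {"location": {"address": {"state": "Emilia-Romagna", "city": "Fidenza"}}},
    {"location": {"address": {"state": "Emilia-Romagna", "city": "Reggio nell'Emilia"}}},
    {"location": {"address": {"state": "Emilia-Romagna", "city": "Reggio nell'Emilia"}}},
    # Made-up record from another state, only here to show the filter at work.
    {"location": {"address": {"state": "Lombardia", "city": "Milano"}}},
]

print(unique_cities(stations, "Emilia-Romagna"))
```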

amotl commented 4 years ago

Please note that we are already curating the response from Nominatim in order to improve the quality of the designated station names [1]. Kudos to @einsiedlerkrebs.

At [2], you will find the place where we try to improve the city attribute. So, this is also the place to look at when seeing anomalies or errors within the reverse geocoding process.

[1] https://github.com/panodata/luftdatenpumpe/blob/0.19.0/luftdatenpumpe/geo.py#L178-L194
[2] https://github.com/panodata/luftdatenpumpe/blob/0.19.0/luftdatenpumpe/geo.py#L238-L258

amotl commented 4 years ago

We have reset the Nominatim cache, the Redis database that is also used for caching, and all items with country code == "IT" in the PostgreSQL database on the server machine. After repopulating it, the problem with three-letter codes for regions within Italy has been resolved and everything should be fine again. See also:

https://weather.hiveeyes.org/grafana/d/AOerEQQmk/luftdaten-info-karte?var-ldi_station_countrycode=IT

Thanks a bunch for bringing this to our attention.