earthobservations / luftdatenpumpe

Acquire and process live and historical air quality data without efforts. Filter by station-id, sensor-id and sensor-type, apply reverse geocoding, store into time-series and RDBMS databases, publish to MQTT, output as JSON, or visualize in Grafana. Data sources: Sensor.Community (luftdaten.info), IRCELINE, and OpenAQ.
https://luftdatenpumpe.readthedocs.io/
GNU Affero General Public License v3.0
34 stars 3 forks source link

`UnicodeEncodeError` when emitting station data to PostGIS #42

Closed ohobby closed 1 year ago

ohobby commented 1 year ago

Hi,

i try to install the Luftdatenpumpe at a Debian 11. I get a failure if i try to do this step "Run luftdatenpumpe for the first time to manifest database schema"

2022-12-10 20:57:54,783 [luftdatenpumpe.engine               ] INFO   : Emitting to target data sinks, this might take some time
  0%|                                                           | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/luftdatenpumpe", line 8, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/commands.py", line 186, in run
    run_engine(options)
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/commands.py", line 327, in run_engine
    engine.process(data)
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/engine.py", line 73, in process
    target.emit(item)
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/target/rdbms.py", line 78, in emit
    return self.store_station(station)
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/target/rdbms.py", line 199, in store_station
    self.stationtable.upsert(stationdata, ["station_id"])
  File "/usr/local/lib/python3.9/dist-packages/dataset/table.py", line 264, in upsert
    row_count = self.update(row, keys, ensure=False, return_count=True)
  File "/usr/local/lib/python3.9/dist-packages/dataset/table.py", line 210, in update
    rp = self.db.executable.execute(stmt)
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1514, in _handle_dbapi_exception
    util.raise_(exc_info[1], with_traceback=exc_info[2])
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 41: ordinal not in range(128)

What can be done to fix this? Thanks, Oliver

amotl commented 1 year ago

Dear Oliver,

thank you for writing in. Can I ask you about the specific command you have invoked, and which versions of Luftdatenpumpe and PostGIS you are using?

On my machine, invoking this command does not produce any errors, using Python 3.9.15, PostGIS 14-3.3, SQLAlchemy 1.3.24, and Redis 7.

luftdatenpumpe stations --network=ldi --station=49,1033 --reverse-geocode --target=postgresql://luftdatenpumpe@localhost/weatherbase --progress

Now, I am running the program without the --station=49,1033 filter, instead using --country=DE. However, it will take a considerable time to eventually hit the problematic record(s). If you are using another filter, I will be happy to use it in order to be able to discover the problem more quickly.

With kind regards, Andreas.

P.S.: Currently at 4%|█   | 675/20057 [01:45<21:33, 11.78it/s].

P.P.S.: Not exactly related to your problem, 89e5961ef slightly updates the Luftdaten-Viewer Databases section of the documentation [^1], making it easier to follow in both production and development/sandbox mode.

amotl commented 1 year ago

Hi again,

modulo a few warnings, that worked well on my machine. Can you tell me how you are invoking the program, and whether it fails right away, or only after some time of processing?

$ luftdatenpumpe stations --network=ldi --country=DE --reverse-geocode --target=postgresql://luftdatenpumpe@localhost/weatherbase --progress
2022-12-10 23:18:03,897 [luftdatenpumpe.source               ] INFO   : Applying filter: Munch({'country': ['DE']})
2022-12-10 23:18:03,897 [luftdatenpumpe.source.common        ] INFO   : Downloading without caching
2022-12-10 23:18:03,897 [luftdatenpumpe.commands             ] INFO   : Acquiring list of stations from network "ldi" with source "api"
2022-12-10 23:18:03,897 [luftdatenpumpe.source.luftdaten_info] INFO   : Requesting luftdaten.info live API at https://api.luftdaten.info/static/v1/data.json
2022-12-10 23:18:05,493 [luftdatenpumpe.source.luftdaten_info] INFO   : Timestamp of first record: 2022-12-10T22:17:02Z
2022-12-10 23:18:05,564 [luftdatenpumpe.source.common        ] INFO   : Processing 20057 items
2022-12-10 23:32:20,024 [luftdatenpumpe.commands             ] INFO   : Acquired #5249 stations
2022-12-10 23:32:20,026 [luftdatenpumpe.commands             ] INFO   : Will publish data to ['postgresql://luftdatenpumpe@localhost/weatherbase']
2022-12-10 23:32:20,026 [luftdatenpumpe.engine               ] INFO   : Configuring data sink "postgresql://luftdatenpumpe@localhost/weatherbase" with domain "stations"
2022-12-10 23:32:20,149 [luftdatenpumpe.target.rdbms         ] INFO   : PostGIS version: POSTGIS="3.3.2 4975da8" [EXTENSION] PGSQL="140" GEOS="3.9.0-CAPI-1.16.2" PROJ="7.2.1" LIBXML="2.9.10" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)"
2022-12-10 23:32:20,193 [luftdatenpumpe.engine               ] INFO   : Emitting to target data sinks, this might take some time
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5249/5249 [01:38<00:00, 53.44it/s]
2022-12-10 23:33:58,422 [luftdatenpumpe.engine               ] INFO   : Processed 5249 records
weatherbase=> select count(*) from ldi_stations;
 count
-------
  5249

With kind regards, Andreas.

$ luftdatenpumpe --version
luftdatenpumpe 0.21.1
$ python
>>> import dataset
>>> import sqlalchemy
>>> dataset.__version__
'1.5.2'
>>> sqlalchemy.__version__
'1.3.24'
amotl commented 1 year ago

I've run the processing on a few other European countries with location names having characters not within the Latin alphabet, for example with Cyrillic characters, like GR and UA. I did not discover any problems so far.

There are occasions where the Nominatim cache gets corrupted or outdated somehow, or is just not compatible between Python versions, because Pickle gets used or whatnot. So, maybe clearing out the cache database in Redis makes a difference in your case.

ohobby commented 1 year ago

Hi Andreas,

Steps

The Debian 11 System is new installed and i have make the steps like the documentation.

  1. https://github.com/earthobservations/luftdatenpumpe/blob/main/doc/setup/luftdatenpumpe.rst
  2. https://github.com/earthobservations/luftdatenpumpe/blob/main/doc/setup/ldview-applications-debian.rst
  3. https://github.com/earthobservations/luftdatenpumpe/blob/main/doc/setup/ldview-databases.rst
  4. https://github.com/earthobservations/luftdatenpumpe/blob/main/doc/setup/ldview-grafana.rst

Version information

Problem

I work in a virtualenv and get this error at the first time to manifest database schema:

(luftdatenpumpe) root@luftdatenpumpe:~# luftdatenpumpe stations --network=ldi --station=49,1033 --reverse-geocode --target=postgresql://luftdatenpumpe@localhost/weatherbase --progress
2022-12-11 10:33:32,079 [luftdatenpumpe.source               ] INFO   : Applying filter: Munch({'station': [49, 1033]})
2022-12-11 10:33:32,079 [luftdatenpumpe.source.common        ] INFO   : Downloading without caching
2022-12-11 10:33:32,079 [luftdatenpumpe.commands             ] INFO   : Acquiring list of stations from network "ldi" with source "api"
2022-12-11 10:33:32,079 [luftdatenpumpe.source.luftdaten_info] INFO   : Requesting luftdaten.info live API at https://api.luftdaten.info/static/v1/data.json
2022-12-11 10:33:33,030 [luftdatenpumpe.source.luftdaten_info] INFO   : Timestamp of first record: 2022-12-11T10:33:01Z
2022-12-11 10:33:33,106 [luftdatenpumpe.source.common        ] INFO   : Processing 10 items
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 1836.14it/s]
2022-12-11 10:33:33,115 [luftdatenpumpe.commands             ] INFO   : Acquired #2 stations
2022-12-11 10:33:33,116 [luftdatenpumpe.commands             ] INFO   : Will publish data to ['postgresql://luftdatenpumpe@localhost/weatherbase']
2022-12-11 10:33:33,116 [luftdatenpumpe.engine               ] INFO   : Configuring data sink "postgresql://luftdatenpumpe@localhost/weatherbase" with domain "stations"
2022-12-11 10:33:33,133 [luftdatenpumpe.target.rdbms         ] INFO   : PostGIS version: POSTGIS="3.1.1 aaf4c79" [EXTENSION] PGSQL="130" GEOS="3.9.0-CAPI-1.16.2" PROJ="7.2.1" LIBXML="2.9.10" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)"
2022-12-11 10:33:33,159 [luftdatenpumpe.engine               ] INFO   : Emitting to target data sinks, this might take some time
  0%|                                                                                                                              | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/luftdatenpumpe", line 8, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/commands.py", line 186, in run
    run_engine(options)
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/commands.py", line 327, in run_engine
    engine.process(data)
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/engine.py", line 73, in process
    target.emit(item)
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/target/rdbms.py", line 78, in emit
    return self.store_station(station)
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/target/rdbms.py", line 199, in store_station
    self.stationtable.upsert(stationdata, ["station_id"])
  File "/usr/local/lib/python3.9/dist-packages/dataset/table.py", line 264, in upsert
    row_count = self.update(row, keys, ensure=False, return_count=True)
  File "/usr/local/lib/python3.9/dist-packages/dataset/table.py", line 210, in update
    rp = self.db.executable.execute(stmt)
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1514, in _handle_dbapi_exception
    util.raise_(exc_info[1], with_traceback=exc_info[2])
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 41: ordinal not in range(128)
ohobby commented 1 year ago

If i run the command you have wrote, it runs long time before it runs in the error.

$ luftdatenpumpe stations --network=ldi --country=DE --reverse-geocode --target=postgresql://luftdatenpumpe@localhost/weatherbase --progress
2022-12-11 10:54:21,275 [luftdatenpumpe.commands             ] INFO   : Acquired #5274 stations
2022-12-11 10:54:21,276 [luftdatenpumpe.commands             ] INFO   : Will publish data to ['postgresql://luftdatenpumpe@localhost/weatherbase']
2022-12-11 10:54:21,276 [luftdatenpumpe.engine               ] INFO   : Configuring data sink "postgresql://luftdatenpumpe@localhost/weatherbase" with domain "stations"
2022-12-11 10:54:21,295 [luftdatenpumpe.target.rdbms         ] INFO   : PostGIS version: POSTGIS="3.1.1 aaf4c79" [EXTENSION] PGSQL="130" GEOS="3.9.0-CAPI-1.16.2" PROJ="7.2.1" LIBXML="2.9.10" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)"
2022-12-11 10:54:21,322 [luftdatenpumpe.engine               ] INFO   : Emitting to target data sinks, this might take some time
  0%|                                                                                                                           | 0/5274 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/luftdatenpumpe", line 8, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/commands.py", line 186, in run
    run_engine(options)
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/commands.py", line 327, in run_engine
    engine.process(data)
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/engine.py", line 73, in process
    target.emit(item)
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/target/rdbms.py", line 78, in emit
    return self.store_station(station)
  File "/usr/local/lib/python3.9/dist-packages/luftdatenpumpe/target/rdbms.py", line 199, in store_station
    self.stationtable.upsert(stationdata, ["station_id"])
  File "/usr/local/lib/python3.9/dist-packages/dataset/table.py", line 264, in upsert
    row_count = self.update(row, keys, ensure=False, return_count=True)
  File "/usr/local/lib/python3.9/dist-packages/dataset/table.py", line 210, in update
    rp = self.db.executable.execute(stmt)
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1514, in _handle_dbapi_exception
    util.raise_(exc_info[1], with_traceback=exc_info[2])
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 41: ordinal not in range(128)
amotl commented 1 year ago

Hi Oliver,

thank you again. I am not able to reproduce the error so far, and I am running out of ideas. Please find my report below, together with some more suggestions and ideas.

I am really sad about it, because Luftdatenpumpe works successfully for us within different environments. While I am using macOS with services running in Docker instead of a pure Linux installation, we are running the whole stack in production on both CentOS and Debian machines, certainly with slightly deviating infrastructure software versions, so I don't really think this is the issue. Nevertheless, I am asking diligently about software versions, because it would be easy if we could find the reason there.

With kind regards, Andreas.

Version information

I've exercised the procedure using roughly the same software versions you've described. I am running now:

Infrastructure

Python modules

>>> import dataset, dogpile, geoalchemy2, pandas, psycopg2, redis, sqlalchemy, sqlalchemy_utils
>>> dataset.__version__
'1.5.2'
>>> dogpile.__version__
'1.1.8'
>>> geoalchemy2.__version__
'0.11.1'
>>> pandas.__version__
'1.4.3'
>>> psycopg2.__version__
'2.9.3 (dt dec pq3 ext lo64)'
>>> redis.__version__
'4.4.0'
>>> sqlalchemy.__version__
'1.3.24'
>>> sqlalchemy_utils.__version__
'0.38.2'

Observed errors

None so far, using both commands:

luftdatenpumpe stations --network=ldi --country=DE --reverse-geocode --target=postgresql://luftdatenpumpe@localhost/weatherbase --progress
luftdatenpumpe stations --network=ldi --country=GR,UA --reverse-geocode --target=postgresql://luftdatenpumpe@localhost/weatherbase --progress

Thoughts

I work in a virtualenv.

That's a good decision. Still, I see that Luftdatenpumpe thinks it is installed at /usr/local/lib/python3.9/dist-packages/luftdatenpumpe. Are you sure you are using the most recent version within the virtualenv you are actually meaning to use?

There are occasions where the Nominatim cache gets corrupted or outdated somehow, or is just not compatible between Python versions, because Pickle gets used or whatnot. So, maybe clearing out the cache database in Redis makes a difference in your case.

Can you try to clear out your Redis cache? Assuming you don't have any other valuable data within your Redis server, you can inspect and clear the database using those commands. Where it displays that the database contains some items like db0:keys=2719,expires=0,avg_ttl=0 before the flushdb operation, it should be empty afterwards.

redis-cli info | grep ^db
redis-cli flushdb

In general, I am assuming everything works well when omitting the --target option, right? You can also experiment with omitting the --reverse-geocode option, or using it together with the --disable-nominatim-cache option. Maybe this makes any difference on your setup?

amotl commented 1 year ago

While I am using macOS with services running in Docker instead of a pure Linux installation, we are running the whole stack in production on both CentOS and Debian machines.

To be more specific on this, I've just checked what's actually running on the servers machines we've operated the software stack around Luftdatenpumpe successfully on.

amotl commented 1 year ago

I've also just refreshed all dependencies in my sandbox using pip install --editable=. --upgrade --force-reinstall. Even with the most recent versions of all dependency packages, I am not able to observe any regressions.

ohobby commented 1 year ago

Hi Andreas,

I have completely rebuilt the server and now the error from above no longer occurs. (Now the next Problem is, to store the data in the IInfluxdb)

Many Thanks.

amotl commented 1 year ago

Hi Oliver,

I have completely rebuilt the server and now the error from above no longer occurs.

Excellent!

Now the next problem is to store the data in the InfluxDB.

Let me know about any problems on this matter by creating another issue.

With kind regards, Andreas.

ohobby commented 1 year ago

The last problem was, that the influxd was not running. Now it is fine.

Many Thanks :-)