OpenEnergyPlatform / open-MaStR

A collaborative software to download the energy database Marktstammdatenregister (MaStR)
https://open-mastr.readthedocs.io/en/latest/
GNU Affero General Public License v3.0
84 stars 17 forks source link

add_missing_column_to_table fails as the quotation marks mismatch #469

Closed k-schmid closed 9 months ago

k-schmid commented 11 months ago

Description of the issue

When downloading (and extracting) the latest XML using the bulk download a stumble in to an error regarding new columns. The method add_missing_column_to_table tries to split the psycopg2 undefined column error message such that a column name is found. However the splitting argument "»" and "«" cannot be found.

See the full traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2100, in _exec_insertmany_context
    dialect.do_execute(cursor, sub_stmt, sub_params, context)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 747, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UndefinedColumn: column "DatumEndgueltigeStilllegung" of relation "gas_producer" does not exist
LINE 1: ...gDatum", "DatumBeginnVoruebergehendeStilllegung", "DatumEndg...
                                                             ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/open_mastr/xml_download/utils_write_to_database.py", line 206, in add_table_to_database
    df.to_sql(
  File "/usr/local/lib/python3.10/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/generic.py", line 3008, in to_sql
    return sql.to_sql(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 788, in to_sql
    return pandas_sql.to_sql(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 1958, in to_sql
    total_inserted = sql_engine.insert_records(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 1507, in insert_records
    raise err
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 1498, in insert_records
    return table.insert(chunksize=chunksize, method=method)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 1059, in insert
    num_inserted = exec_insert(conn, keys, chunk_iter)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 951, in _execute_insert
    result = conn.execute(self.table.insert(), data)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1414, in execute
    return meth(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 485, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1638, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1837, in _execute_context
    return self._exec_insertmany_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2103, in _exec_insertmany_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2325, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2100, in _exec_insertmany_context
    dialect.do_execute(cursor, sub_stmt, sub_params, context)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 747, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column "DatumEndgueltigeStilllegung" of relation "gas_producer" does not exist
LINE 1: ...gDatum", "DatumBeginnVoruebergehendeStilllegung", "DatumEndg...
                                                             ^

[SQL: INSERT INTO gas_producer ("EinheitMastrNummer", "DatumLetzteAktualisierung", "LokationMastrNummer", "NetzbetreiberpruefungStatus", "AnlagenbetreiberMastrNummer", "Land", "Bundesland", "Landkreis", "Gemeinde", "Gemeindeschluessel", "Postleitzahl", "Or ... 368380 characters truncated ... lllegung__326)s, %(DatumEndgueltigeStilllegung__326)s, %(DatenQuelle__326)s, %(DatumDownload__326)s)]
[parameters: {'Land__0': 'Deutschland', 'NetzbetreiberpruefungDatum__0': None, 'NameGaserzeugungseinheit__0': 'KGE-Erdgasspeicher', 'DatumBeginnVoruebergehendeStilllegung__0': None, 'StrasseNichtGefunden__0': None, 'LokationMastrNummer__0': 'GEL903981660167', 'NichtVorhandenInMigriertenEinheiten__0': 0, 'Adresszusatz__0': None, 'Gemeinde__0': 'Gronau (Westf.)', 'Bundesland__0': 'Nordrhein-Westfalen', 'Hausnummer_nv__0': None, 'Postleitzahl__0': '48599', 'DatumLetzteAktualisierung__0': datetime.datetime(2019, 10, 22, 7, 51, 34, 20038), 'Ort__0': 'Gronau', 'AnlagenbetreiberMastrNummer__0': 'ABR920698896597', 'DatumRegistrierungDesBetreiberwechsels__0': None, 'Breitengrad__0': None, 'Gemeindeschluessel__0': '05554020', 'EinheitSystemstatus__0': 'Aktiviert', 'NetzbetreiberpruefungStatus__0': True, 'Landkreis__0': 'Borken', 'GeplantesInbetriebnahmedatum__0': None, 'Technologie__0': None, 'Gemarkung__0': None, 'DatumEndgueltigeStilllegung__0': None, 'Inbetriebnahmedatum__0': datetime.datetime(2012, 10, 1, 0, 0), 'Laengengrad__0': None, 'DatenQuelle__0': 'bulk', 'Hausnummer__0': None, 'DatumDesBetreiberwechsels__0': None, 'EinheitBetriebsstatus__0': 'In Betrieb', 'SpeicherMastrNummer__0': 'GSE917820127322', 'DatumDownload__0': datetime.datetime(2023, 10, 6, 0, 0), 'FlurFlurstuecknummern__0': None, 'Strasse__0': None, 'Erzeugungsleistung__0': None, 'Registrierungsdatum__0': datetime.datetime(2019, 5, 20, 0, 0), 'EinheitMastrNummer__0': 'GEE995046355477', 'HausnummerNichtGefunden__0': None, 'Land__1': 'Deutschland', 'NetzbetreiberpruefungDatum__1': None, 'NameGaserzeugungseinheit__1': 'Trianel Gasspeicher Epe', 'DatumBeginnVoruebergehendeStilllegung__1': None, 'StrasseNichtGefunden__1': None, 'LokationMastrNummer__1': 'GEL963638601943', 'NichtVorhandenInMigriertenEinheiten__1': 0, 'Adresszusatz__1': None, 'Gemeinde__1': 'Gronau (Westf.)', 'Bundesland__1': 'Nordrhein-Westfalen', 'Hausnummer_nv__1': None ... 12653 parameters truncated ... 'Hausnummer__325': None, 'DatumDesBetreiberwechsels__325': None, 'EinheitBetriebsstatus__325': 'In Betrieb', 'SpeicherMastrNummer__325': None, 'DatumDownload__325': datetime.datetime(2023, 10, 6, 0, 0), 'FlurFlurstuecknummern__325': '334', 'Strasse__325': None, 'Erzeugungsleistung__325': 2750.0, 'Registrierungsdatum__325': datetime.datetime(2023, 9, 1, 0, 0), 'EinheitMastrNummer__325': 'GEE937644281314', 'HausnummerNichtGefunden__325': 0.0, 'Land__326': 'Deutschland', 'NetzbetreiberpruefungDatum__326': None, 'NameGaserzeugungseinheit__326': 'Bioraffinerie Kanalhafen', 'DatumBeginnVoruebergehendeStilllegung__326': None, 'StrasseNichtGefunden__326': 0.0, 'LokationMastrNummer__326': None, 'NichtVorhandenInMigriertenEinheiten__326': 0, 'Adresszusatz__326': None, 'Gemeinde__326': 'Rheine', 'Bundesland__326': 'Nordrhein-Westfalen', 'Hausnummer_nv__326': 0.0, 'Postleitzahl__326': '48432', 'DatumLetzteAktualisierung__326': datetime.datetime(2023, 10, 5, 10, 5, 10, 900993), 'Ort__326': 'Rheine', 'AnlagenbetreiberMastrNummer__326': 'ABR928065335663', 'DatumRegistrierungDesBetreiberwechsels__326': None, 'Breitengrad__326': 52.297882, 'Gemeindeschluessel__326': '05566076', 'EinheitSystemstatus__326': 'Aktiviert', 'NetzbetreiberpruefungStatus__326': False, 'Landkreis__326': 'Steinfurt', 'GeplantesInbetriebnahmedatum__326': datetime.datetime(2023, 10, 16, 0, 0), 'Technologie__326': 'Biomethan-Erzeugung', 'Gemarkung__326': None, 'DatumEndgueltigeStilllegung__326': None, 'Inbetriebnahmedatum__326': None, 'Laengengrad__326': 7.498307, 'DatenQuelle__326': 'bulk', 'Hausnummer__326': '111', 'DatumDesBetreiberwechsels__326': None, 'EinheitBetriebsstatus__326': 'In Planung', 'SpeicherMastrNummer__326': None, 'DatumDownload__326': datetime.datetime(2023, 10, 6, 0, 0), 'FlurFlurstuecknummern__326': None, 'Strasse__326': 'Kanalstr.', 'Erzeugungsleistung__326': 15420.0, 'Registrierungsdatum__326': datetime.datetime(2023, 10, 5, 0, 0), 'EinheitMastrNummer__326': 'GEE904412023018', 'HausnummerNichtGefunden__326': 1.0}]
(Background on this error at: https://sqlalche.me/e/20/f405)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/project/Pipelines/mastr_to_db/main.py", line 104, in <module>
    raise e
  File "/opt/project/Pipelines/mastr_to_db/main.py", line 101, in <module>
    main()
  File "/opt/project/Pipelines/mastr_to_db/main.py", line 71, in main
    db.download(data=[
  File "/usr/local/lib/python3.10/site-packages/open_mastr/mastr.py", line 231, in download
    write_mastr_xml_to_database(
  File "/usr/local/lib/python3.10/site-packages/open_mastr/xml_download/utils_write_to_database.py", line 60, in write_mastr_xml_to_database
    add_table_to_database(
  File "/usr/local/lib/python3.10/site-packages/open_mastr/xml_download/utils_write_to_database.py", line 218, in add_table_to_database
    add_missing_column_to_table(err, engine, xml_tablename)
  File "/usr/local/lib/python3.10/site-packages/open_mastr/xml_download/utils_write_to_database.py", line 324, in add_missing_column_to_table
    missing_column = err.args[0].split("»")[1].split("«")[0]
IndexError: list index out of range

Ideas of solution

I think the string splitting should be replaced by a more robust approach. E.g. check which columns are already in the table and which are assumed to be there prior to writing to the table. There are some approach listed here: https://www.geeksforgeeks.org/get-column-names-from-postgresql-table-using-psycopg2/

Context and Environment

Workflow checklist

FlorianK13 commented 11 months ago

@k-schmid I remember when writing that code that I was definetly not proud of this 'quick fix' solution - it had to break some day. I'll look into it.

FlorianK13 commented 11 months ago

@k-schmid #470 should solve the error

k-schmid commented 11 months ago

@FlorianK13 thanks a lot for the fast response.

k-schmid commented 10 months ago

@FlorianK13 any updates on this?

FlorianK13 commented 10 months ago

@chrwm Are we ready for the 0.14.0 release?

FlorianK13 commented 9 months ago

@k-schmid We have just release v0.14.0 and it should be fixed now