OpenEnergyPlatform / open-MaStR

A collaborative software to download the energy database Marktstammdatenregister (MaStR)
https://open-mastr.readthedocs.io/en/latest/
GNU Affero General Public License v3.0
87 stars 19 forks source link

Import of bulk fails: `UNIQUE constraint failed: solar_eeg.EegMastrNummer` #395

Closed nesnoj closed 1 year ago

nesnoj commented 1 year ago

Bulk import into DB fails with today's Gesamtdatenexport_20221215.zip (on develop, head: 85e9e3d4e4794d5f0e054f7d7dd38fb9f59171be):

[...]
File 'AnlagenEegSolar_19.xml' is parsed.
Data is cleansed.
File 'AnlagenEegSolar_20.xml' is parsed.
Data is cleansed.
File 'AnlagenEegSolar_21.xml' is parsed.
Data is cleansed.
Traceback (most recent call last):
  File ".../venv/lib/python3.8/site-packages/SQLAlchemy-2.0.0b4-py3.8-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1933, in _exec_single_context
    self.dialect.do_executemany(
  File ".../venv/lib/python3.8/site-packages/SQLAlchemy-2.0.0b4-py3.8-linux-x86_64.egg/sqlalchemy/engine/default.py", line 743, in do_executemany
    cursor.executemany(statement, parameters)
sqlite3.IntegrityError: UNIQUE constraint failed: solar_eeg.EegMastrNummer
[...]

Apart from the expectation the BNetzA should have some basic integrity checks to ensure unique IDs, wouldn't the tool benefit from such a check before DB insert? Or do we have it and it simply doesn't cover this column?

FlorianK13 commented 1 year ago

Strange. Actually we check for not unique values (as they appear quite often) and delete them. I can have a look at it next monday, but maybe @deniztepe has time and can check it?

deniztepe commented 1 year ago

Thank you for the bug report @nesnoj. I will try to have a look tomorrow.

nesnoj commented 1 year ago

Thank you for the bug report @nesnoj. I will try to have a look tomorrow.

Thanks for your comments! I figured it out:

The full traceback may would have helped, sorry for not posting it in the first place:

File 'AnlagenEegSolar_21.xml' is parsed.
Data is cleansed.
Traceback (most recent call last):
  File ".../venv/lib/python3.8/site-packages/SQLAlchemy-2.0.0b4-py3.8-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1933, in _exec_single_context
    self.dialect.do_executemany(
  File ".../venv/lib/python3.8/site-packages/SQLAlchemy-2.0.0b4-py3.8-linux-x86_64.egg/sqlalchemy/engine/default.py", line 743, in do_executemany
    cursor.executemany(statement, parameters)
sqlite3.IntegrityError: UNIQUE constraint failed: solar_eeg.EegMastrNummer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".../venv/lib/python3.8/site-packages/open_mastr-0.12.1-py3.8.egg/open_mastr/xml_download/utils_write_to_database.py", line 204, in add_table_to_database
    df.to_sql(
  File ".../venv/lib/python3.8/site-packages/pandas-1.5.2-py3.8-linux-x86_64.egg/pandas/core/generic.py", line 2987, in to_sql
    return sql.to_sql(
  File ".../venv/lib/python3.8/site-packages/pandas-1.5.2-py3.8-linux-x86_64.egg/pandas/io/sql.py", line 695, in to_sql
    return pandas_sql.to_sql(
  File ".../venv/lib/python3.8/site-packages/pandas-1.5.2-py3.8-linux-x86_64.egg/pandas/io/sql.py", line 1738, in to_sql
    total_inserted = sql_engine.insert_records(
  File ".../venv/lib/python3.8/site-packages/pandas-1.5.2-py3.8-linux-x86_64.egg/pandas/io/sql.py", line 1335, in insert_records
    raise err
  File ".../venv/lib/python3.8/site-packages/pandas-1.5.2-py3.8-linux-x86_64.egg/pandas/io/sql.py", line 1325, in insert_records
    return table.insert(chunksize=chunksize, method=method)
  File ".../venv/lib/python3.8/site-packages/pandas-1.5.2-py3.8-linux-x86_64.egg/pandas/io/sql.py", line 946, in insert
    num_inserted = exec_insert(conn, keys, chunk_iter)
  File ".../venv/lib/python3.8/site-packages/pandas-1.5.2-py3.8-linux-x86_64.egg/pandas/io/sql.py", line 853, in _execute_insert
    result = conn.execute(self.table.insert(), data)
  File ".../venv/lib/python3.8/site-packages/SQLAlchemy-2.0.0b4-py3.8-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1414, in execute
    return meth(
  File ".../venv/lib/python3.8/site-packages/SQLAlchemy-2.0.0b4-py3.8-linux-x86_64.egg/sqlalchemy/sql/elements.py", line 489, in _execute_on_connection
    return connection._execute_clauseelement(
  File ".../venv/lib/python3.8/site-packages/SQLAlchemy-2.0.0b4-py3.8-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1638, in _execute_clauseelement
    ret = self._execute_context(
  File ".../venv/lib/python3.8/site-packages/SQLAlchemy-2.0.0b4-py3.8-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1842, in _execute_context
    return self._exec_single_context(
  File ".../venv/lib/python3.8/site-packages/SQLAlchemy-2.0.0b4-py3.8-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1983, in _exec_single_context
    self._handle_dbapi_exception(
  File ".../venv/lib/python3.8/site-packages/SQLAlchemy-2.0.0b4-py3.8-linux-x86_64.egg/sqlalchemy/engine/base.py", line 2325, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File ".../venv/lib/python3.8/site-packages/SQLAlchemy-2.0.0b4-py3.8-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1933, in _exec_single_context
    self.dialect.do_executemany(
  File ".../venv/lib/python3.8/site-packages/SQLAlchemy-2.0.0b4-py3.8-linux-x86_64.egg/sqlalchemy/engine/default.py", line 743, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: solar_eeg.EegMastrNummer
[SQL: INSERT INTO solar_eeg ("Registrierungsdatum", "DatumLetzteAktualisierung", "EegInbetriebnahmedatum", "EegMastrNummer", "AnlagenkennzifferAnlagenregister_nv", "InstallierteLeistung", "RegistrierungsnummerPvMeldeportal_nv", "MieterstromZugeordnet", "AusschreibungZuschlag", "AnlageBetriebsstatus", "VerknuepfteEinheit", "AnlagenschluesselEeg", "RegistrierungsnummerPvMeldeportal", "MieterstromMeldedatum", "MieterstromErsteZuordnungZuschlag", "InanspruchnahmeZahlungNachEeg", "ZugeordneteGebotsmenge", "Zuschlagsnummer", "AnlagenkennzifferAnlagenregister", "DatenQuelle", "DatumDownload") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: [('2021-05-03', '2021-07-07 11:52:17.386109', '2021-02-20', 'EEG988703174313', 0, 9.9, 0, 0, 0, 'In Betrieb', 'SEE954690952227', None, None, None, None, None, None, None, None, 'bulk', '2022-12-15'), ('2021-05-03', '2021-06-02 04:34:18.737025', '2012-06-26', 'EEG984589935680', 0, 12.46, 0, None, 0, 'In Betrieb', 'SEE901016823372', 'E2056801S160000000000041385400001', None, None, None, None, None, None, None, 'bulk', '2022-12-15'), ('2021-05-03', '2022-11-15 12:43:22.838660', '2008-12-02', 'EEG972442093325', 0, 62.9, 0, None, 0, 'In Betrieb', 'SEE963103805005', 'E21875011000000000000376265100000', None, None, None, None, None, None, None, 'bulk', '2022-12-15'), ('2021-05-03', '2021-05-03 19:59:04.396102', '2021-04-29', 'EEG906638458254', 0, 9.94, 0, 0, 0, 'In Betrieb', 'SEE912755170382', 'E1084301ABLO025702700000000000001', None, None, None, None, None, None, None, 'bulk', '2022-12-15'), ('2021-05-03', '2021-05-03 10:02:17.102664', '2012-07-20', 'EEG963541213417', 0, 35.1, 0, None, 0, 'In Betrieb', 'SEE983451979319', 'E2104101S160000000000036446000001', 'ASO-234549-12', None, None, None, None, None, None, 'bulk', '2022-12-15'), ('2021-05-03', '2021-05-05 09:21:17.332994', '2020-11-03', 'EEG926536886874', 0, 9.86, 0, 0, 0, 'In Betrieb', 'SEE999445066705', 'E105800100000001EMH00102021670001', None, None, None, None, None, None, None, 'bulk', '2022-12-15'), ('2021-05-03', '2021-06-16 04:50:20.599723', '2017-04-08', 'EEG982209204765', 0, 7.425, 0, None, 0, 'In Betrieb', 'SEE957648131644', 'E2104101S000000000000087006100001', None, None, None, None, None, None, None, 'bulk', '2022-12-15'), ('2021-05-03', '2021-05-03 10:54:41.955769', '2021-04-22', 'EEG924769284979', 0, 5.85, 0, 0, 0, 'In Betrieb', 'SEE979251659667', None, None, None, None, None, None, None, None, 'bulk', '2022-12-15')  ... displaying 10 of 100000 total bound parameter sets ...  ('2021-09-08', '2021-09-08 14:00:54.929968', '2011-03-16', 'EEG941845299566', 0, 5.992, 0, None, 0, 'In Betrieb', 'SEE978455145716', None, None, None, None, None, None, None, None, 'bulk', '2022-12-15'), ('2021-09-08', '2021-09-08 11:19:47.635627', '2021-08-18', 'EEG922291074402', 0, 29.7, 0, 0, 0, 'In Betrieb', 'SEE945952023878', None, None, None, None, None, None, None, None, 'bulk', '2022-12-15')]]
(Background on this error at: https://sqlalche.me/e/20/gkpj)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".../create_database.py", line 3, in <module>
    db.download("bulk")
  File ".../venv/lib/python3.8/site-packages/open_mastr-0.12.1-py3.8.egg/open_mastr/mastr.py", line 225, in download
    write_mastr_xml_to_database(
  File ".../venv/lib/python3.8/site-packages/open_mastr-0.12.1-py3.8.egg/open_mastr/xml_download/utils_write_to_database.py", line 60, in write_mastr_xml_to_database
    add_table_to_database(
  File ".../venv/lib/python3.8/site-packages/open_mastr-0.12.1-py3.8.egg/open_mastr/xml_download/utils_write_to_database.py", line 226, in add_table_to_database
    df = write_single_entries_until_not_unique_comes_up(
  File ".../venv/lib/python3.8/site-packages/open_mastr-0.12.1-py3.8.egg/open_mastr/xml_download/utils_write_to_database.py", line 282, in write_single_entries_until_not_unique_comes_up
    pd.read_sql(sql=select(primary_key), con=engine).values.squeeze().tolist()
  File ".../venv/lib/python3.8/site-packages/pandas-1.5.2-py3.8-linux-x86_64.egg/pandas/io/sql.py", line 590, in read_sql
    return pandas_sql.read_query(
  File ".../venv/lib/python3.8/site-packages/pandas-1.5.2-py3.8-linux-x86_64.egg/pandas/io/sql.py", line 1560, in read_query
    result = self.execute(*args)
  File ".../venv/lib/python3.8/site-packages/pandas-1.5.2-py3.8-linux-x86_64.egg/pandas/io/sql.py", line 1405, in execute
    return self.connectable.execution_options().execute(*args, **kwargs)
AttributeError: 'OptionEngine' object has no attribute 'execute'

Process finished with exit code 1

So the problem is, that the engine is used here and not a connection: https://github.com/OpenEnergyPlatform/open-MaStR/blob/041461fae3ee595fec57bf11a3e9889836f3134b/open_mastr/xml_download/utils_write_to_database.py#L281-L283 It's strange that no one did run into this earlier as @FlorianK13 wrote above that the non-uniqueness was quite common. I'll do some more tests and prepare a fix...

nesnoj commented 1 year ago

I ran into the very same problem in other functions in the exception handling. I'm still wondering how passing the engine did work in the past :thinking:. I was wondering if it has sth to do with pandas or sqla (which could be more greedy in other versions), so I tried with py3.8 and py3.9 conda envs - same result.

Could you please give it a try with today's bulk file on develop @deniztepe ? (or tomorrow's - "hopefully" the data will not be fixed then by the BNetzA ;)

If you encounter the same problem I suppose the error handling was not necessary for a while(?)

nesnoj commented 1 year ago

Found it: As reckoned, it's about SQLA 2.0 (currently beta) which is installed by default :roll_eyes: . Implicit and connectionless execution has been removed: https://docs.sqlalchemy.org/en/20/changelog/migration_20.html#implicit-and-connectionless-execution-bound-metadata-removed

Indeed, everything works fine with v1.4.44.

nesnoj commented 1 year ago

Shold be fixed with #405

deniztepe commented 1 year ago

@nesnoj , can the branch https://github.com/OpenEnergyPlatform/open-MaStR/tree/fix-395-db-conn-handling-during-db-write-errors be closed? The fix is already merged.

nesnoj commented 1 year ago

Yes, thank you