Add `storage_units` to CSV export

nesnoj commented 2 months ago

Description of the issue

The DB table storage_units cannot be exported using db.to_csv() #560

Steps to Reproduce

db.to_csv("storages") does not include units

db.to_csv("storage_units") raises

ValueError: Allowed values for parameter data with API method are ['wind', 'solar', 'biomass', 'hydro', 'gsgk', 'combustion', 'nuclear', 'storage'] or ['balancing_area', 'electricity_consumer', 'gas_consumer', 'gas_producer', 'gas_storage', 'gas_storage_extended', 'grid_connections', 'grids', 'locations_extended', 'market_actors', 'market_roles', 'permit', 'deleted_units', 'retrofit_units', 'changed_dso_assignment']

Ideas of solution

Add to export function Joins are implemented here

nesnoj commented 2 months ago

I dug a little - my findings:

storage_units (GetStromSpeicher / anlagenstromspeicher / StorageUnits) Column NutzbareSpeicherkapazitaet included, filled, table cannot be exported to CSV
storage_extended (GetEinheitStromSpeicher / einheitenstromspeicher / StorageExtended) Column NutzbareSpeicherkapazitaet included, empty, table can be exported to CSV

When debugging write_mastr_xml_to_database() at this place reveals that XML data holds no data for NutzbareSpeicherkapazitaet.
This is in line with the in the current export description
So the column in the ORM model is obsolete
This raises the question of how the table structure is updated @FlorianK13 ? Is this done manually? The last update according to WSDL patchnotes has taken place in April? #496 (there have been 15 new releases since..)

How to proceed? Options:

Automatically join storage_units on storage_extended to get this parameter. The problem is, that the field storage_units.VerknuepfteEinheit sometimes holds multiple mastr ids which would either result in data loss or multiple lines in storage_extended (one for each line). I recommend let the user decide how to join manually.
Export storage_units when calling db.to_csv("storage") . A proper solution would be costly: The current export set involves ["unit_data", "eeg_data", "kwk_data", "permit_data"] (in this case, "unit_data" comes from storage_extended) and all of these 4 are joined into one table. So either we run into the join problem described in 1. or have to add extra stuff for export to a separate file.
Export storage_units when calling db.to_csv("storage_units") -> add to ADDITIONAL_TABLES and ORM_MAP which I tested and it works as expected. I'd also add some instructions to the docs how to join (some further ideas in #411).

I tend to implement 3. What do you think @FlorianK13?

FlorianK13 commented 2 months ago

This raises the question of how the table structure is updated @FlorianK13 ? Is this done manually? The last update according to WSDL patchnotes has taken place in April?

I guess you talk about updating the orm.py file from here, right? This is updated manually. We could think about how to make that automated. A few years ago the problem was that the data model from bulk download and from soap API were not the same. Some rows only exist in one source, some column names have spelling mistakes in one of the sources...

How to proceed? Options:

I'm fine with 3. The proper solution from 2 would be great of course, but we also keep our own resources in mind. I have no use case for such a storages table right now, so I cannot spend time to implement this. I think for you @nesnoj this is the same case.

nesnoj commented 2 months ago

This raises the question of how the table structure is updated @FlorianK13 ? Is this done manually? The last update according to WSDL patchnotes has taken place in April?

I guess you talk about updating the orm.py file from here, right? This is updated manually. We could think about how to make that automated. A few years ago the problem was that the data model from bulk download and from soap API were not the same. Some rows only exist in one source, some column names have spelling mistakes in one of the sources...

Thank you. I think as soon as the SOAP API is dropped, we can talk about automation. New issue: #563

How to proceed? Options:

I'm fine with 3. The proper solution from 2 would be great of course, but we also keep our own resources in mind. I have no use case for such a storages table right now, so I cannot spend time to implement this. I think for you @nesnoj this is the same case.

:+1: Sounds good, I'll prepare sth

OpenEnergyPlatform / open-MaStR