PowerGridModel / power-grid-model-io

Conversion tool for various grid data formats to power-grid-model
Mozilla Public License 2.0
14 stars 8 forks source link

[BUG] Handling of Duplicate Columns with different Units #250

Closed wfjvdham closed 3 months ago

wfjvdham commented 5 months ago

Describe the bug

Currently columns are checked for duplicated and renamed if the combination of column name and unit is the same (in ExcelFileStore._group_columns_by_index()). When only the column name is the same but the unit is different the columns are read as is. This can cause a problem later when using TabularData()._apply_unit_conversion()

To Reproduce

If for example an excel sheet contains two columns:

Those are not renamed because they are not identified as being a duplicate column. But here:

https://github.com/PowerGridModel/power-grid-model-io/blob/3d8eb4c1abe5fabcaa678238bd0a0655cba86684/src/power_grid_model_io/data_types/tabular_data.py#L146

The value of field is just P so the content of both columns is returned giving an error because string values can not be converted to W.

Expected behavior

The problem can be solved by changing table_data[field] *= multiplier by table_data[(field, unit)] *= multiplier but I am not sure if two columns with the same name should exists (even though they have different units). Maybe better would be to do the grouping on only the first value of the index in the ExcelFileStore._group_columns_by_index() function.

mgovers commented 5 months ago

Hi Wim,

Thank you for your clear report and repro case. We will investigate.

Jerry-Jinfeng-Guo commented 5 months ago

Dear Wim @wfjvdham ,

Thank you for reporting the issue.

This is an known issue for us. The issue lies more on the Vision side as we assume the users would provide input up to specs as per Pandas dictation. Since the excel handling within PGM-io was built on top of Pandas, the solution for now is to not include the extra information in an additional P field when exporting from Vision. We will make a more permanent solution in the coming sprints based on priorities.

wfjvdham commented 5 months ago

I can also make the solution, what would be the right way to do it?

mgovers commented 5 months ago

what would be the right way to do it?

* Rename the columns internally to `P` and `P_1`

* Change `table_data[field] *= multiplier` by `table_data[(field, unit)] *= multiplier`

it is on our agenda for the next refinement this afternoon. If we get to a design, we will post it here.

I can also make the solution

After that, whoever wants to pick it up can do that (based on priorities and availability between team AKI and team DGC). It's good to know that that's an option as well!

mgovers commented 5 months ago

@wfjvdham the conclusion of the refinement is that there's some unknowns surrounding the (un)stability of the Vision Excel export. Next sprint, we will first do some more investigation to this before we finalize a design. Implementation will probably come the sprint after that, as agreed with Jaap.

Jerry-Jinfeng-Guo commented 5 months ago

@wfjvdham Please find explicit addressing of this issue in the documentation as well. https://github.com/PowerGridModel/power-grid-model-io/pull/251 N.B., the solution offered in the documentation is based on the user feedback from GOAT, who up until this point has experienced no problems with this solution.

wfjvdham commented 5 months ago

Ok thanks 👍

Jaap-van-Wijck commented 3 months ago

Hi @Jerry-Jinfeng-Guo @petersalemink95 , do you have any updates about this bug?

I read #251 and it does not seem to provide a solution for us. Our team automates netchecks for K&O, and many usefull/critical information is stored in the "specifics" / "Bijzonderheden" tabs in Vision. For example MSR ID's, HLD ID's and "special contract types".

Wim has made a suggestion above how to solve this bug. Are there reasons not to implement this solution? If develop-capacity is an issue, we can also create a PR so that you only need to review. We can also create a feature branch to test the changes on all Vision files of all regions before merging to main. I prefer working (on the long term) from the main branch of PGM io for AKI, instead of using a feature branch ...

Jerry-Jinfeng-Guo commented 3 months ago

Hi @Jaap-van-Wijck , thank you for letting us know that disabling "specifics" is not enough for you. Internally renaming P and P_1 is probably not the best way to go given the unstable nature of Vision. On top of this, not all teams depend on the "specifics" at the same level, the introduced logic can be a potential source of bug.

I would like to ask you the following questions:

If above is not up to your expectation, please let us know and we will bring that up during the team meetings.

wfjvdham commented 3 months ago

@Jerry-Jinfeng-Guo

Before I suggested 2 options, 1 was renaming one of the columns. If that is not the best way to go what about the other option I suggested:

There nothing changes to the names of the columns

Jerry-Jinfeng-Guo commented 3 months ago

Hi @wfjvdham @Jaap-van-Wijck , please see: https://github.com/PowerGridModel/power-grid-model-io/pull/261

Jerry-Jinfeng-Guo commented 3 months ago

Closed by https://github.com/PowerGridModel/power-grid-model-io/pull/261