39 add overschrijding hydra statistics

Mares2022 commented 3 weeks ago

I have mainly modified KWK_process.py. I have added a function called set_hydra_nl_table() to set the Hydra-NL original tables into the dist_vali_exc dict dictionary. The set_table() function was also added to set validation data and the old exceedance frequency Hydra-NL function.

The pull request also includes the tables generated in Hydra-NL in .xls and .csv format for the stations available + Hydra-NL documentation.

The pull request has two jupyter notebooks, the first one called geometries was used KWK_geometries.ipynb was used to calculate the stations locations. However, this code is not working anymore after the changes made in the main source code for me. The same error is shown when running KWK_getcheckdata.py from this branch and from the main branch. Something to check later on to be sure there is no problem with the environments.

The second jupyter notebook KWK_process.ipynb can be ignored and not included in the main branch, as I only used it for testing purposes. I would like to keep the code anyway if possible in the main branch for future development.

codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 87.96%. Comparing base (65adefe) to head (44eb596). Report is 3 commits behind head on main.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #106 +/- ## ========================================== - Coverage 87.97% 87.96% -0.02% ========================================== Files 9 9 Lines 1156 1155 -1 ========================================== - Hits 1017 1016 -1 Misses 139 139 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

veenstrajelmer commented 3 weeks ago

Thanks for this PR, I do have some additional requests:

no data in the repository, in this case all csv/xls/json files. You can put a link to the p-drive in the example script. I know this is hard-coded and a bit ugly, but at least then it is clear that it is a dependent data source. I also see several copies of the csv-files on the P-drive, please try to avoid this to keep everything transparent.
it is good to work with a single source of truth, this is complex when putting code in py as well as ipynb, I would suggest to remove the ipynb files. If you want to work interactively with vscode, you can also run the .py files as if they were notebooks. I do not know how exactly, but I have seen it happening.
it is not clear to me where the csv files come from. I expect you create them from the hydra-xls files with one of the ipynb files (I have not looked at them). If this is indeed the case, it would be cleaner to just read the hydra-xls file directly. If the csv files come directly from hydra-nl, the xls files can be removed right?
minimize the diff (e.g. revert the change in KWK_getcheckdata.py since it does not contribute to this PR)

Mares2022 commented 3 weeks ago

Adressing @veenstrajelmer requests.

[x] no data in the repository, in this case all csv/xls/json files. I have removed now the files and saved them in the Pdrive: P:\11210325-005-kenmerkende-waarden\work\data_hydraNL
[x] it is good to work with a single source of truth, this is complex when putting code in py as well as ipynb, I would suggest to remove the ipynb files. I have removed the KWK_process.ipynb. The KWK_geometries.ipynb has an interactive map that requires a notebook to be displayed. In case this code is not needed anymore I will erase it, but converting to .py cannot be done. Let me know if I should erase it.
[x] it is not clear to me where the csv files come from. I expect you create them from the hydra-xls files with one of the ipynb files (I have not looked at them). If this is indeed the case, it would be cleaner to just read the hydra-xls file directly. Files in .xls have symbols in a code that did not allow me to open them with pandas. The created excel files are also in an old format. The best approach based on my experience will be to create .csv files from the python Hydra-NL API, or request the developers to provide the data in these formats with standard code symbols. The csv were created manually in excel.
[x] minimize the diff (e.g. revert the change in KWK_getcheckdata.py since it does not contribute to this PR). I will take it back to the main branch version.

veenstrajelmer commented 3 weeks ago

Files in .xls have symbols in a code that did not allow me to open them with pandas.

I have looked into reading this old xls format and found this solution that works:

import pandas as pd
file_xls = r"p:\11210325-005-kenmerkende-waarden\work\data_hydraNL\DENHDR.xls"
df = pd.read_table(file_xls, encoding='latin-1')

Mares2022 commented 3 weeks ago

A new commit including df = pd.read_table(file_xls, encoding='latin-1') was created. Now we are able to read the original Hydra-NL files in .xls format.

veenstrajelmer commented 3 weeks ago

Thanks for the updates! I have now looked into KWK_geometries.ipynb and it can indeed be useful for this station selection (until we move to Hydra-RING API). However, could you trim it down so it only contains the code necessary for the interactive map? Also, you could avoid saving the geojson since the gdf variable is already being generated from the DDL catalog and you do not seem to use it elsewhere.

Mares2022 commented 3 weeks ago

[x] KWK_process.ipynb needs to be cleaned and included in the main branch for further visualisation of stations.
[x] Validation data for Hoak van Holland. The extension in the path was incorrect before. I was changed from .xls to .csv
[ ] A message indicating if the hydra-nl or validation file path exists or not must be implemented to inform the users if the overschrijding function integrated Hydra-NL data or validation data (if available) to avoid the above mentioned issue.

sonarcloud[bot] commented 3 weeks ago

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

Deltares-research / kenmerkendewaarden

39 add overschrijding hydra statistics #106

Codecov Report

Quality Gate passed