ErikNixdorf / sbat

The Repo for the Surface Water Balance Analysis Tool
BSD 3-Clause "New" or "Revised" License

Structures of dataframes #26

Closed by ErikNixdorf 8 months ago

ErikNixdorf commented 1 year ago

While looking at the element comparison warning, I noticed that the column order of gauge_data is arbitrary and differs between runs. I don't know whether this propagates to the final results, but the order should be the same for every run. I guess the apply operation on the grouped DataFrame is a bit complex.

Originally posted by @MarcoHannemann in https://github.com/ErikNixdorf/sbat/issues/24#issuecomment-1479091599
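
As an illustration of the reproducibility concern above, here is a minimal sketch of pinning the column order after a step that may return columns in arbitrary order. It is not taken from the sbat code base: the helper name `normalize_column_order`, the two example frames and their values are hypothetical, and sorting alphabetically is only one possible convention.

```python
import pandas as pd


def normalize_column_order(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with its columns in sorted (deterministic) order."""
    # Hypothetical helper: alphabetical order is an assumed convention.
    return df.reindex(sorted(df.columns), axis=1)


# Two runs that hold the same data but differ in column order
# (made-up values, gauge names borrowed from the output in this thread).
run_a = pd.DataFrame({"vetschau": [1.0, 2.0], "schoeps": [3.0, 4.0]})
run_b = pd.DataFrame({"schoeps": [3.0, 4.0], "vetschau": [1.0, 2.0]})

fixed_a = normalize_column_order(run_a)
fixed_b = normalize_column_order(run_b)
```

After normalization both frames have the same column order, so comparisons between runs no longer depend on how the columns happened to be produced.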

MarcoHannemann commented 1 year ago

Solved with b4c27fa


While looking into Example 3, I noticed that sbat.gauges_meta does not keep the rows of a gauge together when some of its decades contain NaN values. I suggest we keep them together.

Consider the output of the following expression:

sbat.gauges_meta["balance"]
gauge                             decade

[...]
schoenfeld                        1995      0.167932
                                  2005      0.033757
schoeps                           2005      0.187361
                                  2015      0.180896
                                  2025      0.093814
[...]

vetschau                          1995      0.495314
                                  2005      0.154159
                                  2015      0.320815
                                  2025      0.159062
goeritz_nr_195                    2025           NaN
hammerstadt_1                     1995           NaN
                                  2005           NaN
heinersbrueck                     2025           NaN
merzdorf_2                        2025           NaN
neusalza_spremberg                1995           NaN
                                  2005           NaN
niedergurig                       2025           NaN
radensdorf_1                      2025           NaN
radensdorf_2                      2025           NaN
reichwalde_3                      1995           NaN
                                  2005           NaN
schoenfeld                        2015           NaN
                                  2025           NaN
schoeps                           1995           NaN
Name: balance, dtype: float64

As you can see, all gauge/decade rows with NaN values are moved to the bottom of the dataframe, separated from the other decades of the same gauge. I think it would be better to keep each gauge's rows together. A quick solution is to stack and unstack, but it would be better to solve this where the grouping operation is performed.

sbat.gauges_meta = sbat.gauges_meta.reindex(balance_mean, axis=0)
gauge                             decade

[...]
reichwalde_3                      1995           NaN
                                  2005           NaN
                                  2015      0.200459
                                  2025      0.256693
saerichen                         1995      0.202526
                                  2005      0.179485
                                  2015      0.256454
                                  2025      0.110422
schirgiswalde                     1995      0.852514
                                  2005      0.633872
                                  2015      0.474817
                                  2025     -0.014877
schmogrow_einlasswehr_nr_vi_up    1995      4.621551
                                  2005      0.662170
                                  2015      4.205882
                                  2025      0.601407
schmogrow_spreewehr_nr_vii_up     1995     -1.612786
                                  2005     -1.787368
                                  2015     -1.387917
                                  2025     -1.019507
schoenfeld                        1995      0.167932
                                  2005      0.033757
                                  2015           NaN
                                  2025           NaN
schoeps                           1995           NaN
                                  2005      0.187361
                                  2015      0.180896
                                  2025      0.093814
[...]
Name: balance, dtype: float64
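
For comparison, a minimal sketch of another way to get the grouped layout, assuming the balance column is a Series with a (gauge, decade) MultiIndex as in the output above. It uses `sort_index` rather than the stack/unstack or reindex approach mentioned in this thread, and the small Series is rebuilt by hand from the schoenfeld and schoeps rows shown above. The variable names `balance`, `grouped` and `wide` are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild a small slice of the balance Series in the problematic order:
# rows with values first, rows with NaN appended at the bottom.
index = pd.MultiIndex.from_tuples(
    [
        ("schoenfeld", 1995),
        ("schoenfeld", 2005),
        ("schoeps", 2005),
        ("schoeps", 2015),
        ("schoeps", 2025),
        ("schoenfeld", 2015),
        ("schoenfeld", 2025),
        ("schoeps", 1995),
    ],
    names=["gauge", "decade"],
)
balance = pd.Series(
    [0.167932, 0.033757, 0.187361, 0.180896, 0.093814, np.nan, np.nan, np.nan],
    index=index,
    name="balance",
)

# Sorting the MultiIndex orders rows by gauge, then by decade,
# so the NaN rows end up next to the other decades of their gauge.
grouped = balance.sort_index()

# Optional wide view: one row per gauge, one column per decade.
wide = grouped.unstack("decade")
```

Sorting is a post-hoc repair like the quick solution above. It does not address the grouping operation that produces the split order in the first place.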