VForWaTer / metacatalog

Modular metadata management platform for environmental data.
https://vforwater.github.io/metacatalog
GNU General Public License v3.0
3 stars 1 forks source link

using pd.concat() instead of pd.merge() #171

Closed AlexDo1 closed 2 years ago

AlexDo1 commented 2 years ago

Always merge Split datasets additionally.

The default parameters of pd.concat() should fit our use case.

codecov[bot] commented 2 years ago

Codecov Report

Merging #171 (38494b4) into master (3c547d4) will decrease coverage by 0.01%. The diff coverage is 33.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #171      +/-   ##
==========================================
- Coverage   63.58%   63.57%   -0.02%     
==========================================
  Files          67       67              
  Lines        3095     3105      +10     
==========================================
+ Hits         1968     1974       +6     
- Misses       1127     1131       +4     
Flag Coverage Δ
e2e 63.57% <33.33%> (-0.02%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
metacatalog/util/results.py 80.24% <33.33%> (-1.34%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 3c547d4...38494b4. Read the comment docs.

AlexDo1 commented 2 years ago

The latest commit uses the 'old' pd.merge() function for Composites and pd.concat for Split datasets now.

Composite: Eddy data: image

Split dataset: LUBW gauge data: image

AlexDo1 commented 2 years ago

That looks way better now, thanks a lot! Is there an easy way to add a unit-test for the changed behavior? If yes, that would be cool, if not, never mind and merge.

I think you tested most of the merging behavior of Split datasets here: https://github.com/VForWaTer/metacatalog/blob/3c547d4ab06178a41c347bc23afd1cf5dfa17f22/metacatalog/test/test_array_type_data.py#L246-L280

I just copied most of that test for a second test with a Composite dataset and tested for the shape of the merged data. I guess we could test for more stuff here, I just focused on the merging behavior.

AlexDo1 commented 2 years ago

@mmaelicke From our conversation above, I understand that the tests are not perfect, but we should still be able to merge the PR since the results look good.

mmaelicke commented 2 years ago

It's not so easy to continue a discussion after 10 months :) But I agree, it should be fine to merge.