VForWaTer / metacatalog

Modular metadata management platform for environmental data.
https://vforwater.github.io/metacatalog
GNU General Public License v3.0
3 stars 1 forks source link

add merge behavior #158

Closed mmaelicke closed 3 years ago

mmaelicke commented 3 years ago

This PR closes #156

Now the ImmutableResultSet.get_data method automatically merges Composite datasets. For non-compisites, the merge can be requested by the merge=True attribute.

In a first test it worked great for the Eddy composite:

image

Note the shape of the output DataFrame.

@AlexDo1 can you pull this branch and run it on the updated Eddy branch in scripts, which does create the composite. If everything works fine for you, you can approve this PR.

mmaelicke commented 3 years ago

I am just realizing, that there are a lot of NaNs in the DataFrame. Does this come from the slightly different timestamps? @AlexDo1 do you have an idea?

mmaelicke commented 3 years ago

And the tests are failing for the exisiting composite dataset tests. I'll have to check that

AlexDo1 commented 3 years ago

Really good I looked at the data again. I had accidentally selected T_begin instead of T_end as tstamp for the Eddy record. This caused the eddy data to be 30 minutes off compared to the other records, I fixed that.

The number of NaN values is very high in this data set. I have replaced all values < -9999 with NaN before uploading.

Raw data: number of values < -9999 is 1528364, number of NaN is 285 --> 1528649 Uploaded Metacatalog data: number of NaN: 1528649

mmaelicke commented 3 years ago

Really good I looked at the data again. I had accidentally selected T_begin instead of T_end as tstamp for the Eddy record. This caused the eddy data to be 30 minutes off compared to the other records, I fixed that.

Is that fixed in the scripts repo? So with the next upload it will work as expected?

The number of NaN values is very high in this data set. I have replaced all values < -9999 with NaN before uploading.

Raw data: number of values < -9999 is 1528364, number of NaN is 285 --> 1528649 Uploaded Metacatalog data: number of NaN: 1528649 Alright. Then, if the image above looks good to you, I guess I will sort out the failing tests and merge this PR afterwards... Thank you Alex

AlexDo1 commented 3 years ago

Really good I looked at the data again. I had accidentally selected T_begin instead of T_end as tstamp for the Eddy record. This caused the eddy data to be 30 minutes off compared to the other records, I fixed that.

Is that fixed in the scripts repo? So with the next upload it will work as expected?

This is fixed in the latest commit in the eddy_upload branch, yes.

codecov[bot] commented 3 years ago

Codecov Report

Merging #158 (d4ea2fc) into master (42aeb1f) will increase coverage by 0.01%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #158      +/-   ##
==========================================
+ Coverage   63.47%   63.49%   +0.01%     
==========================================
  Files          66       66              
  Lines        3069     3087      +18     
==========================================
+ Hits         1948     1960      +12     
- Misses       1121     1127       +6     
Flag Coverage Δ
e2e 63.49% <100.00%> (+0.01%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
metacatalog/util/results.py 81.45% <100.00%> (-2.01%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 42aeb1f...d4ea2fc. Read the comment docs.