OpenSenseAction / OPENSENSE_sandbox

Collection of runable examples with software packages for processing opportunistic rainfall sensors
BSD 3-Clause "New" or "Revised" License
13 stars 16 forks source link

fix OpenMRG data transformer and extend use case #62

Closed eoydvin closed 1 year ago

eoydvin commented 1 year ago

There was a problem with the data transformation of the OpenMRG code (merged in #56) which led to wrong combinations of CML time series and metadata. This will be fixed in this PR.

Additionally, the use case example notebook will be extended.

Edit (by @cchwala ):

github-actions[bot] commented 1 year ago

Binder :point_left: Launch a binder notebook on branch _eoydvin/OPENSENSEsandbox/main

cchwala commented 1 year ago

Do I understand correctly that the current state of this PR does do all transformation of the OpenMRG dataset in the notebook to create the small example file?

That would mean that large parts of the code in opensense_data_downloader_and_transformer.py will be completely renewed. Would be good if you move the code over there. But, from now on we should really aim to first do #22 so that we can correctly add version numbers. Only when we have versioned data_transformer functions we should use them to create an OpenMRG NetCDF with the OPENSENSE convenctions so that the colleagues can update it on zenodo. It has to be perfectly 100% clear how the transformation was done.

cchwala commented 1 year ago

As we found yesterday, the rainfall fields in the latest version had a bit lower rainfall rates compared to one of the intermediate version which looked good, too.

Based on this time series that you show in the example notebook, I suspect that you forgot to interpolate very short gaps in the current version. Hope that brings back some of the peaks.

Bildschirmfoto 2023-06-30 um 07 55 49
maxmargraf commented 1 year ago

Based on this time series that you show in the example notebook, I suspect that you forgot to interpolate very short gaps in the current version.
Indeed, we did not interpolate gaps up to 5 min in the latestet appraoch to reshape the data with xarray. @eoydvin, can add following lines to the create_small_openMRG_example.ipynb:

ds["tsl"] = ds.tsl.interpolate_na(dim="time", method="linear", max_gap="5min")
ds["rsl"] = ds.rsl.interpolate_na(dim="time", method="linear", max_gap="5min")
eoydvin commented 1 year ago
maxmargraf commented 1 year ago

Thanks for this effort @eoydvin! Some comments:

create_small_openMRG_example.ipynb:

openMRG_use_case.ipynb

After thinking twice on the use case an alternative could be that we split the notebook in half, dubbing the first half 'simple use case' and the second half, 'extended use case with comparison to radar data' and let it all live there.

cchwala commented 1 year ago

Thanks @eoydvin for the additional work and thanks @maxmargraf for the detailed comments. I have to catch up a bit and will provide my feedback soon.

In general, splitting up the use case, sounds like a good idea, i.e. one very basics example and a more complex one. But maybe this could also be one notebook, just with a more detailed part at the bottom.

First, it is most important to have one correct use case ;-)

cchwala commented 1 year ago

@eoydvin Please note that the radar-CML animation in the openMRG_use_case does not work because it relies on the radar NetCDF which is downloaded in other notebooks:

FileNotFoundError: [Errno 2] No such file or directory: '/home/jovyan/OPENSENSE_sandbox/notebooks/data/andersson_2022_OpenMRG/radar/radar.nc'
eoydvin commented 1 year ago

Thanks for the feedback! Some notes:

maxmargraf commented 1 year ago

Thanks for the commit! I tried several times to launch the binder, unsuccessfully. Therefore, I did not test this commit yet.

Do I understand it correctly that it will be moved to a new repository for version tracking later?

For this sandbox, the oddt code will stay there IMO.

should we change the simple use case interpolation to be in minutes instead of hours?

I would leave this in an hourly base covering a whole day

Should each case be able to run without the other?

Having it in one notebook, with some explanations is fine for me.

eoydvin commented 1 year ago

For me the binder loads quickly, but from the main branch without the new transformation code. I think I remember it was able to load from my branch before..? The binder URL seem to point to the main branch.

cchwala commented 1 year ago

The binder URL seem to point to the main branch.

Note that the URL points to the main branch of the separate env repo, but the state of the branch of this PR is then pulled in when running on binder, see #31

I also could not start binder from this PR yesterday evening... Now it worked within some seconds...

cchwala commented 1 year ago

I ran the notebook. The processing looks good now. And the final animation is really great. The animation code cells are very complex, though.

Three minor things that could be improved:

  1. Add two or three sentences below each header, e.g. explaining that the following cell will do complex things in matplotlib which are only needed for visualisation and are not important to understand CML rainfall estimation.
  2. Look for typos, e.g. in the second heading the first "data" has to be deleted
  3. If possible, without spending a full day on it (😇), it would be great to not squash the CML rainfall map when adding the colorbar in the final animation. A solution could be based on info from the matplotlib docs or from this stackoverflow response. Since the final animation looks so good, I would add it as gif on top of the READEM as a teaser for the sandbox content. Hence, spending some more time on it would be worth it.

For the rest, I agree with what @maxmargraf wrote above

eoydvin commented 1 year ago

Ok, I will attempt at cleaning the code and write a user-friendly explanation. I did not quite get if we should move the transformation function up one level to the data directory or not?

cchwala commented 1 year ago

I did not quite get if we should move the transformation function up one level to the data directory or not?

I would leave it where it is now. As you wrote somewhere above, some other notebooks expect it to be there. And everything in the sandbox is bit messy anyway. Soon we should do #22 and then things will be cleaned up anyway 🤞

cchwala commented 1 year ago

Thanks for yet another update! Sounds like this could be the last commit in this PR 👍

Unfortunately, again binder does not start for me... I will check it out tomorrow morning and then finally merge!

cchwala commented 1 year ago

One tiny detail. For the first animation the title of the plot has to be set like this

title.set_text("{}".format(str(cmls_R_1h.isel(time=n).time.values)[:19]))

and not using cmls.isel....

cchwala commented 1 year ago

There is also a problem with the data exploration notebook, see #64, but we fix that later. It might depend a bit on what kind of binder pod you get...

cchwala commented 1 year ago

I took another look. Everything is great now!

Thanks a lot @eoydvin for fixing the error and for taking the chance to improve and extend the usecase notebook! Also thanks to @maxmargraf for help with tracking down the error and for feedback in this PR! 👏 👏 👏

I am merging this now 🎉