Closed eoydvin closed 1 year ago
Do I understand correctly that the current state of this PR does do all transformation of the OpenMRG dataset in the notebook to create the small example file?
That would mean that large parts of the code in opensense_data_downloader_and_transformer.py
will be completely renewed. Would be good if you move the code over there. But, from now on we should really aim to first do #22 so that we can correctly add version numbers. Only when we have versioned data_transformer functions we should use them to create an OpenMRG NetCDF with the OPENSENSE convenctions so that the colleagues can update it on zenodo. It has to be perfectly 100% clear how the transformation was done.
As we found yesterday, the rainfall fields in the latest version had a bit lower rainfall rates compared to one of the intermediate version which looked good, too.
Based on this time series that you show in the example notebook, I suspect that you forgot to interpolate very short gaps in the current version. Hope that brings back some of the peaks.
Based on this time series that you show in the example notebook, I suspect that you forgot to interpolate very short gaps in the current version.
Indeed, we did not interpolate gaps up to 5 min in the latestet appraoch to reshape the data with xarray. @eoydvin, can add following lines to thecreate_small_openMRG_example.ipynb
:ds["tsl"] = ds.tsl.interpolate_na(dim="time", method="linear", max_gap="5min") ds["rsl"] = ds.rsl.interpolate_na(dim="time", method="linear", max_gap="5min")
Thanks for this effort @eoydvin! Some comments:
create_small_openMRG_example.ipynb
:
opensense_data_downloader_and_transformer.py
in Explore_existing_datasets.ipynb
does not work as the oddt file lives in a parent dir. I suggest either something like a sys.path
import, or to move the oddt also to the data subdir (changing the import in the other notebooks). This second method would leave the notebook dir a bit cleaner and therefore IMO the nicer solution here.ds = ds.resample(time = '1T', label = 'right').mean()
in the create_small_openMRG_example. ipynb
to derive 1 min data from openMRG, i.e. I would prefer to use instantaneous 1 min values, but that is a minor detail. openMRG_use_case.ipynb
extend='both'
argument.create_small_openMRG_example. ipynb
. After thinking twice on the use case an alternative could be that we split the notebook in half, dubbing the first half 'simple use case' and the second half, 'extended use case with comparison to radar data' and let it all live there.
Thanks @eoydvin for the additional work and thanks @maxmargraf for the detailed comments. I have to catch up a bit and will provide my feedback soon.
In general, splitting up the use case, sounds like a good idea, i.e. one very basics example and a more complex one. But maybe this could also be one notebook, just with a more detailed part at the bottom.
First, it is most important to have one correct use case ;-)
@eoydvin Please note that the radar-CML animation in the openMRG_use_case does not work because it relies on the radar NetCDF which is downloaded in other notebooks:
FileNotFoundError: [Errno 2] No such file or directory: '/home/jovyan/OPENSENSE_sandbox/notebooks/data/andersson_2022_OpenMRG/radar/radar.nc'
Thanks for the feedback! Some notes:
Thanks for the commit! I tried several times to launch the binder, unsuccessfully. Therefore, I did not test this commit yet.
Do I understand it correctly that it will be moved to a new repository for version tracking later?
For this sandbox, the oddt code will stay there IMO.
should we change the simple use case interpolation to be in minutes instead of hours?
I would leave this in an hourly base covering a whole day
Should each case be able to run without the other?
Having it in one notebook, with some explanations is fine for me.
For me the binder loads quickly, but from the main branch without the new transformation code. I think I remember it was able to load from my branch before..? The binder URL seem to point to the main branch.
The binder URL seem to point to the main branch.
Note that the URL points to the main branch of the separate env repo, but the state of the branch of this PR is then pulled in when running on binder, see #31
I also could not start binder from this PR yesterday evening... Now it worked within some seconds...
I ran the notebook. The processing looks good now. And the final animation is really great. The animation code cells are very complex, though.
Three minor things that could be improved:
For the rest, I agree with what @maxmargraf wrote above
Ok, I will attempt at cleaning the code and write a user-friendly explanation. I did not quite get if we should move the transformation function up one level to the data directory or not?
I did not quite get if we should move the transformation function up one level to the data directory or not?
I would leave it where it is now. As you wrote somewhere above, some other notebooks expect it to be there. And everything in the sandbox is bit messy anyway. Soon we should do #22 and then things will be cleaned up anyway 🤞
Thanks for yet another update! Sounds like this could be the last commit in this PR 👍
Unfortunately, again binder does not start for me... I will check it out tomorrow morning and then finally merge!
One tiny detail. For the first animation the title of the plot has to be set like this
title.set_text("{}".format(str(cmls_R_1h.isel(time=n).time.values)[:19]))
and not using cmls.isel...
.
There is also a problem with the data exploration notebook, see #64, but we fix that later. It might depend a bit on what kind of binder pod you get...
I took another look. Everything is great now!
Thanks a lot @eoydvin for fixing the error and for taking the chance to improve and extend the usecase notebook! Also thanks to @maxmargraf for help with tracking down the error and for feedback in this PR! 👏 👏 👏
I am merging this now 🎉
There was a problem with the data transformation of the OpenMRG code (merged in #56) which led to wrong combinations of CML time series and metadata. This will be fixed in this PR.
Additionally, the use case example notebook will be extended.
Edit (by @cchwala ):