Closed: leifdenby closed this 9 months ago
Hi,
`b` files are the files with a focus on backscatter and water vapour, while `t` files contain the temperature profiles.
Thanks, @observingClouds. Parameterizing the catalog certainly makes it smaller but also somewhat more opaque. For the P3 data I am likely to keep the files explicit.
I'd also suggest formulating the catalog more explicitly. In the end, the catalog might be what is used to generate overview pages about the available datasets, so it should include enough information to understand what's inside a dataset and to discover all data from looking at the catalog file alone. Instead of keeping the catalog file small, I'd rather suggest generating the catalog files with a script if maintaining them manually would be too tedious.
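One way to keep explicit entries while avoiding tedious manual maintenance would be a small generator script. A hypothetical sketch (the entry names, fields, and URL below are made up for illustration, not the real catalog layout):

```python
# Hypothetical sketch: generate explicit intake catalog entries from a
# template instead of parameterizing a single entry. Names/URL are invented.
template = """\
  LICHT_LIDAR_{suffix}:
    description: LICHT Raman lidar, {name} product
    driver: netcdf
    args:
      urlpath: https://example.com/licht_{suffix}.nc
"""

entries = "".join(
    template.format(suffix=suffix, name=name)
    for suffix, name in [("b", "backscatter"), ("t", "temperature")]
)
print(entries)
```

Each dataset then remains a self-describing entry in the YAML, while the script stays the single place to edit when files are added.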
@observingClouds Do we want to close this as stale?
@leifdenby you already put all the information together. Would you mind just splitting the `b` and `t` datasets into separate entries?
@ninarobbins would also be a good contributor here who could help us add the right metadata to the `b` and `t` datasets.
@RobertPincus I haven't lost hope yet 🤣 @leifdenby will prove that I'm correct 😜
Hi, I hope I can provide some clarification about the `b` and `t` files of the lidars. The idea is the same for both CORAL and LICHT.
The processing from Level 0 to Level 1 of the Raman lidar data results in two products: slow (`t`) and fast (`b`). The data in the slow product is smoothed in time over the temperature smoothing window; by default this window is 118 min for LICHT or 60 min for low-resolution CORAL data. The slow product contains the temperature data, but also water vapor smoothed over this longer window. The fast product is smoothed in time over the (shorter) window specified for the rest of the variables (the default is 2 min for LICHT and low-resolution CORAL data, which is the time interval of the Level 0 data); the fast product contains the backscatter data and also the water vapor smoothed over the shorter window.
Both of these smoothing intervals can be specified by the user in the configuration file when doing the processing, and each run of the processing code that converts Level_0 data to Level_1 results in a slow and a fast product.
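As a conceptual illustration only (this is not the actual Level 0 to Level 1 processing code), the fast and slow products can be thought of as the same time series averaged over two different windows:

```python
# Conceptual sketch: one series, two smoothing windows -> "b" and "t" products.
def rolling_mean(values, window):
    """Simple centred rolling mean; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# One (made-up) sample every 2 minutes, the Level 0 interval for
# LICHT / low-resolution CORAL data.
water_vapor = [1.0, 2.0, 4.0, 2.0, 1.0, 3.0, 5.0, 3.0]

fast = rolling_mean(water_vapor, window=1)  # "b": native 2 min resolution
slow = rolling_mean(water_vapor, window=5)  # "t": a longer smoothing window
```

Both outputs keep the original time grid; only the effective averaging window differs, which is why the same variable can appear in both products at different smoothness.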
I hope that helps!
Thanks @ninarobbins, that is very helpful. Just one more question for clarification: the fast product only contains quantities that can be retrieved at the fast speed, while the slow product includes quantities that need a longer integration time, i.e. everything connected with the temperature retrieval, correct? `WaterVaporMixingRatio` is in both datasets just because it forms the basis of the relative humidity retrieval?
```python
import intake

cat = intake.open_catalog("https://raw.githubusercontent.com/leifdenby/eurec4a-intake/meteor-licht-lidar/catalog.yml")
b = cat.ships.meteor.LICHT_LIDAR.to_dask()
t = cat.ships.meteor.LICHT_LIDAR(content_type='t').to_dask()

b.data_vars
# Data variables:
# Altitude (Length) float32 dask.array<chunksize=(484,), meta=np.ndarray>
# VerticalResolution (Length) float32 dask.array<chunksize=(484,), meta=np.ndarray>
# UnixTime (Time) int32 dask.array<chunksize=(720,), meta=np.ndarray>
# Backscatter532 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorBackscatter532 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ParticleLinearDepolarisationRatio (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorParticleLinearDepolarisationRatio (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# VolumeLinearDepolarisationRatio (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorVolumeLinearDepolarisationRatio (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# Backscatter355 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorBackscatter355 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# CloudMask_float (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# CloudMask (Length, Time) int32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# WaterVaporMixingRatio (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorWaterVapor (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>

t.data_vars
# Data variables:
# Altitude (Length) float32 dask.array<chunksize=(484,), meta=np.ndarray>
# AltitudeGradients (Length_gradients) float32 dask.array<chunksize=(483,), meta=np.ndarray>
# VerticalResolution (Length) float32 dask.array<chunksize=(484,), meta=np.ndarray>
# UnixTime (Time) int32 dask.array<chunksize=(720,), meta=np.ndarray>
# Temperature355 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorTemperature355 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# TemperatureGradients355 (Length_gradients, Time) float32 dask.array<chunksize=(483, 720), meta=np.ndarray>
# ErrorTemperatureGradients355 (Length_gradients, Time) float32 dask.array<chunksize=(483, 720), meta=np.ndarray>
# WaterVaporMixingRatio (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorWaterVapor (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# RelativeHumidity355 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorRelativeHumidity355 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
```
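Comparing the two variable listings directly, the overlap between the products is exactly the water vapor variables plus the shared coordinate-like variables, which can be checked with a quick set intersection (names copied from the listings above):

```python
# Variable names copied from the b.data_vars and t.data_vars listings.
b_vars = {
    "Altitude", "VerticalResolution", "UnixTime",
    "Backscatter532", "ErrorBackscatter532",
    "ParticleLinearDepolarisationRatio", "ErrorParticleLinearDepolarisationRatio",
    "VolumeLinearDepolarisationRatio", "ErrorVolumeLinearDepolarisationRatio",
    "Backscatter355", "ErrorBackscatter355",
    "CloudMask_float", "CloudMask",
    "WaterVaporMixingRatio", "ErrorWaterVapor",
}
t_vars = {
    "Altitude", "AltitudeGradients", "VerticalResolution", "UnixTime",
    "Temperature355", "ErrorTemperature355",
    "TemperatureGradients355", "ErrorTemperatureGradients355",
    "WaterVaporMixingRatio", "ErrorWaterVapor",
    "RelativeHumidity355", "ErrorRelativeHumidity355",
}

shared = b_vars & t_vars
print(sorted(shared))
```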
@observingClouds yes, I believe that's right!
> @leifdenby you already put all the information together. Would you mind just splitting the `b` and `t` datasets into separate entries?
Yup! I can get that done :) It might have to wait till the end of the week, but I'll put it on my TODO list.
Hi @leifdenby, I hope it's okay that I took over your branch here. I have now applied kerchunk to make the dataset a bit more user-friendly. There are, however, quite a few factors that make this dataset challenging:
- the dimension names (`Length`, `Time`) are not standard

It is probably possible to fix these issues within the reference file as well, but it would be great to fix all of them in the original dataset. For now I think this is the best we can do. If you like my current solution, I will try to add the reference files to https://observations.ipsl.fr/aeris/eurec4a-data/SHIPS/RV-METEOR/Raman_Lidar_LICHT/version_2020.07.31/nc/ and remove them from IPFS to keep things simple.
Just for future reference: I used this script to create the reference files.
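The script itself isn't reproduced here, but for readers unfamiliar with kerchunk: a reference file is plain JSON in which each zarr chunk key maps either to inline data or to a `[url, offset, length]` byte range inside the original netCDF file. A minimal illustration of the format (the file name and byte numbers below are invented):

```python
import json

# Minimal, made-up example of the kerchunk reference-file format (version 1).
refs = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        # Chunk (0, 0) of this variable lives at the given byte range of the
        # original netCDF file (URL, offset, and length are invented here).
        "WaterVaporMixingRatio/0.0": [
            "https://example.com/licht_lidar_20200131.nc",
            30045,
            278784,
        ],
    },
}

print(json.dumps(refs)[:60])
```

Because the reference file only stores byte ranges, readers such as fsspec can present the original netCDF files as a zarr store without copying any data.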
@observingClouds you asked for a review. Do you want this now, or should I wait for the dataset fixes you propose above?
@RobertPincus I don't expect the dataset issues to be fixed in the near future, so I'm asking for a review of the current workaround.
Maybe this is a bit of unfortunate timing, but as the TCO group is in the process of providing data online via zarr, the LICHT data from the Meteor is now available on DKRZ's Swift store. See here for some initial documentation, or try the following:
```python
import intake

cat = intake.open_catalog("https://tcodata.mpimet.mpg.de/catalog.yaml")
LICHT_b = cat.METEOR.EUREC4A.lidar_LICHT_LR_b_v1.to_dask()
LICHT_t = cat.METEOR.EUREC4A.lidar_LICHT_LR_t_v1.to_dask()
```
The data has been cleaned up by @ninarobbins and should likely be preferred over older versions. It has also been rechunked and should load in a reasonable time.
@d70-t that's great. Thanks @ninarobbins for also reprocessing the Meteor lidar data. That is amazing!
Thanks for asking. Re-reading the wiki for LICHT and CORAL (https://wiki.mpimet.mpg.de/doku.php?id=analysis:data:bco:ramanlidars:raman-lidar-coral#data_access and https://wiki.mpimet.mpg.de/doku.php?id=analysis:data:bco:ramanlidars:raman-lidar-licht#data_access) I can see that for LICHT the `b` and `t` options aren't detailed, but I assume they have the same meaning as for CORAL. I don't quite understand the difference though, so I'll email Ilya about this, update the description in the catalog, and add that before merging this pull-request.