Closed: leifdenby closed this 9 months ago
Hi,
`b` files are the files with a focus on backscatter and water vapour, while `t` files contain the temperature profiles.
Thanks, @observingClouds. Parameterizing the catalog certainly makes it smaller but also somewhat more opaque. For the P3 data I am likely to keep the files explicit.
I'd also suggest formulating the catalog more explicitly. In the end, the catalog might be what is used to generate overview pages about the available datasets, so it should include enough information to understand what's inside a dataset and to discover all data from looking at the catalog file alone. Instead of keeping the catalog file small, I'd rather suggest generating the catalog files with a script if maintaining them manually would be too tedious.
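One way to keep explicit entries while avoiding tedious manual maintenance would be a small generator script. A hypothetical sketch (the entry names, fields, and URL below are made up for illustration, not the real catalog layout):

```python
# Hypothetical sketch: generate explicit intake catalog entries from a
# template instead of parameterizing a single entry. Names/URL are invented.
template = """\
  LICHT_LIDAR_{suffix}:
    description: LICHT Raman lidar, {name} product
    driver: netcdf
    args:
      urlpath: https://example.com/licht_{suffix}.nc
"""

entries = "".join(
    template.format(suffix=suffix, name=name)
    for suffix, name in [("b", "backscatter"), ("t", "temperature")]
)
print(entries)
```

Each dataset then remains a self-describing entry in the YAML, while the script stays the single place to edit when files are added.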
@observingClouds Do we want to close this as stale?
@leifdenby you already put all the information together. Would you mind just splitting the `b` and `t` datasets into separate entries?
@ninarobbins would also be a good contributor here who could help us add the right metadata to the `b` and `t` datasets.
@RobertPincus I haven't lost hope yet 🤣 @leifdenby will prove that I'm correct 😜
Hi, I hope I can provide some clarification about the `b` and `t` files of the lidars. The idea is the same for both CORAL and LICHT.
The processing from Level 0 to Level 1 of the Raman lidar data results in two products: slow (`t`) and fast (`b`). The data in the slow product is smoothed in time over the temperature smoothing window; by default this window is 118 min for LICHT or 60 min for low-resolution CORAL data. The slow product contains the temperature data, but also water vapor smoothed over this longer window. The fast product is smoothed in time over the (shorter) window specified for the rest of the variables (the default is 2 min for LICHT and low-resolution CORAL data, which is the time interval of the Level 0 data); the fast product contains the backscatter data and also the water vapor smoothed over the shorter window.
Both of these smoothing intervals can be specified by the user in the configuration file when doing the processing, and each run of the processing code that converts Level_0 data to Level_1 results in a slow and a fast product.
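As a conceptual illustration only (this is not the actual Level 0 to Level 1 processing code), the fast and slow products can be thought of as the same time series averaged over two different windows:

```python
# Conceptual sketch: one series, two smoothing windows -> "b" and "t" products.
def rolling_mean(values, window):
    """Simple centred rolling mean; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# One (made-up) sample every 2 minutes, the Level 0 interval for
# LICHT / low-resolution CORAL data.
water_vapor = [1.0, 2.0, 4.0, 2.0, 1.0, 3.0, 5.0, 3.0]

fast = rolling_mean(water_vapor, window=1)  # "b": native 2 min resolution
slow = rolling_mean(water_vapor, window=5)  # "t": a longer smoothing window
```

Both outputs keep the original time grid; only the effective averaging window differs, which is why the same variable can appear in both products at different smoothness.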
I hope that helps!
Thanks @ninarobbins, that is very helpful. Just one more question for clarification: the fast product only contains quantities that can be retrieved at the fast speed, while the slow product includes quantities that need a longer integration time, i.e. everything connected with the temperature retrieval, correct? `WaterVaporMixingRatio` is in both datasets just because it forms the basis of the relative humidity retrieval?
```python
import intake

cat = intake.open_catalog("https://raw.githubusercontent.com/leifdenby/eurec4a-intake/meteor-licht-lidar/catalog.yml")
b = cat.ships.meteor.LICHT_LIDAR.to_dask()
t = cat.ships.meteor.LICHT_LIDAR(content_type='t').to_dask()

b.data_vars
# Data variables:
# Altitude (Length) float32 dask.array<chunksize=(484,), meta=np.ndarray>
# VerticalResolution (Length) float32 dask.array<chunksize=(484,), meta=np.ndarray>
# UnixTime (Time) int32 dask.array<chunksize=(720,), meta=np.ndarray>
# Backscatter532 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorBackscatter532 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ParticleLinearDepolarisationRatio (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorParticleLinearDepolarisationRatio (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# VolumeLinearDepolarisationRatio (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorVolumeLinearDepolarisationRatio (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# Backscatter355 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorBackscatter355 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# CloudMask_float (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# CloudMask (Length, Time) int32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# WaterVaporMixingRatio (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorWaterVapor (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>

t.data_vars
# Data variables:
# Altitude (Length) float32 dask.array<chunksize=(484,), meta=np.ndarray>
# AltitudeGradients (Length_gradients) float32 dask.array<chunksize=(483,), meta=np.ndarray>
# VerticalResolution (Length) float32 dask.array<chunksize=(484,), meta=np.ndarray>
# UnixTime (Time) int32 dask.array<chunksize=(720,), meta=np.ndarray>
# Temperature355 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorTemperature355 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# TemperatureGradients355 (Length_gradients, Time) float32 dask.array<chunksize=(483, 720), meta=np.ndarray>
# ErrorTemperatureGradients355 (Length_gradients, Time) float32 dask.array<chunksize=(483, 720), meta=np.ndarray>
# WaterVaporMixingRatio (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorWaterVapor (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# RelativeHumidity355 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
# ErrorRelativeHumidity355 (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
```
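Comparing the two variable listings directly, the overlap between the products is exactly the water vapor variables plus the shared coordinate-like variables, which can be checked with a quick set intersection (names copied from the listings above):

```python
# Variable names copied from the b.data_vars and t.data_vars listings.
b_vars = {
    "Altitude", "VerticalResolution", "UnixTime",
    "Backscatter532", "ErrorBackscatter532",
    "ParticleLinearDepolarisationRatio", "ErrorParticleLinearDepolarisationRatio",
    "VolumeLinearDepolarisationRatio", "ErrorVolumeLinearDepolarisationRatio",
    "Backscatter355", "ErrorBackscatter355",
    "CloudMask_float", "CloudMask",
    "WaterVaporMixingRatio", "ErrorWaterVapor",
}
t_vars = {
    "Altitude", "AltitudeGradients", "VerticalResolution", "UnixTime",
    "Temperature355", "ErrorTemperature355",
    "TemperatureGradients355", "ErrorTemperatureGradients355",
    "WaterVaporMixingRatio", "ErrorWaterVapor",
    "RelativeHumidity355", "ErrorRelativeHumidity355",
}

shared = b_vars & t_vars
print(sorted(shared))
```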
@observingClouds yes, I believe that's right!
> @leifdenby you already put all the information together. Would you mind just splitting the `b` and `t` datasets into separate entries?
Yup! I can get that done :) It might have to wait till the end of the week, but I'll put it on my TODO list.
Hi @leifdenby, I hope it's okay that I took over your branch here. I have now applied kerchunk to make the dataset a bit more user-friendly. There are, however, quite a few factors that make this dataset challenging:
- the dimension names (`Length`, `Time`) are not standard

It is probably possible to fix these issues within the reference file as well, but it would be great to fix all of them in the original dataset. For now I think this is the best we can do. If you like my current solution, I will try to add the reference files to https://observations.ipsl.fr/aeris/eurec4a-data/SHIPS/RV-METEOR/Raman_Lidar_LICHT/version_2020.07.31/nc/ and remove them from IPFS to keep things simple.
Just for future reference: I used this script to create the reference files.
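The script itself isn't reproduced here, but for readers unfamiliar with kerchunk: a reference file is plain JSON in which each zarr chunk key maps either to inline data or to a `[url, offset, length]` byte range inside the original netCDF file. A minimal illustration of the format (the file name and byte numbers below are invented):

```python
import json

# Minimal, made-up example of the kerchunk reference-file format (version 1).
refs = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        # Chunk (0, 0) of this variable lives at the given byte range of the
        # original netCDF file (URL, offset, and length are invented here).
        "WaterVaporMixingRatio/0.0": [
            "https://example.com/licht_lidar_20200131.nc",
            30045,
            278784,
        ],
    },
}

print(json.dumps(refs)[:60])
```

Because the reference file only stores byte ranges, readers such as fsspec can present the original netCDF files as a zarr store without copying any data.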
@observingClouds you asked for a review. Do you want this now, or should I wait for the dataset fixes you propose above?
@RobertPincus I don't expect the dataset issues to be fixed in the near future, so I'm asking for a review of the current workaround.
Maybe this is a bit of unfortunate timing, but as the TCO group is in the process of providing data online via zarr, the LICHT data from the Meteor is now available on DKRZ's Swift store. See here for some initial documentation, or try the following:
```python
import intake

cat = intake.open_catalog("https://tcodata.mpimet.mpg.de/catalog.yaml")
LICHT_b = cat.METEOR.EUREC4A.lidar_LICHT_LR_b_v1.to_dask()
LICHT_t = cat.METEOR.EUREC4A.lidar_LICHT_LR_t_v1.to_dask()
```
The data has been cleaned up by @ninarobbins and should likely be preferred over older versions. It has also been rechunked and should load in a reasonable time.
@d70-t that's great. Thanks @ninarobbins for also reprocessing the Meteor lidar data. That is amazing!
Thanks for asking. Re-reading the wiki for LICHT and CORAL (https://wiki.mpimet.mpg.de/doku.php?id=analysis:data:bco:ramanlidars:raman-lidar-coral#data_access and https://wiki.mpimet.mpg.de/doku.php?id=analysis:data:bco:ramanlidars:raman-lidar-licht#data_access) I can see that for LICHT the `b` and `t` options aren't detailed, but I assume they have the same meaning as for CORAL. I don't quite understand the difference though, so I'll email Ilya about this, update the description in the catalog, and add that before merging this pull-request.