PollyNET / Pollynet_Processing_Chain

NRT lidar data processing program for multiwavelength polarization Raman lidar network (PollyNET)
https://polly.tropos.de/
GNU General Public License v3.0

Respect the timestamps - some data might be missing #105

Closed Ropzn closed 3 years ago

Ropzn commented 3 years ago

There is an issue that was present at least on Arielle during the MOSAiC cruise. For technical reasons, some (30 s) measurements are occasionally missing completely. But this is not displayed correctly by Picasso.

Example: 2019-11-13, check the period 22:00 to 24:00

The first plot (from https://polly.tropos.de/archive_link/recent_plots_new/arielle/2019/11/13/2019_11_13_Wed_ARI_18_00_02_RH.png ) does not show the missing time bins.

The second plot (sorry for the different time and height range) shows some missing time bins (10 min apart). You can also see in the first plot that ten missing 30 s bins add up to 5 minutes, so the plot ends at 23:55. (I think two more bins were missing, which do not appear in the second plot because they occurred before 21:00.)

[Figure: first plot versus second plot (missingTimeBins)]

This is often not a big deal. But as soon as you change the time resolution or compare the Polly data with other data, there may be a time shift.

Technical reason: The measuring program probably could not access the nc-file while it was being accessed by the backup procedure, which is scheduled every 10 minutes. This issue is resolved ?? (ask Ronny) for Arielle (by shifting the cron job/task manager time of the backup procedure by a few seconds), and it might have occurred only on the Arielle server due to slow hardware. (I have not noticed it for the Tajikistan system.)

Discussion: I'm not deep into Picasso, so I don't know whether this is just a plotting issue or whether timestamps are not respected in the processing chain. Probably only this single system has/had the problem. But this might also be the reason behind issue #91 ...? So one could argue that, for the future, we only need a quality flag so that we can intervene quickly if something like this happens. To make it really error-proof, one could either correct the data by inserting time bins with NaN measurements (in the raw data or within Picasso) or make the program respect the timestamps.

Proposals:

  1. Include a quality check for missing time bins. (little effort)
  2. Correct the data within Picasso. (little programming effort, but probably CPU-intensive?)
  3. Correct the data on the server, not in Picasso. (Probably not the nice way, because one should not touch the raw data.)
  4. Respect timestamps within Picasso. (probably high programming effort, but low CPU cost)
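
A minimal sketch of what the quality check in proposal 1 could look like, written in Python with NumPy rather than in Picasso itself. The function name `find_missing_bins`, the tolerance parameter, and the synthetic time axis are assumptions made for illustration only; they are not part of Picasso or of the Polly file format.

```python
import numpy as np

def find_missing_bins(times_s, dt=30.0, tol=0.5):
    """Return (index, n_missing) for every gap between consecutive profiles.

    times_s : 1-D array of profile times in seconds (monotonically increasing).
    dt      : nominal time resolution in seconds (30 s in the case above).
    tol     : a step counts as a gap when it exceeds dt * (1 + tol).
    """
    times_s = np.asarray(times_s, dtype=float)
    steps = np.diff(times_s)
    gap_idx = np.flatnonzero(steps > dt * (1.0 + tol))
    # a step of k * dt means k - 1 profiles are missing in between
    n_missing = np.rint(steps[gap_idx] / dt).astype(int) - 1
    return list(zip(gap_idx.tolist(), n_missing.tolist()))

# Synthetic example (hypothetical data): 20 min of 30 s profiles,
# with the profiles at 150 s and 750 s dropped.
t = np.delete(np.arange(0, 1200, 30.0), [5, 25])
gaps = find_missing_bins(t)
has_gaps = len(gaps) > 0   # could serve as a simple quality flag
print(gaps, has_gaps)
```

Each tuple gives the index of the last profile before a gap and the number of 30 s bins missing after it, so the flag can be raised without touching the data.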

Ropzn commented 3 years ago

Oh, it's getting more confusing: the old program shows yet another plot :-S Is this one interpolated, or how does it deal with the missing time bins?

http://pollyold.tropos.de/?p=bilder&lambda=3&Jahr=2019&Monat=11&Tag=13&Ort=35#bildanker

image

ZPYin commented 3 years ago

There is an issue that was present at least on Arielle during the MOSAiC cruise. For technical reasons, some (30 s) measurements are occasionally missing completely.

Thanks for reporting it. I think we can transfer it to Ronny (@Ronny-TROPOS).

But this is not displayed correctly by Picasso.

Example: 2019-11-13, check the period 22:00 to 24:00

The first plot (from https://polly.tropos.de/archive_link/recent_plots_new/arielle/2019/11/13/2019_11_13_Wed_ARI_18_00_02_RH.png ) does not show the missing time bins.

Well, the RH plot is not a good demonstration, since it is smoothed heavily in both time and range. I would suggest using the time-height plot of the range-corrected signal instead.

But you are right, the timestamp is WRONG. This was caused by an inappropriate Polly configuration of flagCorrectFalseMShots. I guess this setting was true when Picasso was run. If so, the timestamps are regenerated from the number of profiles and the temporal resolution of 30 s. In the case you showed, the number of profiles is insufficient (708 in total), which leads to the wrong timestamps.

But if we switch flagCorrectFalseMShots to false, the timestamps are loaded directly from the Polly netCDF file. This is closer to the truth, see below:

2019_11_13_Wed_ARI_18_00_02_RCS_FR_532
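
A small arithmetic sketch of why 708 profiles end up with shifted timestamps when the time axis is regenerated. It assumes the file nominally covers 6 h (720 profiles at 30 s) and that the regenerated axis simply places profile i at start + i * 30 s; both are assumptions based on the description above, not checked against the Picasso code.

```python
from datetime import datetime, timedelta

DT = timedelta(seconds=30)                 # nominal time resolution
start = datetime(2019, 11, 13, 18, 0, 2)   # from the file name 2019_11_13_Wed_ARI_18_00_02
n_nominal = 720                            # assumption: 6 h file at 30 s
n_recorded = 708                           # number of profiles quoted above

# Regenerated axis: profiles are placed back to back, so the axis
# ends early by the total duration of the missing profiles.
regenerated_end = start + n_recorded * DT
missing = n_nominal - n_recorded
max_shift = missing * DT

print(regenerated_end)     # 2019-11-13 23:54:02
print(missing, max_shift)  # 12 0:06:00
```

Under these assumptions the regenerated axis stops shortly before 23:55, and profiles recorded after the last gap are displayed up to 6 minutes too early, which matches the ten plus two missing bins reported above.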

The second plot (sorry for the different time and height range) shows some missing time bins (10 min apart). You can also see in the first plot that ten missing 30 s bins add up to 5 minutes, so the plot ends at 23:55. (I think two more bins were missing, which do not appear in the second plot because they occurred before 21:00.)

[Figure: first plot versus second plot (missingTimeBins)]

This is often not a big deal. But as soon as you change the time resolution or compare the Polly data with other data, there may be a time shift.

Let me know whether setting the Polly configuration flagCorrectFalseMShots to false resolves this issue.

Technical reason: The measuring program probably could not access the nc-file while it was being accessed by the backup procedure, which is scheduled every 10 minutes. This issue is resolved ?? (ask Ronny) for Arielle (by shifting the cron job/task manager time of the backup procedure by a few seconds), and it might have occurred only on the Arielle server due to slow hardware. (I have not noticed it for the Tajikistan system.)

Thanks for explaining it. It helps.

Discussion: I'm not deep into Picasso, so I don't know whether this is just a plotting issue or whether timestamps are not respected in the processing chain. Probably only this single system has/had the problem. But this might also be the reason behind issue #91 ...? So one could argue that, for the future, we only need a quality flag so that we can intervene quickly if something like this happens. To make it really error-proof, one could either correct the data by inserting time bins with NaN measurements (in the raw data or within Picasso) or make the program respect the timestamps.

Proposals:

  1. Include a quality check for missing time bins. (little effort)

I think this can be done outside of Picasso, since so far I haven't seen any issues in the data processing. The retrievals are not influenced by the missing profiles.

  2. Correct the data within Picasso. (little programming effort, but probably CPU-intensive?)

It can be corrected. But there are already many data corrections inside Picasso. We need to think about the complexity that piles up if we add this one. By the way, how does EARLINET treat such missing profiles? Do they also add dummy profiles to fill the missing columns? (@HolgerPollyNet)

  3. Correct the data on the server, not in Picasso. (Probably not the nice way, because one should not touch the raw data.)

Yes, I agree. But this correction scheme can be invoked by Picasso when preparing/unzipping the netCDF file. I would strongly suggest having a data FIXING script that generates standard Polly data files (@HolgerPollyNet).

This script would also be helpful for other users who just want to clean their data.
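
As a rough illustration of the core step such a fixing script might perform, the Python sketch below places the recorded profiles on a regular 30 s grid and fills the missing bins with NaN. It works on in-memory arrays only and does not read or write Polly netCDF files; the function name `pad_missing_profiles` and the array layout (profiles along the first axis) are hypothetical choices for this example.

```python
import numpy as np

def pad_missing_profiles(times_s, signal, dt=30.0):
    """Place profiles on a regular time grid, filling missing bins with NaN.

    times_s : 1-D array (n_profiles,) of profile times in seconds,
              increasing, with jitter well below dt / 2.
    signal  : 2-D array (n_profiles, n_range_bins).
    Returns (grid_times, padded_signal).
    """
    times_s = np.asarray(times_s, dtype=float)
    signal = np.asarray(signal, dtype=float)
    # nearest regular-grid slot for each recorded profile
    slots = np.rint((times_s - times_s[0]) / dt).astype(int)
    n_slots = slots[-1] + 1
    grid_times = times_s[0] + dt * np.arange(n_slots)
    padded = np.full((n_slots, signal.shape[1]), np.nan)
    padded[slots] = signal
    return grid_times, padded

# Hypothetical example: the profile at 60 s is missing.
t = np.array([0.0, 30.0, 90.0, 120.0])
sig = np.arange(8, dtype=float).reshape(4, 2)
grid, padded = pad_missing_profiles(t, sig)
print(grid)
print(padded)
```

Because the raw arrays are left untouched and only a padded copy is returned, the same step could run either in a standalone cleaning script or while the file is being prepared for processing.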

  4. Respect timestamps within Picasso. (probably high programming effort, but low CPU cost)

This has been implemented. See my in-line answers above.