enram / data-repository

Data quality assessment
https://enram.github.io/data-repository/
MIT License

Investigate `_qcvol_` files #71

Open peterdesmet opened 1 year ago

peterdesmet commented 1 year ago

https://aloftdata.eu/browse/?prefix=baltrad/hdf5/bewid/2017/12/27/ contains:

bewid_vp_20171227T2100Z.h5
bewid_qcvol_20171227T2100Z.h5

The source data must have contained:

bewid_pvol_20171227T2100Z.h5
bewid_qcvol_20171227T2100Z.h5

The translation to vp data retains the file name, replacing only _pvol_ with _vp_.

But now two files exist for the 21:00 timestamp. It would be good to investigate whether they differ and how prevalent this type of file is.
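A first pass at measuring prevalence could work on the file names alone. The sketch below assumes that names follow the `<radar>_<type>_<timestamp>.h5` pattern seen above and that a listing of keys has already been obtained from the bucket. `find_duplicate_timestamps` is a hypothetical helper, not part of any existing tool.

```python
import re
from collections import defaultdict

# Assumed pattern, based on the names above: <radar>_<type>_<YYYYMMDDTHHMMZ>.h5
NAME_RE = re.compile(r"^(?P<radar>[a-z0-9]+)_(?P<type>[a-z]+)_(?P<ts>\d{8}T\d{4}Z)\.h5$")


def find_duplicate_timestamps(filenames):
    """Return {(radar, timestamp): sorted types} for timestamps with more than one file."""
    by_key = defaultdict(set)
    for name in filenames:
        match = NAME_RE.match(name.rsplit("/", 1)[-1])  # ignore any key prefix
        if match is None:
            continue  # skip names that do not follow the assumed pattern
        by_key[(match["radar"], match["ts"])].add(match["type"])
    return {key: sorted(types) for key, types in by_key.items() if len(types) > 1}


listing = [
    "baltrad/hdf5/bewid/2017/12/27/bewid_vp_20171227T2100Z.h5",
    "baltrad/hdf5/bewid/2017/12/27/bewid_qcvol_20171227T2100Z.h5",
]
print(find_duplicate_timestamps(listing))
```

Counting the returned keys per radar and year would give a rough measure of how widespread the qcvol files are.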

BerendWijers commented 1 year ago

Hi,

I've looked into this. PVOL and QCVOL seemingly have the same dataset quantities (DBZH, VRADH, etc.) but differ in the actual values within each quantity. h5diff will highlight these differences.

qcvol also has extra groups within each dataset. Conventionally, a datasetN inside a PVOL contains dataN groups; in the qcvol file it additionally contains quality1, quality2 and quality3.

Inside qcvol datasetN\dataN there is no how metadata group, so the metadata of datasetN\dataN looks identical to its pvol counterpart. However, the quality1 group does contain a how metadata group, which names a specific task, fi.fmi.ropo.detector.classification, as well as task_args, which mention SPECKNORMOLD. quality2 and quality3 have different tasks than quality1.
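The structure described above can be listed programmatically. The following is a minimal sketch, assuming the usual ODIM H5 layout in which each dataN group carries its quantity in a what group and each qualityN group carries its task in a how group. `summarize_odim` and `summarize_file` are hypothetical helper names, and h5py is a third-party package that has to be installed separately.

```python
def _text(value):
    """Decode HDF5 string attributes, which may arrive as bytes."""
    return value.decode() if isinstance(value, bytes) else str(value)


def summarize_odim(root):
    """Map each datasetN to its dataN quantities and qualityN tasks.

    `root` is an open h5py.File, or any nested mapping whose groups expose `.attrs`.
    """
    summary = {}
    for ds_name in root.keys():
        if not ds_name.startswith("dataset"):
            continue  # skip top-level what/where/how groups
        dataset = root[ds_name]
        quantities, tasks = {}, {}
        for name in dataset.keys():
            child = dataset[name]
            if name.startswith("data") and "what" in child:
                quantities[name] = _text(child["what"].attrs.get("quantity", ""))
            elif name.startswith("quality") and "how" in child:
                tasks[name] = _text(child["how"].attrs.get("task", ""))
        summary[ds_name] = {"quantities": quantities, "tasks": tasks}
    return summary


def summarize_file(path):
    """Open an HDF5 file read-only and summarize it."""
    import h5py  # imported lazily so summarize_odim itself has no dependencies

    with h5py.File(path, "r") as f:
        return summarize_odim(f)
```

Running `summarize_file` on a pvol file and on its qcvol counterpart and comparing the two results should show matching `quantities` and a `tasks` entry only for the qcvol file, if the observations above hold.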

Lastly, if we look at a rough plot of DBZH from PVOL and QCVOL the resulting image shows a likely filtered dataset in the QCVOL file.

[Image: pvol_vs_qcvol]

I assume therefore that the QCVOL file is a filtered version of PVOL. And perhaps the QC in QCVOL stands for Quality Control.

To come back to the VPs resulting from PVOL vs QCVOL: I would estimate that the VP made from QCVOL has less information than the VP made from PVOL. Lastly, the resulting VP is named as type QCVOL instead of type VP. This might be an artifact of Radar Cluster (RC), which checks the file type and should throw an error if the file is not a PVOL. I've checked QCVOL\what and the file type is indeed PVOL. I assume that RC's PVOL renaming missed QCVOL; I'll check in RC what the actual issue is. Regardless, what RC does here affects only the naming scheme, not data quality.
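The naming behaviour is consistent with a plain substring replacement. The sketch below is only an illustration of that hypothesis; `to_vp_name` is a hypothetical function and not RC's actual code.

```python
def to_vp_name(source_name):
    """Hypothetical rename rule: swap the literal `_pvol_` token for `_vp_`."""
    return source_name.replace("_pvol_", "_vp_")


print(to_vp_name("bewid_pvol_20171227T2100Z.h5"))   # bewid_vp_20171227T2100Z.h5
print(to_vp_name("bewid_qcvol_20171227T2100Z.h5"))  # bewid_qcvol_20171227T2100Z.h5
```

Under such a rule a qcvol input passes through with its name unchanged, which would explain why a VP ends up stored under a qcvol name.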

peterdesmet commented 1 year ago

@BerendWijers thanks for looking into this. Do you think we can then safely remove all qcvol files in the bucket?