enram / vpts-csv

Data exchange format for biological signals detected by weather radars
https://aloftdata.eu/vpts-csv/
MIT License

Multiple questions about heights #11

Closed: niconoe closed this issue 2 years ago

niconoe commented 3 years ago

  1. Do we guarantee to consumers that the same heights will be available for each timestamp?
  2. Do we document the available heights in the metadata file? (The alternative is to let the reader infer them from the content of the CSV file.)
  3. Is it "good enough" if the standard states that heights are always expressed as meters above sea level (data type: positive integer)?
  4. Is it better to call the column height or altitude?

peterdesmet commented 3 years ago
  1. @bart1 @adokter I'm guessing we only provide the heights that are there in the source vp data?
  2. Yesterday we decided to have as few as possible coverage statistics: they are better inferred by the user reading the data and are also very use case dependent. E.g. here, would that be all available heights or heights that have data for all timestamps?
  3. @adokter @CeciliaNilsson709 I'm guessing that is already the case with heights, right?
  4. I think the consensus is height even though altitude might be more "correct" @adokter @CeciliaNilsson709?
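The two coverage definitions raised in point 2 ("all available heights" vs "heights that have data for all timestamps") are easy for a reader to compute from the data itself. A minimal sketch, assuming a hypothetical data frame with one row per `datetime` and `height` (the column names and values are illustrative, not taken from the standard):

```r
# Hypothetical vpts-like data: one row per datetime and height (meters)
vpts <- data.frame(
  datetime = rep(c("2016-09-01T00:00:00Z", "2016-09-01T00:05:00Z"), times = c(3, 2)),
  height   = c(0, 200, 400, 0, 200)
)

# Heights grouped per datetime
heights_by_dt <- split(vpts$height, vpts$datetime)

# Definition 1: every height that occurs anywhere in the data
all_heights <- sort(unique(vpts$height))                    # 0, 200, 400

# Definition 2: only heights that have data at every datetime
complete_heights <- sort(Reduce(intersect, heights_by_dt))  # 0, 200
```

The two results already differ on this tiny example, which illustrates why such statistics are use-case dependent.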
niconoe commented 3 years ago

A few more clarifications and questions:

  1. Indeed, if we choose to enforce this limitation in the standard, a converter such as vph5-to_vpts will sometimes have to refuse to work (because the source vp data is not consistent in terms of height). The advantage is simplicity for consumers (CROW, for example, would become much more complex if, when reading the data, it cannot guess which heights will be available for the next timestamps). There's some nuance here: an occasional height gap is easier to deal with than totally random heights for every timestamp.
  2. Yep, I quite like that too. Would you also take that approach for the temporal range? (Since the CSV is ordered, it can simply be inferred by reading the first and last line.)
  3. Since we are discussing the standard here: I believe this is the case now, but do you expect it to stay that way if the standard is used in other contexts (different source data, users, ...)?
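As point 2 notes, if rows are ordered by datetime, the temporal range is just the first and last row. A small sketch, assuming hypothetical data with ISO 8601 UTC timestamps ending in `Z` (an assumed timestamp format), that also checks the ordering assumption and compares against `range()`, which works regardless of order:

```r
# Hypothetical vpts-like data, ordered by datetime as the format intends
vpts <- data.frame(
  datetime = c("2016-09-01T00:00:00Z", "2016-09-01T00:00:00Z",
               "2016-09-01T00:05:00Z", "2016-09-01T00:10:00Z"),
  height   = c(0, 200, 0, 0)
)

# Parse the (assumed) ISO 8601 timestamps as UTC
dt <- as.POSIXct(vpts$datetime, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# If rows are sorted (ties allowed), first and last row give the temporal range
stopifnot(!is.unsorted(dt))
range_from_order <- c(dt[1], dt[length(dt)])

# range() gives the same answer without relying on the ordering
same_range <- all(range_from_order == range(dt))  # TRUE
```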
peterdesmet commented 3 years ago
  1. What about listing all heights found in the data in the metadata, e.g. [0, 200, 400, 500, 600, 800, ...] (note the odd 500). Then at least a consumer knows beforehand what heights to expect or select. The data could have 1 or more of those heights for a timestamp. Would that help e.g. CROW?
  2. I'm tempted to retain the temporal range. It is a pretty basic Dublin Core term: example
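With all heights listed in the metadata, a consumer can validate the data up front and see which of the declared heights each timestamp provides. A sketch under assumptions: the metadata list reuses the example values above, and the data frame with `datetime` and `height` columns is hypothetical:

```r
# Heights listed in the metadata (example values from the comment above)
metadata_heights <- c(0, 200, 400, 500, 600, 800)

# Hypothetical data: each datetime holds one or more of the listed heights
vpts <- data.frame(
  datetime = c("t1", "t1", "t1", "t2", "t2"),
  height   = c(0, 200, 400, 0, 500)
)

# Verify that only declared heights occur in the data
valid <- all(vpts$height %in% metadata_heights)  # TRUE

# Which declared heights does each datetime actually provide?
provided <- lapply(split(vpts$height, vpts$datetime),
                   function(h) metadata_heights[metadata_heights %in% h])
# provided$t1 is 0, 200, 400; provided$t2 is 0, 500
```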
niconoe commented 3 years ago

@peterdesmet: yep, all that seems reasonable to me!

adokter commented 3 years ago

https://github.com/enram/vpts/issues/11#issuecomment-859398426:

  1. Agree, only what's present in the vps. It might happen that vps have different altitude specs, but to keep it simple I would require that the altitude specification doesn't change within a vpts data package.
  2. Can be inferred from the data, I would say. There has to be agreement, though, on what the height indicates (e.g. bottom vs middle of the altitude bin).
  3. vol2bird gives altitude above mean sea level. However, cajun works with height above ground level. They can be converted into each other if you know the height of the antenna, which should be required metadata. So only height above sea level is "enough" because we can calculate the other from the antenna height.
  4. altitude is more accurate terminology, but we denote it as height in bioRad, as well as in ODIM h5 standard used by meteorologists, see https://github.com/adokter/bioRad/issues/78. I would therefore stick with height
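Point 3's conversion between heights above sea level and heights above ground can be sketched as a simple offset. This is a hedged illustration: the antenna height value is hypothetical, and, following the comment, the antenna height stands in for the ground reference level (a stricter conversion might use the ground elevation at the radar site instead):

```r
# Hypothetical antenna height above mean sea level (m), taken from radar metadata
antenna_height <- 150

# Convert between heights above sea level and heights above ground,
# assuming (as in the comment) that the antenna height is the reference level
asl_to_agl <- function(height_asl, antenna_height) height_asl - antenna_height
agl_to_asl <- function(height_agl, antenna_height) height_agl + antenna_height

heights_asl <- c(200, 400, 600)
heights_agl <- asl_to_agl(heights_asl, antenna_height)  # 50, 250, 450

# The conversion round-trips, so storing only one reference is sufficient
round_trip_ok <- isTRUE(all.equal(agl_to_asl(heights_agl, antenna_height), heights_asl))
```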
niconoe commented 3 years ago

Thanks @adokter!

For this standard, what is your suggestion for the agreement on the height (bottom or middle of the bin)? For the first version, I suggest aligning the standard as much as reasonably possible with vol2bird (for the same reason, I think we should document that the height is actually above mean sea level).

adokter commented 3 years ago

Difficult one to call. I see the practical advantage of sticking to the current vol2bird output (bottom of bin), but from an analysis point of view the center of the bin is more informative/intuitive.

jshamoun commented 3 years ago

I have been following the conversation silently. I would be careful about changing standards, but clear documentation should help. A problem may arise if people compare previously and recently processed data, something to keep in mind with other repositories like the one at UvA, or people that previously downloaded the ENRAM repository. Another option is to provide a conversion table between bottom and center of the bin. Regarding altitude or height, altitude is the correct term even if height is more commonly used.
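The suggested conversion table between bin bottom and bin center only needs the bin width. A minimal sketch, assuming hypothetical bins labelled by their bottom (as vol2bird does, per the comment above) with a constant width that is inferred and checked from the data:

```r
# Hypothetical bin labels given as the bottom of each bin (meters)
bottom <- seq(0, 4800, by = 200)

# Infer the bin width from the data and check that it is constant
interval <- unique(diff(bottom))
stopifnot(length(interval) == 1)

# Conversion table: bottom, center and top of every bin
conversion <- data.frame(
  height_bottom = bottom,
  height_center = bottom + interval / 2,
  height_top    = bottom + interval
)
# First row: bottom 0, center 100, top 200; the last bin's top is 5000
```

Because the mapping is a fixed offset of half a bin, users who prefer mid-bin heights can derive them without the standard changing its convention.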

CeciliaNilsson709 commented 3 years ago

I agree with Judy that we should stick to what we have been doing (bottom of bin, and height rather than altitude) to avoid a confusing mismatch with already existing data/processing. It is, after all, quite simple for users to change to mid-bin etc. themselves afterwards if they want. It also makes sense to me to stick to "height", as that is what is in the meteorological data, and renaming it might give the impression it has been changed.

My experience is that both the data and the use cases can vary quite widely, especially in terms of height coverage, time intervals etc., so I would opt for keeping it flexible where possible and, of course, making sure it's all very well documented.

niconoe commented 3 years ago

Update: I think we can summarise the consensus like this (shout if you don't agree!):

I'll make sure this is all clearly reflected in the documentation/specifications then I'll close this issue.

peterdesmet commented 2 years ago

Based on the last comment, I have reflected this in the format as:

https://github.com/enram/vpts/blob/4f943a642660704d60fe4b39d171791e3e6b5f35/pages/format.md?plain=1#L15

bart1 commented 2 years ago

To be explicit, should it be mentioned that they also have the same values (not only that the number of values should match)? Also note that it is not uncommon to encounter data where the height has been calculated up to either 4 or 5 km, and I have encountered mixing issues there.
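The distinction between matching counts and matching values can be shown with a small check. A sketch on hypothetical data where both datetimes have three heights, but not the same ones:

```r
# Hypothetical data: both datetimes have 3 heights, but not the same ones
vpts <- data.frame(
  datetime = rep(c("t1", "t2"), each = 3),
  height   = c(0, 200, 400, 0, 200, 500)
)
heights_by_dt <- split(vpts$height, vpts$datetime)

# Checking only the number of heights per datetime misses the mismatch...
same_count <- length(unique(lengths(heights_by_dt))) == 1  # TRUE

# ...while comparing the sorted height values themselves catches it
reference <- sort(heights_by_dt[[1]])
same_values <- all(vapply(heights_by_dt,
                          function(h) identical(sort(h), reference),
                          logical(1)))  # FALSE
```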

peterdesmet commented 2 years ago

To be explicit should it be mentioned that they also have the same values?

Ok, will do that.

Also note that it is not uncommon to encounter data where the height has been calculated to either 4 or 5 km and I have encountered mixing issue there

Can you clarify with an example?

bart1 commented 2 years ago

Here is a small example of heights going to 4000 and 5000 meters:

require(bioRad)
#> Loading required package: bioRad
#> Welcome to bioRad version 0.6.0
#> Docker daemon running, Docker functionality enabled (vol2bird version 0.5.0)
# Download a vp file to a temporary .h5 file and read it as a vp object
f <- function(x) {
  t <- tempfile(fileext = ".h5")
  download.file(x, t, mode = "wb")  # binary mode keeps the HDF5 file intact on Windows
  read_vpfiles(t)
}
h <- c('https://lw-enram.s3-eu-west-1.amazonaws.com/fr/nim/2017/11/24/02/frnim_vp_20171124T0215Z_0x7.h5',
       'https://lw-enram.s3-eu-west-1.amazonaws.com/fr/nim/2016/09/21/02/frnim_vp_20160921T0215Z.h5')
# Maximum height (m) stored in the "where" attributes of each vp file
f(h[1])$attributes$where$maxheight
#> [1] 5000
f(h[2])$attributes$where$maxheight
#> [1] 4000

Created on 2022-05-10 by the reprex package (v2.0.1)

peterdesmet commented 2 years ago

@bart1 given that some radars have different max height over time, should we still add the requirement that the heights are constant across time?

adokter commented 2 years ago

As an aside: currently bioRad can't handle vpts objects that have different maximum heights or height intervals. But we could decide that we want to support that in the future. It will be quite a bit of work to implement.

peterdesmet commented 2 years ago

Currently we state:

Data SHOULD have the same heights for all datetimes of a radar.

SHOULD = This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

I think that covers the use case we want to cover now.
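Because the requirement is a SHOULD rather than a MUST, a validator would reasonably report a violation as a warning instead of rejecting the data. A hedged sketch of such a check; the function name, the `radar`, `datetime` and `height` columns, and the radar names are all hypothetical:

```r
# Hypothetical validator: heights SHOULD be the same for all datetimes of a radar,
# so a violation raises a warning rather than an error (as a MUST would)
check_constant_heights <- function(vpts) {
  ok <- TRUE
  for (radar_data in split(vpts, vpts$radar)) {
    # Sorted height profile for each datetime of this radar
    profiles <- lapply(split(radar_data$height, radar_data$datetime), sort)
    if (length(unique(profiles)) > 1) {
      warning("radar ", radar_data$radar[1], ": heights differ between datetimes")
      ok <- FALSE
    }
  }
  invisible(ok)
}

# Hypothetical data: radar_a is consistent, radar_b is not
vpts <- data.frame(
  radar    = c("radar_a", "radar_a", "radar_a", "radar_a", "radar_b", "radar_b"),
  datetime = c("t1", "t1", "t2", "t2", "t1", "t2"),
  height   = c(0, 200, 0, 200, 0, 200)
)
```

Calling `check_constant_heights(vpts)` here warns about `radar_b` and returns `FALSE` invisibly, while the data itself is still usable.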