enram / data-repository

Data quality assessment
https://enram.github.io/data-repository/
MIT License

Repeated data attributes #16

Closed peterdesmet closed 7 years ago

peterdesmet commented 8 years ago

Hi @adokter, in your specification you mention:

... --> only changing those variables that are different w.r.t. /dataset1/data1

Which is about these attributes:

/dataset1/data1/what/gain       1.0             [Double]    #Gain
/dataset1/data1/what/offset     0.0             [Double]    #Offset
/dataset1/data1/what/nodata     NaN             [Double]    #Nodata indicator
/dataset1/data1/what/undetect   9999.0          [Double]    #Undetect indicator

Questions:

  1. Does that mean that the values of those attributes are the same for data1 to data15, and thus do not need to be listed 15 times?
  2. Are the values of those attributes the same for all files from the same radar?
  3. Are the values of those attributes the same for all files?

I'm asking to know if and how we should store these in the database.

adokter commented 8 years ago

These attributes define how the stored data values are encoded: the quantity value equals offset + gain * (data value).

It's a requirement by the ODIM format to use such encoding, but I currently do not use it, i.e. I use offset=0 and gain=1 for all quantity fields. My suggestion would be to use the offset and gain attributes to decode the field value according to the formula above, and only store the decoded value in the database, not the offset and gain attributes themselves.

  1. No, they can be different in theory, so to be ODIM compliant you have to take the values into account.
  2. yes
  3. yes

NB We recently found out that we cannot use NaN values, so both nodata and undetect will be a numeric value.
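As a rough illustration of the decoding suggested above, here is a minimal Python sketch. The function name `decode` and its keyword arguments are hypothetical and not part of any existing code; it also assumes that `nodata` and `undetect` are compared against the raw (still encoded) value and that flagged values are passed through unscaled.

```python
def decode(raw, gain=1.0, offset=0.0, nodata=None, undetect=None):
    """Decode one raw data value using offset + gain * raw.

    Hypothetical helper: nodata/undetect are flags rather than
    measurements, so they are returned unchanged (assumption).
    """
    if raw == nodata or raw == undetect:
        return raw
    return offset + gain * raw


# With the current settings (gain=1, offset=0) the value is unchanged.
dens = decode(59.260143, gain=1.0, offset=0.0, nodata=-1000.0, undetect=9999.0)
```

If gain or offset change later, only the attributes passed in differ; the stored value is always the decoded quantity.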

peterdesmet commented 8 years ago

Ok, so if I understand correctly:

  1. Offset and gain affect the data value, so for a dens of 59.260143 you actually have to do offset + gain * 59.260143. As there is no need to keep offset and gain separately, we can actually do this calculation when writing data to the database, no longer keeping offset and gain, but only the "new" value?
  2. Currently all offset is 0.0 and all gain is 1.0, leaving the original value unchanged. But, it's better to use the calculation anyway, in case those values do get updated?
  3. Offset and gain can be different for each variable, but they are constant across radars and time? That means if you would change offset for e.g. dens, you would change it for all radars and dens measurements? In any case, the answer doesn't change anything if we run all data through the calculation.

peterdesmet commented 8 years ago

Pending your answer, I assume gain and offset can be combined into the value, but what about nodata and undetect?

adokter commented 8 years ago

  1. correct
  2. correct
  3. Yes, they are set in the bird algorithm, so they will be the same for all radars and times unless we decide to change the code. But changes to the code will likely happen in the future, so it is better to use offset and gain and decode the values before storing them in the db.

adokter commented 8 years ago

Regarding nodata and undetect: yes, both these values should be kept. But it doesn't matter which values you decide to reserve for these two, as long as they are distinguishable. So if (in theory) different source files use different nodata and undetect codings, you can map them to the same respective values that you reserved for these two cases in the db. In the whole db only one nodata and one undetect value is needed, and they can be used for all fields, radars and times.
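A minimal sketch of that mapping, in Python. The reserved values `DB_NODATA` and `DB_UNDETECT`, the function `to_db_value`, and the dictionary layout of the `what` attributes are all hypothetical choices for illustration; any two distinguishable values that cannot occur as real measurements would do.

```python
# Hypothetical database-wide reserved values (assumption, not from the spec).
DB_NODATA = -9999.0
DB_UNDETECT = -8888.0


def to_db_value(raw, what):
    """Map one raw value from a source file to the value stored in the db.

    `what` holds that file's gain, offset, nodata and undetect attributes.
    File-specific flag codes are mapped to the shared reserved values;
    everything else is decoded with offset + gain * raw.
    """
    if raw == what["nodata"]:
        return DB_NODATA
    if raw == what["undetect"]:
        return DB_UNDETECT
    return what["offset"] + what["gain"] * raw


# Two source files with different flag codings end up with the same db codes.
file_a = {"gain": 1.0, "offset": 0.0, "nodata": -1000.0, "undetect": 9999.0}
file_b = {"gain": 1.0, "offset": 0.0, "nodata": 255.0, "undetect": 0.0}
codes_a = (to_db_value(-1000.0, file_a), to_db_value(9999.0, file_a))
codes_b = (to_db_value(255.0, file_b), to_db_value(0.0, file_b))
```

With this approach the source file's own nodata and undetect attributes do not need to be stored, only interpreted at upload time.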

peterdesmet commented 8 years ago

Great, so nodata and undetect apply to the whole database.

  1. Should we even store these values? You can find it easily in any source file and the database will always be a derivative of the source files.
  2. If we DO store them, what should happen if the upload script detects a change in these values for one of the source files?

peterdesmet commented 7 years ago

We decided not to develop a data model and database for bird profile data, since there was no strong use case to do so, but rather extend the bioRad package to download and load bird profiles directly from the data repository on S3. Closing this issue.