MeteoSwiss / dvas

Data Visualization and Analysis Software for the UAII 2022
https://meteoswiss.github.io/dvas/
GNU General Public License v3.0

How to distinguish the serial number (must be unique in the db) between a GDP and a native profile #91

Closed GonzagueRomanens closed 3 years ago

GonzagueRomanens commented 3 years ago

Question: @fpavogt, @modolol: consider a radiosonde that is at the same time a GDP and a native product. The serial number (SN) is the same, but the SN must be unique in the DB. For the SampleDatasetMultipayload in instrument_tests.yml, I just added the prefix test, for example:

I guess a more elegant solution could be found. Any proposals?

fpavogt commented 3 years ago

Currently, the SRN is indeed assumed to be unique: it also implies that two profiles with the same SRN are assumed to be the same by dvas. This is particularly important (for example) when computing correlations. If we allow two profiles to have the same SRN, then how do we identify unique profiles ?

It would be possible to check both the SRN and the 'gdp' tag at the same time to avoid confusion. This approach would however "limit" us to having only 2 profiles with the same SRN: a 'gdp' one, and another one. Unless we add more tags of course, but that would still be up to the user to do, and not ideal IMHO.
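To make the limitation concrete, here is a minimal Python sketch of the SRN-plus-'gdp'-tag check. It is not dvas code: ProfileKey, register, the dict-based store and the serial number R1234567 are all hypothetical, chosen only for illustration.

```python
from typing import NamedTuple


class ProfileKey(NamedTuple):
    """Hypothetical composite key: serial number plus a 'gdp' flag."""
    srn: str
    is_gdp: bool


def register(db: dict, srn: str, tags: set, payload: str) -> ProfileKey:
    """Store a profile under (srn, 'gdp' in tags) and reject duplicate keys."""
    key = ProfileKey(srn, "gdp" in tags)
    if key in db:
        raise ValueError(f"duplicate profile key: {key}")
    db[key] = payload
    return key


db = {}
register(db, "R1234567", {"gdp"}, "GDP profile")     # accepted
register(db, "R1234567", {"raw"}, "native profile")  # accepted

# A second GDP with the same SRN collides, whatever extra tags it carries.
try:
    register(db, "R1234567", {"gdp", "v2"}, "second GDP profile")
    third_accepted = True
except ValueError:
    third_accepted = False
```

With a boolean flag, the key space for a given SRN has exactly two slots, which is the "only 2 profiles" limit described above.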

On the other hand, changing the SRN is great, because one could add as many copies of the same profile (including many GDPs) as required (if we wanted to compare e.g. 10 different versions of the algorithm). The one downside, of course, is the manual modification of the SRN. But once we have an automated tool to assemble all the DB config files, this would only require modifying the raw frames, which I do not think would be too painful to do (and could also easily be automated).

The best could actually be to tweak the content of the SRN. dvas doesn't actually care about this number: the code just requires a unique ID for each profile. Right now, it is taken to be the SRN of the radiosonde, which we assumed to be unique. Could we concatenate something to it, to turn it into a true PID=profile ID, and make it more suited to the task ?

We need something found in the original netCDF files for automation purposes. We have the file extension of course (.nc, .txt, .csv). But also maybe g.Product.Id: 560920 (and/or g.Product.FullKey: RS41-GDP-BETA.1, g.Product.Key: RS41-GDP-BETA, g.Product.Level: 2, g.File.Type: GNC-DATA) ?

Proposition:

We replace srn in dvas with a profile ID pid variable, constructed as

pid = srn + "_" + product_id + "_" + product_level (2 = processed; the meaning of 0/1 is to be checked with GRUAN)

For the DB it doesn't change anything (this str is still unique). We just need the mechanism to assemble it at load time. For raw frames, the users add both the srn and product_id to the file, and they can change the latter to whatever they need. The product_level is set automatically to the appropriate value for raw files.
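A rough Python sketch of how such a pid could be assembled at load time. The function name make_pid, the serial number R1234567, the raw product_id "my-raw-v1" and the constant RAW_PRODUCT_LEVEL are hypothetical; in particular, the level to use for raw files is a placeholder, since the meaning of 0/1 still has to be checked with GRUAN. Only the product id 560920 and level 2 come from the GDP metadata quoted above.

```python
# Placeholder only: the actual level for raw files is to be confirmed with GRUAN.
RAW_PRODUCT_LEVEL = 0


def make_pid(srn: str, product_id: str, product_level: int) -> str:
    """Build a profile ID as srn + '_' + product_id + '_' + product_level."""
    if not srn or not product_id:
        raise ValueError("srn and product_id must both be non-empty")
    return f"{srn}_{product_id}_{product_level}"


# GDP profile: product id and level would be read from the netCDF metadata
# (g.Product.Id and g.Product.Level in the example above).
gdp_pid = make_pid("R1234567", "560920", 2)

# Raw profile from the same sonde: the user picks the product_id freely.
raw_pid = make_pid("R1234567", "my-raw-v1", RAW_PRODUCT_LEVEL)

print(gdp_pid)  # R1234567_560920_2
print(raw_pid)  # R1234567_my-raw-v1_0
```

Both profiles share the SRN but get distinct pid strings, so the uniqueness constraint in the DB still holds.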

Thoughts ?

modolol commented 3 years ago

My remarks/proposals:

- Yes, we must be able to enter the same SRN several times.
- The hash currently in use already handles the uniqueness of profiles. It is not necessary to introduce other concepts because this one is sufficient. I think we should force the user to rely on the tags to find the profiles.
- The implementation is, in my opinion, simple. You just have to replace the SRN in the event manager with the instrument ID. The possibility to search by SRN should remain (to be checked), but with the "risk" of getting duplicates if users are not aware of what they are doing and do not tag their profiles correctly.
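A small Python sketch of the duplicate "risk" when searching by SRN, and of how tags resolve it. This is an illustration only, not the dvas event manager: the Profile class, its oid field (standing in for the unique instrument ID), the find helper and the serial number are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    oid: int         # hypothetical unique instrument/object ID
    srn: str         # serial number, allowed to repeat
    tags: frozenset  # user-supplied tags, e.g. {'gdp'} or {'raw'}


def find(profiles, srn, required_tags=()):
    """Return all profiles with this SRN that carry every required tag."""
    wanted = set(required_tags)
    return [p for p in profiles if p.srn == srn and wanted <= p.tags]


profiles = [
    Profile(oid=1, srn="R1234567", tags=frozenset({"gdp"})),
    Profile(oid=2, srn="R1234567", tags=frozenset({"raw"})),
]

by_srn = find(profiles, "R1234567")               # two hits: ambiguous
gdp_only = find(profiles, "R1234567", ["gdp"])    # one hit: disambiguated by tag
```

Searching by SRN alone returns both profiles; adding the tag narrows the result to one, which only works if the profiles were tagged correctly in the first place.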