eurec4a / flight-phase-separation

Collection of manually edited flight segments for all platforms participating in EUREC4A.
1 stars 6 forks source link

New qc #22

Closed Geet-George closed 3 years ago

Geet-George commented 3 years ago

The earlier version sequenced the sondes as per time in the file name, i.e. in the order that the file was stored. However, for sondes that did not detect a launch, the launch_time and the file time is different. For these sondes, the time in the file name is actually time of initialization, because the file saved a few minutes before launch is retained as the final file. JOANNE changed the way the sondes are sequenced and so, now sonde_id is in the order of actual launch_time as opposed to time of file.

This PR adds the modified sondes.yaml to the repository, plus includes the modified flight-phase-files by running the script attach-sondes.py. The SAM sondes irregularities are reverted back to their intended states, which changes upon running the attach-sondes.py script.

Geet-George commented 3 years ago

@d70-t could you help me understand where the tests are failing? I realise that these are massive changes and so there are bound to be errors in testing, but it would be good to know where this is happening. Currently the only detail I get is: Error: Process completed with exit code 1.

d70-t commented 3 years ago

Puh... these are a lot of changes. So for the failing tests, there are a couple of warnings in the lines above the failing tests. These are the same warnings which should also appear if you generate the HTML report as we did during flight phase segmentation. The test will succeed if all warnings are resolved.

Regarding the changes, I'd really like to urge you to consolidate the commits such that

This will help a lot during the review to find what is actually a meaningful change and what doesn't change anything.

Then there's the problem with renumbering sondes. The purpose of an identifier is that is uniquely identifies exactly one thing. Thus once a sonde is associated with an identifier, this association may never be removed (or even worse changed to something else). The reason behind this is that as soon as the data is used by someone, the user will (rightfully) expect that the identifier can be used to identify the thing. This also is the reason why we don't use launch_time as an identifier, because that is not unique. One thing which makes things more complicated is if people start to interpret semantics onto identifiers. In particular, the number within a sonde id may never be used to infer any kind of ordering, because that is not what the number is for, because if people start to add semantics which aren't really there, this creates the desire to change identifiers and that totally defeats the purpose of an identifier.

Thus, there may be any sequence in which someone may want to order sondes, but a change in the "default order" may never influence the sonde ids and sonde ids may never be swapped over time.