FDSN / miniSEED3

https://docs.fdsn.org/projects/miniseed3/

Proposal: Drop data publication version (header field 9) #6

Closed: kaestli closed this issue 1 year ago

kaestli commented 2 years ago

Reasoning: imagine a raw dataset with publication version 1. Two different authors could pick it up, one doing gap filling and the other doing time correction. If each is unaware of the other, both would increment the data publication version, producing two datasets that carry the same version but have different content. This seems to be a potential source of confusion.

As a first-order attempt to avoid the problem, one could consider adding a data field such as "agency", with the implication that version numbering must be consistent among dataset versions from the same agency but need not be consistent between dataset versions authored by different agencies. However, with the source identifier URI (and URI-style ...#fragment extensions of it), miniSEED3 already provides a much more flexible and powerful tool for versioning and provenance indication. Thus, we think the error-prone version byte can be dropped. (Note that, together with proposal #5, this leads to an unchanged header size.)
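The collision described above can be made concrete with a small sketch. It assumes, purely for illustration, that each author derives the new version by adding one to the version they received; the `Dataset` class, the `derive` helper and the source identifier are hypothetical and not part of miniSEED3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    source_id: str      # FDSN source identifier (hypothetical example value)
    version: int        # data publication version, i.e. header field 9
    processing: tuple   # processing steps applied so far (illustrative only)


def derive(parent: Dataset, step: str) -> Dataset:
    """Hypothetical naive rule: each author bumps the version they received by one."""
    return Dataset(parent.source_id, parent.version + 1, parent.processing + (step,))


raw = Dataset("FDSN:XX_STA__H_H_Z", 1, ())
gapfilled = derive(raw, "gap filling")
time_corrected = derive(raw, "time correction")

# Same identifier and same version number, yet different content.
print(gapfilled.version == time_corrected.version)  # True
print(gapfilled == time_corrected)                   # False
```

Because the two authors never coordinate, the pair (source identifier, version) no longer identifies a unique dataset, which is the ambiguity the proposal is concerned with.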

crotwell commented 2 years ago

Note that the spec says:

Values should only be considered relative to each other for data from the same data center.

and

Changes to this value for user-versioning are not recommended, instead an extra header could be used.

So authors should not use this field to track processing steps. Having each author store their own "agency" and version in the extra headers is the correct approach and, as you suggest, would avoid the source of confusion. Perhaps the documentation could make this clearer.
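One way an author could record their own version in the extra headers while leaving field 9 untouched is sketched below. This is a minimal sketch under assumptions. Extra headers are treated as a JSON object, and the agency writes under its own top-level key. The names `add_agency_version`, `ProcessingVersion`, `Note` and the agency keys are hypothetical, not defined by the specification. The refusal to write under `FDSN` reflects the assumption that this key is reserved for standard headers.

```python
import json


def add_agency_version(extra_headers: str, agency: str, version: int, note: str) -> str:
    """Return a new extra-headers JSON string carrying an agency-specific version.

    The data publication version (header field 9) is not modified; the agency
    records its own version under its own top-level key (hypothetical names).
    """
    if agency == "FDSN":
        # Assumption: the "FDSN" key is reserved for standard extra headers.
        raise ValueError("the FDSN key is reserved")
    headers = json.loads(extra_headers) if extra_headers else {}
    entry = headers.setdefault(agency, {})
    entry["ProcessingVersion"] = version
    entry["Note"] = note
    return json.dumps(headers, separators=(",", ":"))


a = add_agency_version("", "AgencyA", 1, "gap filling")
b = add_agency_version("", "AgencyB", 1, "time correction")
print(a)  # {"AgencyA":{"ProcessingVersion":1,"Note":"gap filling"}}
```

Both derived datasets can now carry "version 1" without ambiguity, because each version number is scoped to the agency key it sits under.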

The purpose of this field, I believe, is to handle situations where a data center receives an update of the original data, so this is a "producer version", not a "consumer version". For example, a user can see that she received version 5 of the waveform from data center X a month ago, but the current version at data center X is 6. While this does not say what changed, it at least indicates that the original data producer needed to update the data. This is similar to, and replaces, the D, R, M, Q quality indicator system previously used in miniSEED 2.
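The user-side check described above could look roughly like the sketch below. It assumes the fixed-header layout of the published miniSEED3 specification: a 40-byte fixed header starting with `MS`, format version 3 at byte offset 2, and the data publication version (field 9) stored as an unsigned byte at offset 32. The offsets should be checked against the specification before any real use. The helper names and the synthetic `fake_header` are hypothetical.

```python
FIXED_HEADER_LEN = 40  # assumed size of the miniSEED3 fixed header, in bytes


def publication_version(record: bytes) -> int:
    """Return header field 9 (data publication version) of a miniSEED3 record.

    Assumes 'MS' at offset 0, format version 3 at offset 2 and the
    publication version as a UINT8 at offset 32.
    """
    if len(record) < FIXED_HEADER_LEN:
        raise ValueError("record shorter than the fixed header")
    if record[0:2] != b"MS" or record[2] != 3:
        raise ValueError("not a miniSEED3 record")
    return record[32]


def has_newer_version(held: bytes, current: bytes) -> bool:
    """True if the data center now serves a higher publication version.

    Only meaningful when both records come from the same data center.
    """
    return publication_version(current) > publication_version(held)


def fake_header(pub_version: int) -> bytes:
    """Hypothetical synthetic header; only the fields checked above are set."""
    header = bytearray(FIXED_HEADER_LEN)
    header[0:2] = b"MS"
    header[2] = 3
    header[32] = pub_version
    return bytes(header)


# Version 5 was downloaded a month ago; data center X now serves version 6.
print(has_newer_version(fake_header(5), fake_header(6)))  # True
```

The comparison only says that the producer updated the data, not what changed, which matches the limited "producer version" semantics described in the comment.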

djeastonca commented 1 year ago

It appears that, per Roman's update yesterday, ETH's feedback on their remaining open issues has been recorded and they are satisfied with the current draft of the data format, so this issue can be closed.