crotwell opened this issue 7 years ago
Hi Philip,
Regarding unit name consistency, at the IRIS DMC we were recently wrangling with this while specifying test cases for a metadata validator we are developing. In my opinion, StationXML affords us the opportunity to begin addressing this issue, so what we've tried to come up with is a practical transition path to using proper SI units and grammar. The rule set for our validator can be found here: https://github.com/iris-edu/StationXML-Validator/wiki/Unit-name-overview-for-IRIS-StationXML-validator
Keep in mind that these are rules for our StationXML validator, which will be used for metadata submitted to the IRIS DMC, and they are subject to change. They are not (yet) a proposal to the FDSN. I think it would be great for the FDSN to adopt some kind of rule set like this and enforce it in the schema (as far as possible).
Also note that the validator in the project is unfinished and unreleased, i.e. you probably don't want to use it.
regards, Chad
(In reply to Philip Crotwell's original post of Jun 5, 2017, at 8:55 AM.)
+1, adopting this as part of the StationXML spec would be very nice.
Small thing, but "counts" violates "Unit definitions are singular, not plural."
Philip
(In reply to Chad Trabant's comment of Mon, Jun 5, 2017, at 3:55 PM.)
On Jun 5, 2017, at 1:06 PM, Philip Crotwell wrote:
+1, adopting this as part of the stationxml spec would be very nice.
Small thing, but "counts" violates "Unit definitions are singular, not plural."
Indeed, but it is absolutely pervasive in SEED metadata, so as a practical matter we settled on leaving it as an exception.
Chad
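As an illustration of the singular-name rule with "counts" as the one tolerated plural, here is a minimal Python sketch. It is not the IRIS validator's logic: `KNOWN_SINGULAR`, `ALLOWED_PLURALS` and `plural_words` are hypothetical names, and a real check would need a complete table of unit words rather than this short list.

```python
import re

# Hypothetical, deliberately short table of singular unit words; a real
# validator would need a complete list.
KNOWN_SINGULAR = {"second", "volt", "degree", "meter", "count", "sample"}

# Plurals tolerated as a practical exception (pervasive in SEED metadata).
ALLOWED_PLURALS = {"counts"}


def plural_words(unit: str) -> list[str]:
    """Return the words in a unit string that are plurals of known unit words.

    The string is lowercased and split on '*', '/' and spaces, so composite
    units such as "SAMPLES/S" are checked word by word. Symbols like "s" or
    "ms", and singular names that happen to end in "s" (e.g. "siemens"), are
    not flagged, because dropping the final "s" does not give a known word.
    """
    flagged = []
    for word in re.split(r"[*/ ]+", unit.strip().lower()):
        if word in ALLOWED_PLURALS:
            continue
        if word.endswith("s") and word[:-1] in KNOWN_SINGULAR:
            flagged.append(word)
    return flagged


if __name__ == "__main__":
    for name in ["SECONDS", "SAMPLES/S", "counts", "m/s", "siemens"]:
        print(name, plural_words(name))
```

Checking against known singular words, rather than flagging any trailing "s", avoids false positives on symbols such as "ms" and on names such as "siemens" and "gauss".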
In my sta2extStationXML package, I added a script to clean up unit names based on the list from the IRIS validator. It might be useful; see cleanUnitNames.py.
```
$ grep fixed fdsn-station-1.0.xsd
fixed="SECONDS/SAMPLE"/>
<xs:attribute name="unit" type="xs:string" fixed="SECONDS"/>
<xs:attribute name="unit" type="xs:string" fixed="VOLTS"/>
<xs:attribute name="unit" type="xs:string" use="optional" fixed="DEGREES"/>
<xs:attribute name="unit" type="xs:string" use="optional" fixed="DEGREES"/>
<xs:attribute name="unit" type="xs:string" use="optional" fixed="DEGREES"/>
<xs:attribute name="unit" type="xs:string" use="optional" fixed="DEGREES"/>
<xs:attribute name="unit" type="xs:string" use="optional" fixed="DEGREES"/>
<xs:attribute name="unit" type="xs:string" use="optional" fixed="HERTZ"/>
<xs:attribute name="unit" type="xs:string" use="optional" fixed="SAMPLES/S"/>
```
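Because `grep` works line by line, it cuts off attributes that span several lines, as with the first match above. Here is a minimal Python sketch that reads the schema with the standard library's `xml.etree.ElementTree` and lists every `fixed` value on an attribute named `unit`. `SAMPLE_XSD` is a hypothetical, trimmed fragment written for illustration, not an excerpt of the real schema. For the real file you would pass in the text of fdsn-station-1.0.xsd.

```python
import xml.etree.ElementTree as ET

# Namespace URI that the "xs:" prefix is bound to in an XML Schema document.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, trimmed fragment in the style of fdsn-station-1.0.xsd
# (the real types extend FloatType; xs:double is used here for brevity).
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="SecondType">
    <xs:simpleContent>
      <xs:extension base="xs:double">
        <xs:attribute name="unit" type="xs:string" fixed="SECONDS"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="SampleRateType">
    <xs:simpleContent>
      <xs:extension base="xs:double">
        <xs:attribute name="unit" type="xs:string" use="optional"
                      fixed="SAMPLES/S"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>"""


def fixed_unit_values(xsd_text: str) -> list[str]:
    """Return the fixed values of all attributes named "unit", in document order."""
    root = ET.fromstring(xsd_text)
    return [
        attr.get("fixed")
        for attr in root.iter(f"{XS}attribute")
        if attr.get("name") == "unit" and attr.get("fixed") is not None
    ]


print(fixed_unit_values(SAMPLE_XSD))  # ['SECONDS', 'SAMPLES/S']
```

The second attribute in the fragment is split across two lines on purpose. The parser still reports it, but a line-based `grep` would not show its `name`.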
These should all be updated to `second`, `degree`, `volt` and `hertz`. `SECONDS/SAMPLE` should be `second`, and `SAMPLES/S` should be `hertz`. Samples should not be part of the unit; move that to the documentation.
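Here is a minimal Python sketch that applies the replacements proposed above to the schema text. The mapping only restates this comment's suggestions. It is not an adopted FDSN list and not the contents of cleanUnitNames.py. `PROPOSED_UNIT_NAMES` and `update_fixed_units` are hypothetical names.

```python
import re

# Replacements proposed in the comment above; not an adopted FDSN list.
PROPOSED_UNIT_NAMES = {
    "SECONDS": "second",
    "VOLTS": "volt",
    "DEGREES": "degree",
    "HERTZ": "hertz",
    "SECONDS/SAMPLE": "second",
    "SAMPLES/S": "hertz",
}

FIXED_ATTR = re.compile(r'fixed="([^"]*)"')


def update_fixed_units(xsd_text: str) -> str:
    """Rewrite fixed="..." values that have a proposed replacement.

    Values missing from the mapping are left untouched, so fixed values on
    unrelated attributes survive unless they happen to match a key.
    """
    def replace(match: re.Match) -> str:
        old = match.group(1)
        return f'fixed="{PROPOSED_UNIT_NAMES.get(old, old)}"'

    return FIXED_ATTR.sub(replace, xsd_text)


line = '<xs:attribute name="unit" type="xs:string" use="optional" fixed="SAMPLES/S"/>'
print(update_fixed_units(line))
# <xs:attribute name="unit" type="xs:string" use="optional" fixed="hertz"/>
```

Substituting in the raw text keeps the schema's comments and formatting intact. Parsing the schema with ElementTree and writing it back out would drop comments and could rename namespace prefixes.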
SecondType and VoltageType appear not to be used, so maybe just remove them; if they are kept, they should probably have `use="optional"` as well. Alternatively, ClockDrift could use SecondType instead of an internal restriction on FloatType.
SampleRateType might benefit from a restriction of `<xs:minInclusive value="0"/>`.
This is from an old post to the IRIS web services list, but I just stumbled into this issue again, so I'm putting it here so it isn't forgotten if/when there is a StationXML revision: http://ds.iris.edu/message-center/thread/1450/
Are there any guidelines for how the name of a unit in FDSNStationXML should be formed, other than "do it like SEED"?
I know the documentation for the unit says it is the same as SEED blockette 34, but the SEED spec says to use SI units written in all uppercase, which contradicts the SI convention that case matters. For example, among prefixes m is milli and M is mega, and among units g is gram while G is gauss, and s is second while S is siemens. I suppose most of seismology is covered by volt, meter, second and count, but more and more types of data are being recorded at seismic stations, so there are more varieties of units we need to support. At some point it would be nice to get away from writing units "formatted as FORTRAN-like equations with all alphabetic characters in upper case" and make use of existing standards for better portability and exchange within and outside of seismology. It is unfortunate that units are still just unstructured strings, which makes it challenging for code to parse and correctly interpret them. Following something like this might be useful: http://physics.nist.gov/cuu/Units/
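To make the case-sensitivity problem concrete, here is a small Python sketch. It combines the two prefixes and four unit symbols mentioned above and groups the results by their all-uppercase spelling. Each group with more than one member is a set of distinct units that become indistinguishable once uppercased. The names `PREFIXES`, `UNITS` and `uppercase_collisions` are illustrative only.

```python
from itertools import product

PREFIXES = ["m", "M"]         # milli, mega
UNITS = ["s", "S", "g", "G"]  # second, siemens, gram, gauss


def uppercase_collisions(prefixes: list[str], units: list[str]) -> dict[str, list[str]]:
    """Map each uppercased prefix+unit symbol to the distinct symbols that produce it."""
    groups: dict[str, list[str]] = {}
    for prefix, unit in product(prefixes, units):
        symbol = prefix + unit
        groups.setdefault(symbol.upper(), []).append(symbol)
    # Keep only the uppercase spellings that are ambiguous.
    return {upper: symbols for upper, symbols in groups.items() if len(symbols) > 1}


print(uppercase_collisions(PREFIXES, UNITS))
# {'MS': ['ms', 'mS', 'Ms', 'MS'], 'MG': ['mg', 'mG', 'Mg', 'MG']}
```

So an all-uppercase "MS" could mean millisecond, millisiemens, megasecond or megasiemens.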
Follow-up question: there seem to be two types of "units" in FDSNStationXML, one used as an element and one as an attribute. The element one, UnitType, appears to say to follow the SEED convention, so you have units with names like M/S, M and V. The unit attribute on FloatType is just a string with no documentation on how it is to be used, but judging from concrete uses of it in, for example, SecondType, VoltageType and DistanceType, the string should be things like SECONDS, VOLTS and METERS. So we have two different ways of specifying units in FDSNStationXML, with different naming conventions.
Perhaps even more confusing, SampleRateType specifies the fixed unit string as SAMPLES/S, which combines both unit naming conventions.
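Here is a minimal Python sketch of what this means for code that reads the units. Unit names live in two places with two conventions, so a reader needs two separate lookups. The fragment is hypothetical and simplified: it has no FDSNStationXML namespace and omits most required elements. It only mirrors the two shapes described above: a `unit` attribute on a value element, and a `Name` child inside `InputUnits`/`OutputUnits`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: no namespace, most required elements omitted.
SNIPPET = """<Channel>
  <Latitude unit="DEGREES">34.0</Latitude>
  <Response>
    <InstrumentSensitivity>
      <InputUnits><Name>M/S</Name></InputUnits>
      <OutputUnits><Name>COUNTS</Name></OutputUnits>
    </InstrumentSensitivity>
  </Response>
</Channel>"""


def collect_units(xml_text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return (attribute-style units, element-style units), keyed by tag name.

    Repeated tags overwrite earlier entries; a real reader would keep them all.
    """
    root = ET.fromstring(xml_text)
    # Convention 1: a "unit" attribute on a value element (e.g. DEGREES).
    attribute_units = {
        el.tag: el.get("unit") for el in root.iter() if el.get("unit") is not None
    }
    # Convention 2: a UnitType element whose Name child holds a SEED-style string (e.g. M/S).
    element_units = {
        el.tag: el.findtext("Name")
        for el in root.iter()
        if el.tag in ("InputUnits", "OutputUnits")
    }
    return attribute_units, element_units


print(collect_units(SNIPPET))
# ({'Latitude': 'DEGREES'}, {'InputUnits': 'M/S', 'OutputUnits': 'COUNTS'})
```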
Can you clarify the unit naming scheme? I would like to be able to parse the units, but it is much harder if there is not a clear mapping from unit to/from strings.
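One way to get a clear mapping to and from strings is to parse a unit into a map from symbol to exponent, and to format that map back into a string. The Python sketch below assumes a hypothetical, simplified FORTRAN-like grammar. Factors are joined by `*` and `/`, each `/` divides by the single factor after it, powers are written `**n`, and an optional leading `1` allows pure denominators. This is not the grammar of the SEED spec or of any adopted standard, and it does not try to separate prefixes from units.

```python
import re

# Hypothetical grammar: SYMBOL[**n] joined by '*' or '/', optional leading "1".
FACTOR = r"[A-Za-z]+(?:\*\*-?\d+)?"
UNIT_RE = re.compile(rf"(?:1|{FACTOR})(?:[*/]{FACTOR})*")
TOKEN_RE = re.compile(r"([*/]?)([A-Za-z]+)(?:\*\*(-?\d+))?")


def parse_unit(text: str) -> dict[str, int]:
    """Parse e.g. "M/S**2" into {"M": 1, "S": -2}; raise ValueError otherwise."""
    if not UNIT_RE.fullmatch(text):
        raise ValueError(f"unsupported unit string: {text!r}")
    exponents: dict[str, int] = {}
    # findall skips the optional leading "1", which carries no symbol.
    for op, symbol, power in TOKEN_RE.findall(text):
        exp = int(power) if power else 1
        if op == "/":
            exp = -exp  # '/' divides by the factor right after it
        exponents[symbol] = exponents.get(symbol, 0) + exp
    return exponents


def format_unit(exponents: dict[str, int]) -> str:
    """Format a symbol->exponent map back into the same simplified grammar."""
    def factor(symbol: str, power: int) -> str:
        return symbol if power == 1 else f"{symbol}**{power}"

    numerator = [factor(s, e) for s, e in exponents.items() if e > 0]
    denominator = [factor(s, -e) for s, e in exponents.items() if e < 0]
    text = "*".join(numerator) if numerator else "1"
    return text + "".join("/" + d for d in denominator)


print(parse_unit("M/S**2"))               # {'M': 1, 'S': -2}
print(format_unit(parse_unit("M/S/S")))   # M/S**2
```

Going through a structured form means equivalent spellings such as "M/S/S" and "M/S**2" compare equal after parsing. With bare strings they do not.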
Is this something that might be unified/simplified in a future version?
Also, should Delay and Correction in DecimationType be SecondType? Should Amplitude in ResponseListElement, and Numerator and Denominator in CoefficientsType, be FloatNoUnitType, since the units are given in the enclosing element? Should NumeratorCoefficient be FloatNoUnitType (or FloatType) to match CoefficientsType?