SeisComP3 / seiscomp3

SeisComP is a seismological software for data acquisition, processing, distribution and interactive analysis.

fdsnxml2inv: Fails to generate correct datalogger response #91

Closed vlauciani closed 6 years ago

vlauciani commented 7 years ago

Hi all

Running the commands below, we receive an error:

sysop@eida:~/tmp$
sysop@eida:~/tmp$
sysop@eida:~/tmp$ wget -O ~/tmp/IV_AGST.stationxml "http://webservices.ingv.it/fdsnws/station/1/query?network=IV&station=AGST&cha=SHZ&level=response&format=xml&nodata=404&formatted=true"
--2017-01-16 14:53:37--  http://webservices.ingv.it/fdsnws/station/1/query?network=IV&station=AGST&cha=SHZ&level=response&format=xml&nodata=404&formatted=true
Resolving webservices.ingv.it (webservices.ingv.it)... 93.63.207.206
Connecting to webservices.ingv.it (webservices.ingv.it)|93.63.207.206|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/xml]
Saving to: `/home/sysop/tmp/IV_AGST.stationxml'

   [ <=>                                                                                                               ] 6,393       --.-K/s   in 0s

2017-01-16 14:53:37 (327 MB/s) - `/home/sysop/tmp/IV_AGST.stationxml' saved [6393]

sysop@eida:~/tmp$
sysop@eida:~/tmp$
sysop@eida:~/tmp$
sysop@eida:~/tmp$ seiscomp exec fdsnxml2inv --formatted IV_AGST.stationxml > IV_AGST.inventoryxml
No inventory read from inventory db
Create empty one
Processing IV_AGST.stationxml
- parsing StationXML
- converting into SeisComP-XML
W  Datalogger response not found
Finished processing
Writing inventory to -
sysop@eida:~/tmp$
sysop@eida:~/tmp$
sysop@eida:~/tmp$
sysop@eida:~/tmp$ inv2dlsv IV_AGST.inventoryxml IV_AGST.dless
ERROR: unsupported operand type(s) for *=: 'float' and 'NoneType'
sysop@eida:~/tmp$
sysop@eida:~/tmp$
sysop@eida:~/tmp$

We think that the "fdsnxml2inv" command returns a warning because <Stage number="4"> is a "dummy" digitiser defined as <PzTransferFunctionType> in our internal database and in the corresponding dataless (AGST.dless.zip), but it should be legitimate. What do you think?

Thank you, Valentino

vlauciani commented 7 years ago

no one can help me?

gempa-jabe commented 7 years ago

I don't know exactly where this error is coming from. It is caused by fseed.py when creating the dataless. The easiest approach is to disable the try-except block in inv2dlsv and inspect the resulting backtrace. There is probably an attribute unset in your inventory XML. More verbose error messages from fseed.py would be helpful. Maybe you can help in that regard. fseed.py was written by @andres-h, in case more detailed questions arise.
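As a rough illustration of this suggestion, the following Python sketch reproduces the same error message and shows how keeping the backtrace pinpoints the failing line. The helpers `total_gain` and `run` and the sample gains are hypothetical and are not taken from fseed.py; the only assumption is that an unset gain reaches a multiplication as `None`.

```python
import traceback


def total_gain(stage_gains):
    """Multiply all stage gains (hypothetical helper, not fseed.py code)."""
    gain = 1.0
    for stage_gain in stage_gains:
        # Raises TypeError if a gain is unset (None).
        gain *= stage_gain
    return gain


def run(stage_gains):
    """Return (result, short error message, full backtrace)."""
    try:
        return total_gain(stage_gains), None, None
    except TypeError as exc:
        # The short message hides the location; the backtrace shows it.
        return None, str(exc), traceback.format_exc()


# Illustrative values only: the second gain is deliberately unset.
result, message, backtrace = run([1500.0, None, 409.6])
print("ERROR:", message)
print(backtrace)
```

The short message alone matches what inv2dlsv printed, while the backtrace names the function and line where the unset value was used.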

andres-h commented 7 years ago

Your StationXML is missing a digital (decimation) stage.

Instead of

       <Stage number="4">
         <PolesZeros>
           <InputUnits>
             <Name>V</Name>
           </InputUnits>
           <OutputUnits>
             <Name>COUNTS</Name>
           </OutputUnits>
           <PzTransferFunctionType>LAPLACE (RADIANS/SECOND)</PzTransferFunctionType>
           <NormalizationFactor>1</NormalizationFactor>
           <NormalizationFrequency>0.2</NormalizationFrequency>
         </PolesZeros>
         <StageGain>
           <Value>409.6</Value>
           <Frequency>0.2</Frequency>
         </StageGain>
       </Stage>

you need something like:

       <Stage number="4">
         <Coefficients>
           <InputUnits>
             <Name>V</Name>
           </InputUnits>
           <OutputUnits>
             <Name>COUNTS</Name>
           </OutputUnits>
           <CfTransferFunctionType>DIGITAL</CfTransferFunctionType>
         </Coefficients>
         <Decimation>
           <InputSampleRate>512000</InputSampleRate>
           <Factor>640</Factor>
           <Offset>0</Offset>
           <Delay>0</Delay>
           <Correction>0</Correction>
         </Decimation>
         <StageGain>
           <Value>409.6</Value>
           <Frequency>0.2</Frequency>
         </StageGain>
       </Stage>

Aren't evalresp or any of the other tools complaining about your StationXML?

petrrr commented 7 years ago

@andres-h: We have been looking into the details of this issue and are convinced the situation is not as simple as described above. In particular, we think that fdsnxml2inv makes assumptions which are not fully appropriate to cover the full spectrum of response descriptions that are valid in the SEED world.

Of course, it is difficult to argue about the semantic validity of a StationXML document as long as there is no self-contained specification of the format. (Note that there is no issue with formal validity, as all these XML documents validate against the schema.) Implicitly, however, StationXML is defined by analogy with the SEED Reference Manual [1]. To our understanding, the correspondence is the following:

| SEED Blockette | XML element in <Stage> | XML element elsewhere | Note |
|---|---|---|---|
| [53] Response Poles & Zeros Blockette | <PolesZeros> | | type: ResponseStageType |
| [54] Response Coefficients Blockette | <Coefficients> | | type: ResponseStageType |
| [55] Response List Blockette | <ResponseList> | | type: ResponseStageType |
| [56] Generic Response Blockette | | | no equivalent |
| [61] FIR Response Blockette | <FIR> | | type: ResponseStageType |
| [62] Response Polynomial Blockette | <Polynomial> | <InstrumentPolynomial> in <Response> | stage 0; type: ResponseStageType |
| [57] Decimation Blockette | <Decimation> | | minOccurs="0" |
| [58] Channel Sensitivity/Gain Blockette | <StageGain> | <InstrumentSensitivity> in <Response> | stage 0 |
| [59] Channel Comment Blockette | | <Comment> in <Channel> | |
| [60] Response Reference Blockette | | | dictionary references are resolved |

This is also how we translate our SEED modeled metadata (dataless) into StationXML, which seems largely consistent with how the FDSN StationXML-SEED Converter [2] operates.

[1] SEED Reference Manual, ver. 2.4, https://www.fdsn.org/seed_manual/SEEDManual_V2.4.pdf
[2] FDSN StationXML-SEED Converter, https://seiscode.iris.washington.edu/projects/stationxml-converter

Unlike the SC3 Inventory XML, StationXML has no explicit concept of sensor vs. datalogger. There is no real difference whether some Response Blockette ([53], [54], [55], [56], [62]) describes an analog or a digital stage, except maybe for the input/output units ("Analog filters usually emit volts, and digital filters usually emit counts.", [1] pp. 72, 74, 75, 76, 83). There are recommendations on which representation should be used for which kind of filter. But note, for example, that for a digital IIR filter a blockette 53 representation is recommended.

I agree that in modern acquisition chains the digitizer would usually apply oversampling, a digital FIR filter and consequently a decimation. The SEED manual [1], p. 71 states: "Digital filters usually have a Decimation Blockette [57] ...". To me, though, this does not imply that the Decimation Blockette [57] is mandatory; it can therefore be omitted if no decimation occurs.

In our case the digital stage (Stage number="4") is represented by a blockette [53] with basically no further filtering specified, i.e. an idealized ("perfect") pass-through device. This is intended, because it represents analog legacy equipment where all filtering already occurs in the analog stages before digitization: no oversampling, no digital filters, no decimation. Therefore the choice of the encoding blockette ([53] vs. [54] vs. [61] vs. [62]) is irrelevant. This blockette 53 translates correctly and consistently to a <PolesZeros> XML element representation, as explained above.

We believe this representation is perfectly valid. The corresponding SEED dataless, from which we actually derived the StationXML under discussion, worked correctly in the past. This includes SC3, evalresp, and other tools. The StationXML can also be converted correctly back into a SEED dataless.

That is why we believe that the problem is actually related to how fdsnxml2inv interprets stationXML to detect a digitizer (a concept which does not exist in the SEED/StationXML world) making assumptions which are not documented elsewhere.

Rather than requiring the presence of a <Coefficients> element (i.e. blockette 54) and maybe an optional <Decimation> element (i.e. blockette 57), the "digitizer stages" could be detected on the basis of the input/output units. In fact, that is what is explicitly mentioned in the SEED manual (see above).
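A minimal sketch of such a unit-based detection, assuming a stripped-down response document, could look as follows. The XML snippet omits the StationXML namespace and everything except the units for brevity; stage 3 and the function names `classify` and `classify_stages` are hypothetical, and this is not how fdsnxml2inv is implemented.

```python
import xml.etree.ElementTree as ET

# Reduced example: only the units of each stage are kept (no namespace).
RESPONSE_XML = """
<Response>
  <Stage number="3">
    <PolesZeros>
      <InputUnits><Name>V</Name></InputUnits>
      <OutputUnits><Name>V</Name></OutputUnits>
    </PolesZeros>
  </Stage>
  <Stage number="4">
    <PolesZeros>
      <InputUnits><Name>V</Name></InputUnits>
      <OutputUnits><Name>COUNTS</Name></OutputUnits>
    </PolesZeros>
  </Stage>
</Response>
"""


def classify(input_unit, output_unit):
    """Classify a stage from its units alone, ignoring the filter type."""
    counts_in = (input_unit or "").strip().upper() == "COUNTS"
    counts_out = (output_unit or "").strip().upper() == "COUNTS"
    if counts_in and counts_out:
        return "digital"
    if counts_out:
        return "digitization"
    if counts_in:
        return "other"  # COUNTS back to a physical unit; not handled here
    return "analog"  # neither side is COUNTS, e.g. V -> V or m/s -> V


def classify_stages(xml_text):
    """Map stage number -> classification for every <Stage> element."""
    root = ET.fromstring(xml_text)
    stages = {}
    for stage in root.iter("Stage"):
        in_unit = stage.findtext(".//InputUnits/Name")
        out_unit = stage.findtext(".//OutputUnits/Name")
        stages[int(stage.get("number"))] = classify(in_unit, out_unit)
    return stages


print(classify_stages(RESPONSE_XML))
```

With this rule, stage 4 is recognized as the digitization step regardless of whether it is encoded as <PolesZeros> or <Coefficients>.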

But maybe even the strict requirement of a digitizer stage might turn out to be problematic in some very special cases where no standard equipment is used (just think of recovered paper recordings, GPS, etc.). In that case, the schema would probably need to be relaxed and inv2dlsv changed accordingly.

gempa-jabe commented 7 years ago

@petrrr I agree with your observations and conclusions. Currently the dataless conversion is under major revision. It is true that both tools (dlsv2inv, fdsnxml2inv) make some assumptions about units to sort channels into either Stream or AuxStream objects. Part of that work is also the conversion back to dataless and fdsnxml, which has to be changed. inv2dlsv is currently very picky about the information in the input, and furthermore the error messages are not helpful at all. We have to change that. The rules for how things are converted will be part of the documentation. This is indeed necessary to be able to start discussions, since not everyone is able to read source code.

FYI: I pushed a branch that tries to improve dlsv2inv. You are welcome to jump into the discussion and to help with testing.

petrrr commented 7 years ago

@gempa-jabe I think in our concrete case the problem is actually fdsnxml2inv. It does not create a valid SC3 Inventory because it cannot detect the digitizer stage, which is mandatory in the current specification of the Inventory XML. A consequence of this is that inv2dlsv errors out because it cannot find the digitizer element. Of course, the error message of inv2dlsv is not helpful either.

Apart from fixing the generation of the inventory, I'd strongly encourage that the result is validated and, in case it fails, that this is made explicit. Maybe no output should be generated at all rather than generating an invalid file (at least if it is not "forced" by the user).

Moreover, inv2dlsv should validate the inventory XML before processing it. This should avoid incomprehensible messages like inv2dlsv - unsupported operand type(s) for *=: 'float' and 'NoneType'
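A pre-flight check of this kind could be sketched as below. The function `check_stream` and the plain dicts standing in for sensor and datalogger objects are hypothetical; this is not the SeisComP3 API, only an illustration of reporting a readable problem before any gain is multiplied.

```python
def check_stream(stream_id, sensor, datalogger):
    """Return a list of problems that would break a dataless export.

    `sensor` and `datalogger` are plain dicts (or None) standing in for
    the real inventory objects.
    """
    problems = []
    for label, device in (("sensor", sensor), ("datalogger", datalogger)):
        if device is None:
            problems.append(f"{stream_id}: no {label} found")
        elif device.get("gain") is None:
            problems.append(f"{stream_id}: {label} gain is not set")
    return problems


# Hypothetical stream whose datalogger could not be resolved.
for problem in check_stream("IV.AGST..SHZ", {"gain": 1.0}, None):
    print("Error:", problem)
```

A message that names the stream and the missing element is easier to act on than the raw TypeError.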

petrrr commented 7 years ago

I think I cannot edit the title of this ticket but I would propose to change it to something like:

fdsnxml2inv: Fails to generate a digitizer element for valid acquisition chains

gempa-jabe commented 7 years ago

I think in our concrete case the problem is actually fdsnxml2inv. It does not create a valid SC3 Inventory because it cannot detect the digitizer stage, which is mandatory in the current specification of the Inventory XML.

That is not true. The digitizer is optional. Having a stream in inventory XML without a datalogger is perfectly valid. We use that often when we don't have responses. A schema validation is already done at the parser stage. You use the word valid with respect to the usage in other modules like inv2dlsv, but those are two different issues. And how would you validate what you have created? If the output were invalid, you wouldn't have created it. Again, not having a datalogger and/or a sensor is valid.

As I told you, the branch fixes issues like that for dlsv2inv but the same approach is used in fdsnxml2inv. I understood that your issue is with fdsnxml2inv. Once it is fixed and tested in dlsv2inv it will be ported to fdsnxml2inv.

You could, e.g., try the branch with your posted dataless volume and check if that improves the situation for you.

petrrr commented 7 years ago

@gempa-jabe: Thanks for your comment. I may have been misled by the idea that a mandatory digitizer was causing the issue. However, I am pretty sure that I tried to validate the output of fdsnxml2inv. I will repeat this experiment and report back.

gempa-jabe commented 7 years ago

Thank you. In the meantime I have tested your dataless SEED volume that you have posted above with the new branch. It converted to SC3 and back to dataless with inv2dlsv without errors. If that is resolved then I will apply the same approach to fdsnxml2inv. I have checked the code and it is flawed in that regard. It needs some rework.

petrrr commented 7 years ago

@gempa-jabe: Note that converting dataless SEED (there and back) did not cause any issues in the past, but I think we never checked if they were "epoch-complete". In contrast, for stationxml -> inv -> dataless the process fails completely. Moreover, this causes Arclink to fail if dataless SEED or full SEED is requested.

gempa-jabe commented 7 years ago

@petrrr, I pushed a change (b76bcc646a227f4c0bbb2849156ef394fa8c07da) for fdsnxml2inv into the mentioned branch. I tested with your FDSN XML and it worked with the SEED conversion. If possible, give it a try and report back. You can also just apply the referenced patch to your source tree. Thank you.

petrrr commented 7 years ago

@gempa-jabe, I just talked to @vlauciani and unfortunately we have some difficulty testing this immediately. We usually rely on the binary releases and have no infrastructure ready for source builds. Would it be possible (and relatively easy) for you to provide us with a binary build? From what I understand, we would just need to replace the binaries for fdsnxml2inv and, optionally, dlsv2inv.

We are running on Debian 7. Currently, we are still on 2016.161, but we plan to update to 2016.333 in the next days.

gempa-jabe commented 7 years ago

Generally we don't provide binary builds for development versions. Compiling is easy and just a matter of installing a few additional packages and running cmake. You could also use a VM for that. In this particular case I will provide a binary for fdsnxml2inv, but I won't do that for all follow-up test versions. Find your binary at https://data.gempa.de/temp/petrrr/fdsnxml2inv. It is compiled for Debian 7 / 64.

vlauciani commented 7 years ago

Hi @gempa-jabe Thank you for the binary, but running the new one we receive this error:

sysop@eida:~/seiscomp3$ seiscomp exec fdsnxml2inv --formatted input.stationxml > output.xml
fdsnxml2inv: error while loading shared libraries: libmseed.so.2.6.2: cannot open shared object file: No such file or directory
sysop@eida:~/seiscomp3$

The libraries available in the seiscomp3 directories are:

sysop@eida:~/seiscomp3$ find . -iname libmseed*
./lib/libmseed.so.2.17
./lib/libmseed.so
./include/libmseed.h
sysop@eida:~/seiscomp3$

Thank you for your support, Valentino

gempa-jabe commented 7 years ago

Where did you get this file from? The 2016.161 build comes with file libmseed.so.2.6.2.

vlauciani commented 7 years ago

Hi @gempa-jabe Sorry, but I thought that this patch should be installed on the latest SC3 version, so I updated SC3 to Jakarta 2016.333 before applying it.

gempa-jabe commented 7 years ago

Because you said that you were still running 2016.161 and will upgrade in the next days I created a binary for the release you were running at that time. I will build again for 2016.333 tomorrow. You can also extract 2016.161 into a temp folder and run fdsnxml2inv from there.

gempa-jabe commented 7 years ago

An updated version has been uploaded.

petrrr commented 7 years ago

@gempa-jabe: I am just trying to assess whether the fix works for us, and I have two observations to report back.

1. AGST

I looked through the SC3 inventory for the channel-epoch which triggered this issue (https://gist.github.com/petrrr/813b1b4cbf8d42bc9539d326df79cfb4), and the result looks reasonable. All 4 stages are present and encoded as <responsePAZ> elements (which probably correspond to blockette 53/<PolesZeros>).

The only doubt is whether the last stage, Stage 4, the one which converts from V to COUNTS, should be encoded in a <digitalFilterChain> element instead. Currently, when I convert the SC3 inventory to dataless, I obtain an additional 5th stage (encoded with a blockette 54), which basically does nothing apart from changing the units. I'd assume it does no harm to the response, but it makes the dataless needlessly verbose. I guess moving it from <analogueFilterChain> to <digitalFilterChain> should do the trick.

#               
#               +               +-------------------------------------------+                 +
#               +               |   Response (Coefficients),  AGST ch SHZ   |                 +
#               +               +-------------------------------------------+                 +
#               
B054F03     Transfer function type:                D
B054F04     Stage sequence number:                 5
B054F05     Response in units lookup:              V - Volts
B054F06     Response out units lookup:             COUNTS - Digital Counts
B054F07     Number of numerators:                  0
B054F10     Number of denominators:                0
#               
#               +                      +------------------------------+                       +
#               +                      |   Decimation,  AGST ch SHZ   |                       +
#               +                      +------------------------------+                       +
#               
B057F03     Stage sequence number:                 5
B057F04     Input sample rate:                     5.000000E+01
B057F05     Decimation factor:                     1
B057F06     Decimation offset:                     0
B057F07     Estimated delay (seconds):             0.000000E+00
B057F08     Correction applied (seconds):          0.000000E+00
#               
#               +                  +---------------------------------------+                  +
#               +                  |       Channel Gain,  AGST ch SHZ      |                  +
#               +                  +---------------------------------------+                  +
#               
B058F03     Stage sequence number:                 5
B058F04     Gain:                                  1.000000E+00
B058F05     Frequency of gain:                     0.000000E+00 HZ
B058F06     Number of calibrations:                0
#               

2. Other stations now cause different errors:

sysop@eida:~$ inv2dlsv fdsnxml2inv_new.xml > /dev/null
Error (IV,BOTM,,HNZ): invalid filter type: ResponseFIR#20170307095922.979917.105 (ResponseFIR#20170307095922.979917.105)
Error (IV,BOTM,,HNE): invalid filter type: ResponseFIR#20170307095922.979917.105 (ResponseFIR#20170307095922.979917.105)
Error (IV,BOTM,,HNN): invalid filter type: ResponseFIR#20170307095922.979917.105 (ResponseFIR#20170307095922.979917.105)
[...]
Error (IV,BADI,,EHN): invalid filter type: ResponseFIR#20170307095922.975072.8 (ResponseFIR#20170307095922.975072.8)
Error (IV,BADI,,EHE): invalid filter type: ResponseFIR#20170307095922.975072.8 (ResponseFIR#20170307095922.975072.8)
Error (IV,BADI,,EHZ): invalid filter type: ResponseFIR#20170307095922.975072.8 (ResponseFIR#20170307095922.975072.8)

Looking through the files, the result looks reasonable, but apparently this might be a regression caused by the addition of an <analogueFilterChain> element to <datalogger>:

    <datalogger publicID="Datalogger#20170307095922.974936.7" name="IV.BADI.EHE.2004.212.10">
      <gain>1</gain>
      <maxClockDrift>0</maxClockDrift>
      <calibration serialNumber="xxxx" channel="2">
        <start>2004-07-30T10:00:00.0000Z</start>
        <end>2009-09-24T12:40:00.0000Z</end>
        <gain>1</gain>
      </calibration>
      <decimation sampleRateNumerator="50" sampleRateDenominator="1">
        <analogueFilterChain>ResponseFIR#20170307095922.975072.8</analogueFilterChain>
        <digitalFilterChain>ResponseFIR#20170307095922.975228.9 ResponseFIR#20170307095922.975296.10 ResponseFIR#20170307095922.975296.10 ResponseFIR#20170307095922.975296.10 ResponseFIR#20170307095922.975296.10 ResponseFIR#20170307095922.975296.10 ResponseFIR#20170307095922.975296.10 ResponseFIR#20170307095922.975573.16 ResponseFIR#20170307095922.975612.17</digitalFilterChain>
      </decimation>
    </datalogger>

I just realized that we actually updated only fdsnxml2inv. Might the second observation be caused by an inconsistency between fdsnxml2inv and inv2dlsv?

andres-h commented 7 years ago

IIRC, "invalid filter type" means analogue filter is used in digital filter chain or vice versa.

You cannot just move things from analogueFilterChain to digitalFilterChain.

A datalogger looks like this:

analogueFilterChain -> digitalization -> digitalFilterChain
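To make that constraint concrete, here is a small sketch that scans a datalogger element for FIR responses referenced from the analogue chain. It assumes that the response type can be read off the publicID prefix (which only holds for auto-generated IDs like the ones in this thread) and that FIR filters, being discrete, belong in the digital chain only. The function name `misplaced_filters` is hypothetical, and the rule is an illustration, not a copy of the inv2dlsv check.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the datalogger element quoted earlier in this thread.
DATALOGGER_XML = """
<datalogger publicID="Datalogger#20170307095922.974936.7" name="IV.BADI.EHE.2004.212.10">
  <gain>1</gain>
  <decimation sampleRateNumerator="50" sampleRateDenominator="1">
    <analogueFilterChain>ResponseFIR#20170307095922.975072.8</analogueFilterChain>
    <digitalFilterChain>ResponseFIR#20170307095922.975228.9 ResponseFIR#20170307095922.975296.10</digitalFilterChain>
  </decimation>
</datalogger>
"""


def misplaced_filters(xml_text):
    """Return publicIDs of FIR responses found in an analogue filter chain."""
    root = ET.fromstring(xml_text)
    found = []
    for chain in root.iter("analogueFilterChain"):
        # A chain is a whitespace-separated list of response publicIDs.
        for public_id in (chain.text or "").split():
            # Assumption: the part before '#' names the response type.
            if public_id.split("#", 1)[0] == "ResponseFIR":
                found.append(public_id)
    return found


for public_id in misplaced_filters(DATALOGGER_XML):
    print("FIR response in analogueFilterChain:", public_id)
```

The ID it reports is the same one named in the "invalid filter type" errors for IV.BADI.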

gempa-jabe commented 7 years ago

I would have to check why the FIR filter has been added to the analogue filter chain but I suspect it is due to the units: V -> V. Is that analogue or digital? I would argue for analogue. Maybe that is a dummy filter defining the pre-amplifier gain. @petrrr, can you point me to the dataless of the station causing the conversion issue?

You cannot just move things from analogueFilterChain to digitalFilterChain.

If you take the current state of inv2dlsv into account that is true, but that can be changed. First I would like to create proper SC3 from dataless/FDSNXML. The next step is to spot issues with inv2dlsv. Can we define concrete rules for when to add a stage to the digital chain and when to the analog one? I have even seen a dataless with analog, digital and analog stages (in that order). Maybe the dataless is wrong, but the conversion must be able to handle that case or throw an error.

gempa-jabe commented 7 years ago

Here are the links to the code that does the conversion:

andres-h commented 7 years ago

On 03/08/2017 09:34 PM, Jan Becker wrote:

Can we define concrete rules for when to add a stage to the digital chain and when to the analog one?

Analog is V->V (continuous filter). Digital is COUNTS->COUNTS (discrete filter).

I have even seen a dataless with analog, digital and analog stages (in that order). Maybe the dataless is wrong, but the conversion must be able to handle that case or throw an error.

Theoretically one can add a DAC to convert counts back to volts, but that does not sound like a realistic use case. The last stage must be digital anyway, if the data is to be recorded digitally.

I've seen dataless that converts from m/s directly to counts, though. That is not supported by the current inventory, except when adding a dummy volts stage.

gempa-jabe commented 7 years ago

Where to put V->COUNTS? Isn't that an analogue stage, too?

gempa-jabe commented 7 years ago

Example

andres-h commented 7 years ago

On 03/08/2017 10:37 PM, Jan Becker wrote:

Where to put V->COUNTS? Isn't that an analogue stage, too?

V->COUNTS is just digitalization: COUNTS(t) = V(t) * datalogger.gain

analogueFilterChain -> digitalization -> digitalFilterChain

andres-h commented 7 years ago

On 03/08/2017 10:53 PM, Jan Becker wrote:

Example https://smp.gempa.de/data/NRL/dataloggers/reftek/RESP.XX.NR129..HHZ.72A06.32.16.100

That example (stage 3) includes an analogue filter V->V. Usually this would be represented by 2 separate stages.

Moreover, stage 1 is nonsense from physics point of view, because you cannot convert m/s to V with flat response. It seems that the actual poles and zeros of the sensor are added to stage 3 to compensate, so the combination of stages 1..3 gives a valid seismometer response.

gempa-jabe commented 7 years ago

Since that is a file from the NRL, I think we have to deal with it as it is. So how should we do the conversion and does that fit into a general rule? The conversion back to SEED would result in a different representation, a "repaired" representation? Do you think such an approach is feasible? I just remember all the issues of the past years with converting between the formats. There seem to be too many opinions on how response stages should look like. I don't want to add another one ;)

andres-h commented 7 years ago

I thought NRL has separate responses for seismometers and dataloggers. This file seems to have seismometer and datalogger combined in a weird way. It can be converted to inventory, though:

As is:

stage 1 -> dummy sensor (V = m/s)
stage 2 -> dummy analogue filter (gain=32)
stage 3 -> datalogger (incl. analogue filter)
stage 4..6 -> digital filters

Repaired:

stage 1..3 -> full sensor (incl. gain, poles&zeros)
stage 3 -> datalogger
stage 4..6 -> digital filters

gempa-jabe commented 7 years ago

Exactly, that is a datalogger response. Stage 1 is a dummy and stage 2 is the pre-amplifier gain (32).

andres-h commented 7 years ago

OK, I'm not familiar with NRL files. Maybe stage 1 is a placeholder that is to be replaced by a real sensor.

andres-h commented 7 years ago

In that case the datalogger does include an actual analogue filter (in addition to preamplifier gain). The preamplifier gain might be combined with datalogger gain instead of adding a dummy filter.

gempa-jabe commented 7 years ago

OK, so you would agree to put stage 3 (V->COUNTS) into the analogue filter chain? That is how it is done currently.

andres-h commented 7 years ago

Yes.

Technically it is analogue filter (V->V) followed by digitalization (V->COUNTS).

Consequently the gain of stage 3 is the product of the gains of the analogue filter and the digitizer. You can, e.g., use it as the gain of the digitizer and set the gain of the filter to 1.
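The arithmetic behind this can be sketched in a few lines of Python. All gain values below are hypothetical (only the preamplifier gain of 32 is mentioned in this thread), and the helper names are made up for illustration.

```python
def overall_gain(stage_gains):
    """Multiply the gains of all stages in a response chain."""
    gain = 1.0
    for stage_gain in stage_gains:
        gain *= stage_gain
    return gain


def split_combined_stage(combined_gain):
    """Split a combined V->COUNTS stage into (filter gain, digitizer gain).

    The analogue filter gets gain 1 and the digitizer carries the whole
    gain, so the product of the two equals the original stage gain.
    """
    filter_gain = 1.0
    digitizer_gain = combined_gain / filter_gain
    return filter_gain, digitizer_gain


# Hypothetical gains: dummy sensor, preamplifier (32), combined stage 3.
original = [1.0, 32.0, 409.6]
filter_gain, digitizer_gain = split_combined_stage(original[2])
repaired = [1.0, 32.0, filter_gain, digitizer_gain]

print(overall_gain(original), overall_gain(repaired))
```

Both chains give the same overall gain, which is why the split does not change the response of the complete system.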

petrrr commented 7 years ago

Though I am not very familiar with SC3 Inventory semantics (and I would not know where to look them up), I guess the major problem here might be a mismatch of the underlying concept for the data model between SEED and SC3.

While I might agree that @andres-h's "analogueFilterChain -> digitalization -> digitalFilterChain" is a valid abstraction for modern seismological acquisition chains, it might fit somewhat less well for legacy systems or non-seismological applications. And I somehow suspect that this abstraction guided the SC3 model. Still, there should be a way to convert SEED/StationXML to SC3 and back transparently.

Basically, and as I already mentioned above, in SEED/StationXML you model the response of a system in general, with no explicit notion of sensor, modulator, digitizer, or other equipment. This is done in a sequence of stages, that's it! Each stage specifies units, gain [58], filter [53,54,55,56,61,62] and decimation [57], all combined; some are optional. The only way to know whether you are in the analog or the digital domain is the units (as explicitly mentioned in the SEED manual), but SC3 seems not to have an equivalent representation for input and output units.

It is also important to note that filter blockettes [53, 54, 55, 56, 62] can represent both digital and analog filters, though this is not recommended in all combinations and some of the blockettes have to be used in combination. For example, blockette 53 (PAZ) with type D is explicitly recommended for digital IIR filters. Conversely, blockette 54 (Coefficients, not FIR) can be used for analog filters (type A and B), though this is not recommended.

Partly due to the naming, partly due to your comments, it appears to me that SC3 might have difficulties representing all this variety/heterogeneity. Specifically: apparently, SC3 assumes that [53] would always be "analog", and [54] always FIR, i.e. "digital". (I might be wrong here!) While FIR is probably always digital, [54] = Coefficients is not always "digital", at least not in the SEED world.

Most problematic is the stage where conversion/digitalization occurs. SEED models the digitizer (or the whole chain) as a cascade of stages, and the conversion from V to COUNTS occurs in one stage in combination with some response, but whether this corresponds to an analog filter, a digital filter or a dummy is not explicit. You will for sure find in this stage: PAZ responses type A, B (probably analog), Coefficient responses type D (probably digital), FIR [61] (digital), and dummy responses [53,54]. But you should also expect and be able to deal with PAZ [53] type D (probably digital), Coefficient responses [54] type A/B (probably analog), and Polynomial responses (analog or digital or combined).

andres-h commented 7 years ago

Though I am not very familiar with SC3 Inventory semantics (and I would not know where to look them up), I guess the major problem here might be a mismatch of the underlying concept for the data model between SEED and SC3.

SC3 inventory is "higher level" than SEED. It is easy and efficient to convert SC3 inventory to SEED, but not vice versa. The task of converting SEED to inventory is somewhat like converting an assembly language program to C, or TeX to LaTeX. The result of conversion is in any case sub-optimal, but in theory most SEED volumes can be converted.

When the inventory data model was designed, we expected that users create inventory XML with tools like SMP or the nettab system we use at GFZ. Unfortunately the legacy of SEED was larger than expected and continues with FDSN StationXML.

A new SC3 inventory model that is compatible with SEED and FDSN StationXML is being designed...

While I might agree that @andres-h's "analogueFilterChain -> digitalization -> digitalFilterChain" is a valid abstraction for modern seismological acquisition chains, it might fit somewhat less well for legacy systems or non-seismological applications.

I think it's actually pretty generic if there is one digitalization step.

And I somehow suspect that this abstraction guided the SC3 model. Still, there should be a way to convert SEED/StationXML to SC3 and back transparently.

When converting SEED to SC3 and back, the result must be equivalent to the original file, but I'm not sure if it must be byte-identical.

In fact, why do you want to convert SC3 back to SEED/StationXML in the first place? If SEED/StationXML is your working format, just keep your original SEED/StationXML file.

FDSNWS-station (and Arclink, but that's being phased out anyway) does not necessarily have to use the SC3 database as the source.

Basically, and as I already mentioned above, in SEED/StationXML you model the response of a system in general, with no explicit notion of sensor, modulator, digitizer, or other equipment. This is done in a sequence of stages, that's it! Each stage specifies units, gain [58], filter [53,54,55,56,61,62] and decimation [57], all combined; some are optional. The only way to know whether you are in the analog or the digital domain is the units (as explicitly mentioned in the SEED manual), but SC3 seems not to have an equivalent representation for input and output units.

We assume that a seismometer converts input units to volts. Input and output units can be added to each analogue filter, but they cancel out and do not have an effect on the response of the complete system.

It is also important to note that filter blockettes [53, 54, 55, 56, 62] can represent both digital and analog filters, though this is not recommended in all combinations and some of the blockettes have to be used in combination. For example, blockette 53 (PAZ) with type D is explicitly recommended for digital IIR filters. Conversely, blockette 54 (Coefficients, not FIR) can be used for analog filters (type A and B), though this is not recommended.

Partly due to the naming, partly due to your comments, it appears to me that SC3 might have difficulties representing all this variety/heterogeneity. Specifically: apparently, SC3 assumes that [53] would always be "analog", and [54] always FIR, i.e. "digital". (I might be wrong here!) While FIR is probably always digital, [54] = Coefficients is not always "digital", at least not in the SEED world.

Most problematic is the stage where conversion/digitalization occurs. SEED models the digitizer (or the whole chain) as a cascade of stages, and the conversion from V to COUNTS occurs in one stage in combination with some response, but whether this corresponds to an analog filter, a digital filter or a dummy is not explicit. You will for sure find in this stage: PAZ responses type A, B (probably analog), Coefficient responses type D (probably digital), FIR [61] (digital), and dummy responses [53,54]. But you should also expect and be able to deal with PAZ [53] type D (probably digital), Coefficient responses [54] type A/B (probably analog), and Polynomial responses (analog or digital or combined).

It is true that not all SEED responses have been implemented, but this is not a fundamental problem of the inventory schema. Some additional response types (Polynomial, FAP) have been added recently.

petrrr commented 7 years ago

@andres-h: Could you point me to some documentation for the SC3 Inventory format? I know where to find the XML schema for the format, but this is only a formal/syntactic definition. Is there also some semantic documentation, equivalent to the SEED manual, that I am not aware of?

petrrr commented 7 years ago

In fact, why do you want to convert SC3 back to SEED/StationXML in the first place? If SEED/StationXML is your working format, just keep your original SEED/StationXML file.

We actually do not want to. But we use SC3 for EIDA/Arclink, and Arclink provides SEED dataless and full SEED. These are then not consistent with the SEED/StationXML we produced upstream. For StationXML the situation is not problematic: though SC3 provides the related FDSN service, it is not exposed.

FDSNWS-station (and Arclink, but that's being phased out anyway) does not necessarily have to use the SC3 database as the source.

We are probably not aware of this feature and assume that we have to use the SC3 DB to operate within the EIDA federation, see e.g. http://www.seiscomp3.org/doc/jakarta/current/apps/global.html#config-fig-inventory-sync. If there is a way to use Arclink without the SC3 DB, we probably should explore and consider this.

But honestly, I then lack an understanding of how the inventory synchronization would work.

Anyway, this is probably OT and we should move discussion elsewhere.

andres-h commented 7 years ago

Could you point me to some documentation for the SC3 Inventory format? I know where to find the XML schema for the format, but this is only a formal/syntactic definition. Is there also some semantic documentation, equivalent to the SEED manual, that I am not aware of?

It is documented in the tags in the master schema (generic.xml). I'll have to see if I can find it online. There was supposed to be a reference manual like QuakeML has (after all, inventory was planned to be included in QuakeML), but due to issues and redesign that has been planned for a long time, it hasn't happened yet.

petrrr commented 7 years ago

@andres-h told me (during a meeting) that you would need an example. I assume, you require an example of a station which fails with the new version of fdsnxml2inv.

One example is the station BADI (see also above):

    <datalogger publicID="Datalogger#20170307095922.976529.33" name="IV.BADI.EHZ.2004.212.10">
      <gain>1</gain>
      <maxClockDrift>0</maxClockDrift>
      <calibration serialNumber="xxxx" channel="0">
        <start>2004-07-30T10:00:00.0000Z</start>
        <end>2009-09-24T12:40:00.0000Z</end>
        <gain>1</gain>
      </calibration>
      <decimation sampleRateNumerator="50" sampleRateDenominator="1">
        <analogueFilterChain>ResponseFIR#20170307095922.975072.8</analogueFilterChain>
        <digitalFilterChain>ResponseFIR#20170307095922.975228.9 ResponseFIR#20170307095922.975296.10 ResponseFIR#20170307095922.975296.10 ResponseFIR#20170307095922.975296.10 ResponseFIR#20170307095922.975296.10 ResponseFIR#20170307095922.975296.10 ResponseFIR#20170307095922.975296.10 ResponseFIR#20170307095922.975573.16 ResponseFIR#20170307095922.975612.17</digitalFilterChain>
      </decimation>
    </datalogger>

Now, a FIR filter is indeed not an analogue filter, which is probably why the parsing of this inventory fails.

Some considerations though.

  1. In this concrete case, where there is actually no filter (a do-nothing filter), the ideal behavior might be to insert no filter in the SC3 model at all; it's a digitizer. This would avoid cluttering the inventory with useless stuff. However, detecting dummy filters would need to handle more cases: in the <Coefficients> [54] and <FIR> [61] cases there are basically no coefficients listed, and in the <PolesZeros> [53] case no poles or zeros.

  2. <CfTransferFunctionType>DIGITAL</CfTransferFunctionType> is well represented as <responseFIR> in the inventory, but it probably should go into the <digitalFilterChain>.

  3. Are there appropriate representations of the other CfTransferFunctionType values, or are they just not supported? If supported, it might work to treat them as analog filters?

  4. The digitizer is often combined with <PolesZeros> [53] as well, either dummy or real. If this is of type <PzTransferFunctionType>LAPLACE (RADIANS/SECOND)</PzTransferFunctionType>, then putting it into the <analogueFilterChain> should work, while <PzTransferFunctionType>DIGITAL</PzTransferFunctionType> (if supported) might better go into the digital filters.
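
The four considerations above could be sketched roughly as follows in Python. This is only my reading of them, not how fdsnxml2inv works: the function names and arguments are hypothetical, and matching on the prefixes `DIGITAL`, `LAPLACE` and `ANALOG` of the transfer function type is an assumption.

```python
def is_dummy(kind, n_poles=0, n_zeros=0, n_coefficients=0):
    """A do-nothing filter: no poles/zeros, or no coefficients (point 1)."""
    if kind == "PolesZeros":
        return n_poles == 0 and n_zeros == 0
    if kind in ("Coefficients", "FIR"):
        return n_coefficients == 0
    return False

def target_chain(kind, transfer_type=None):
    """Pick the SC3 filter chain for a stage (points 2 to 4).

    Returns None when the stage cannot be classified.
    """
    if kind == "FIR":
        return "digitalFilterChain"
    label = (transfer_type or "").upper()
    if label.startswith("DIGITAL"):
        return "digitalFilterChain"
    if label.startswith("LAPLACE") or label.startswith("ANALOG"):
        return "analogueFilterChain"
    return None

# Example: a dummy Coefficients stage declared as DIGITAL.
dummy = is_dummy("Coefficients", n_coefficients=0)
chain = target_chain("Coefficients", "DIGITAL")
```

Whether a dummy stage is then dropped or kept as an empty filter in the chosen chain is a separate decision.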

So it might work with a slightly more complete set of rules based on:

and then decide if it is:

gempa-jabe commented 7 years ago

There is another ongoing issue on a similar topic: #116. Just to illustrate how difficult some things are to sort out: two people, two opinions.

You said:

In this concrete case, where there is actually no filter (a do-nothing filter), the ideal behavior might be to insert no filter in the SC3 model at all; it's a digitizer. This would avoid cluttering the inventory with useless stuff.

@lemarchandarnaud said:

I think that the pre-amp should be in the AnalogFilterChain as a PAZ without zeros and poles. Thus the datalogger.gain would always be the digitizer gain (Count/V) and nothing else.

Thanks for your detailed remarks.

petrrr commented 7 years ago

@gempa-jabe: What are we actually expected to do in order to move the resolution of this issue forward? This issue is creating serious problems for our operations.

From what I understand, the problem actually has two aspects:

  1. fdsnxml2inv fails to translate some of our FDSN StationXML files to SC3 inventory XML; the resulting files are incomplete. The affected StationXML files are fully supported (that's what I called valid earlier in the thread) in the SEED world (SEED dataless and FDSN StationXML).

  2. These inventory.xml files do not behave well when they are used later:

    • inv2dlsv errors out: ERROR: unsupported operand type(s) for *=: 'float' and 'NoneType'
    • No problem occurs when the files are ingested into the inventory DB. However, arclink errors out with an internal error.
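
For what it's worth, the inv2dlsv message is the standard Python `TypeError` raised when a float is multiplied in place by `None`. The sketch below reproduces it under the assumption (not verified against the inv2dlsv source) that a gain is simply missing from the converted inventory, and shows a check that would report the gap explicitly instead. The function names are hypothetical.

```python
def total_gain(gains):
    """Multiply all gains together; fails if any entry is None."""
    total = 1.0
    for gain in gains:
        total *= gain
    return total

def total_gain_checked(gains):
    """Same product, but report missing gains with a readable error."""
    missing = [index for index, gain in enumerate(gains) if gain is None]
    if missing:
        raise ValueError("missing gain at index(es): %s" % missing)
    return total_gain(gains)

# Illustrative values: the second gain was never filled in.
try:
    total_gain([400.0, None])
    message = ""
except TypeError as exc:
    message = str(exc)
```

Here `message` holds the same kind of text as the error above, which suggests that the incomplete datalogger response is what inv2dlsv trips over.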

gempa-jabe commented 7 years ago

Currently the branch dlsv-conversion-fix is used to fix issues related to SEED/StationXML conversions. You can contribute with testing and commenting. Note anyway that SEED is not SC3 XML. There is no 1:1 conversion possible and data is converted based on best practices which we are refining in the named branch. Work is in progress.

If you need a 1:1 representation of your SEED/StationXML files with e.g. fdsnws then it could be an option to maintain your original data and implement a station handler for fdsnws to serve your files without conversion.

andres-h commented 7 years ago

I wrote a quick Python script to do the conversion, because there is an issue with AlpArray response that needs to be fixed urgently and telling people to use the dlsv-conversion-fix branch is not feasible. The script (fdsnxml2arclink) is in my repository.

Example usage:

$ wget 'http://webservices.ingv.it/fdsnws/station/1/query?network=IV&station=AGST&cha=SHZ&level=response&format=xml&nodata=404&formatted=true' -O AGST.fdsn.xml

$ python fdsnxml2arclink.py AGST.fdsn.xml | ~/seiscomp3/bin/seiscomp exec sccnv -f -i arclink:- -o AGST.sc3.xml

$ ~/seiscomp3/bin/seiscomp exec inv2dlsv AGST.sc3.xml AGST.dlsv

$ rdseed -Rf AGST.dlsv