SeisComP / seedlink

Seedlink server to be built within SeisComP

scream plugin: single Fic!=Ric error crashes everything #18

Open filefolder opened 4 days ago

filefolder commented 4 days ago

Hi all

I'm noticing that a single corrupted packet from a remote scream! server seems to crash the entire plugin, causing a gap of 60-100 seconds on every station rather than only on the individual culprit. Here is the error:

 Fic!=Ric
Wed Nov 20 07:16:32 2024 - seedlink: [scream0] unexpected eof
Wed Nov 20 07:16:33 2024 - seedlink: [scream0] terminated with error status 255
Wed Nov 20 07:17:33 2024 - seedlink: [scream0] starting shell

This comes from a check that appears in several functions in gcf.c:

  if (b->fic != b->ric)
    fatal (("Fic!=Ric"));

Is there a reasonable way to replace that fatal error with a warning and drop that particular packet, instead of killing the whole plugin?

I might try removing the checks in the extract_## functions and moving them to the switch cases in gcf_dispatch, e.g.

          case 4:
            block.csize = 24 + block.samples;
            extract_8 (&block);
            /* block is a local struct here, so use '.' rather than '->' */
            if (block.fic != block.ric)
                return;
            break;

Would this work? Better ideas? I know this code is 20 years old now.
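A minimal sketch of that idea, for discussion only and not taken from gcf.c: the check returns a status instead of calling fatal, and the caller skips the block when the check fails. Only the fic and ric field names come from the snippets above; `demo_block`, `demo_check_block` and `demo_dispatch` are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical stand-in for the plugin's block type.
   Only the fic and ric fields are taken from the thread. */
struct demo_block
{
  int fic;
  int ric;
};

/* Return 0 if the block passes the check, -1 if it should be dropped.
   A warning is printed instead of terminating the process. */
static int demo_check_block (const struct demo_block *b)
{
  if (b->fic != b->ric)
    {
      fprintf (stderr, "WARNING: Fic!=Ric (fic=%d ric=%d), dropping block\n",
               b->fic, b->ric);
      return -1;
    }
  return 0;
}

/* Hypothetical dispatch loop: count the blocks that would be forwarded.
   A failing block is skipped; the others are unaffected. */
static int demo_dispatch (const struct demo_block *blocks, int n)
{
  int forwarded = 0;
  int i;

  for (i = 0; i < n; i++)
    {
      if (demo_check_block (&blocks[i]) != 0)
        continue;               /* drop only this block */
      forwarded++;              /* real code would pass the samples on here */
    }
  return forwarded;
}
```

With three blocks where only the middle one has mismatched values, `demo_dispatch` would report two forwarded blocks and print one warning.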

(n.b. I suspect this comes from a packet sent by a sensor that is currently back-filling old data. If anyone has experience mitigating this sort of thing, I would also love to hear how you did it!)

Thanks as usual

gempa-jabe commented 4 days ago

As you said, this code is very old and was contributed. Maybe the fic != ric issue is unrecoverable in this particular code design and therefore needs a hard reset. As you are the only one to report this issue in 20 years, I doubt that anyone has a workaround ready. I personally do not have access to a Guralp datalogger and cannot test this case. Just give your ideas a try and see what happens.

Maybe @andres-h has more viable input as he is much more experienced with seedlink and its plugins than anybody else.

filefolder commented 4 days ago

Thanks Jan. I will probably end up just going for it, but I only have "real" data to test it on and I certainly don't have a strong eye for such things.

filefolder commented 4 days ago

Quick update: the above hack seems to work as intended, which is big news for us at least, as this has been a particularly diabolical issue.

WARNING: Fic!=Ric
Wed Nov 20 14:51:08 2024 - seedlink: S1.AUAPY : HHZ time gap 5 seconds (detected)
WARNING: Fic!=Ric
Wed Nov 20 14:54:38 2024 - seedlink: S1.AUAPY : HHZ time gap 5 seconds (detected)
WARNING: Fic!=Ric

I'm not sure exactly how to specify the offending serial number in the warning (though it is clearly AUAPY in this example), but instead of gapping the entire network it just lets dispatch handle the individual gap. I will write up a PR and keep testing for the next few weeks for any second-order timing effects.
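One possible way to name the offender in the warning, sketched under the assumption that the decoded block carries identifier strings. The `sysid` and `strid` fields, the `demo_id_block` type and `demo_format_warning` are all hypothetical here and would need to be checked against the actual struct in the plugin.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical block layout: sysid and strid are ASSUMED fields holding
   the decoded system and stream identifiers. fic and ric are from the thread. */
struct demo_id_block
{
  char sysid[7];
  char strid[7];
  int fic;
  int ric;
};

/* Write a warning that names the source of the bad block into out.
   Returns the snprintf result (length the full message needs). */
static int demo_format_warning (const struct demo_id_block *b,
                                char *out, size_t outlen)
{
  return snprintf (out, outlen,
                   "WARNING: Fic!=Ric sysid=%s strid=%s (fic=%d ric=%d)",
                   b->sysid, b->strid, b->fic, b->ric);
}
```

The identifiers in any call would be whatever the decoder filled in; building the message in a buffer first keeps it easy to route through the plugin's existing logging call.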