dragonresearch / rpki.net

Dragon Research Labs rpki.net RPKI toolkit
54 stars 30 forks source link

Corrupt Validation Data #707

Closed sraustein closed 10 years ago

sraustein commented 10 years ago

Greetings.

I'm picking up this error recently:


May 3 16:48:59.958 EAT: %BGP-4-BGP_BAD_RPKI_MESSAGE: BGP received an invalid RPKI message from rpki server 41.x.x.x/43779 Corrupt Data Message length 1698308210 too long or too short hex dump:


The only thing that changed was upgrades to RPKI.

I've also upgraded to the latest release for this platform (IOS 15.2(4)S5 for 7200) for good measure, but the issue persists.

A ROA test on the validation servers looks fine:

[root@nms2 /home/tinka/rpki]# ./test-roa-count.sh 7854 rpki.ripe.net 1922 repository.lacnic.net 294 rpki.arin.net 217 rpki.apnic.net 76 rpki.afrinic.net 32 rgnet.rpki.net 30 localcert.ripe.net 28 rpki-repository.nic.ad.jp 22 rpki-pilot.lab.dtag.de 14 ca0.rpki.net 0 rpki-testbed.apnic.net 0 repo0.rpki.net

Total 10489 [root@nms2 /home/tinka/rpki]#

Kindly advise how I can include the hex dump.

Trac ticket #695 component rpkid priority major, owner None, created by aknit44 on 2014-05-05T13:16:54Z, last modified 2014-05-06T02:00:14Z

sraustein commented 10 years ago

Kindly advise how I can include the hex dump.

Attach to ticket (follow the link) or just include the hex as text. Former is (probably) better but whatever works.

Trac comment by sra on 2014-05-05T13:18:45Z

sraustein commented 10 years ago

On Monday, May 05, 2014 03:18:45 PM Trac Ticket System wrote:

Attach to ticket (follow the link) or just include the hex as text. Former is (probably) better but whatever works.

What I meant was how do I dump the hex in the first place :-)?

Mark.

[attachment:"signature.asc"]

Trac comment by aknit44 on 2014-05-05T13:42:12Z

sraustein commented 10 years ago

signature.asc Added by email2trac Trac attachment by aknit44 on 2014-05-06T02:00:14Z

sraustein commented 10 years ago

I thought the error message you posted said it included a hex dump, which (by implication) you omitted. Maybe the error message meant "hex dump" as an imperative verb rather than as a noun?

Anyway, if you don't already have a dump, your best bet is probably tcpdump. Eg:

{{{ tcpdump -s 0 -w rpki-rtr-wtf.pcap tcp port 43779 }}}

Adjust port number if you're using something else. Do whatever it takes to trigger the bug, then ^C tcpdump. Then attach the resulting .pcap to this ticket, perhaps after compressing it with xz.

Trac comment by sra on 2014-05-05T14:05:21Z

sraustein commented 10 years ago

rpki-rtr-wtf.pcap Trac attachment by aknit44 on 2014-05-06T02:00:14Z

sraustein commented 10 years ago

On Monday, May 05, 2014 04:05:22 PM Trac Ticket System wrote:

I thought the error message you posted said it included a hex dump, which (by implication) you omitted.

No, I posted the logs exactly the way it came to the router.

No dump was included.

Maybe the error message meant "hex dump" as an imperative verb rather than as a noun?

Perhaps.

Anyway, if you don't already have a dump, your best bet is probably tcpdump. Eg:

{{{ tcpdump -s 0 -w rpki-rtr-wtf.pcap tcp port 43779 }}}

Adjust port number if you're using something else. Do whatever it takes to trigger the bug, then ^C tcpdump. Then attach the resulting .pcap to this ticket, perhaps after compressing it with xz.

Attached to the case.

There is no triggering the "bug", it's always there. I have no validation working on the routers now, so this is the issue I'm constantly seeing.

Hope you can find something in the .pcap. I let it run for 10x minutes (it's only 85KB).

Cheers,

Mark.

[attachment:"signature-1.asc"]

Trac comment by aknit44 on 2014-05-05T14:31:48Z

sraustein commented 10 years ago

signature-1.asc Added by email2trac Trac attachment by aknit44 on 2014-05-06T02:00:14Z

sraustein commented 10 years ago

Looks like an xinetd configuration problem. tcptrace shows ten zillion short connections, and if one extracts the connection content (tcptrace -e) then runs strings on the result, one sees a repeating cycle of:

{{{ /Message length 1698308210 too long or too short usage: rpki-rtr [-h] [--debug] [--log-level {debug,info,warning,error,critical}] [--log-to {syslog,stderr}] ... rpki-rtr: error: argument : invalid choice: '/var/rcynic/rpki-rtr' (choose from 'server', 'listener', 'client', 'cronjob', 'show') }}}

The usage message is coming from rpki-rtr, and is being sent to stderr (I just verified this).

stderr out not be going to the network at all, xinetd probably ought to be sending it to syslog, but since it happens to be there we can read it.

The error is saying is that you omitted the "server" argument in your xinetd command (or xinetd thinks you did -- check for off-by-one error in how xinetd parses arguments vs what you have).

Given that the server errors out, it's not surprising that the router chokes. The router might perhaps issue a more useful diagnostic, but I'm hardly in a position to point claws there.

Trac comment by sra on 2014-05-05T15:22:28Z

sraustein commented 10 years ago

On Monday, May 05, 2014 05:22:28 PM Trac Ticket System wrote:

The error is saying is that you omitted the "server" argument in your xinetd command (or xinetd thinks you did -- check for off-by-one error in how xinetd parses arguments vs what you have).

That's interesting.

This is what I have for xinetd:


service rpki-rtr {

    socket_type = stream
    protocol = tcp
    port = 43779
    wait = no
    user = rcynic
    server = /usr/local/bin/rpki-rtr
    server_args = --server /var/rcynic/rpki-rtr

}

service rpki-rtr {

    socket_type = stream
    protocol = tcp
    flags = IPv6
    port = 43779
    wait = no
    user = rcynic
    server = /usr/local/bin/rpki-rtr
    server_args = --server /var/rcynic/rpki-rtr

}


That looks pretty okay to me.

It's the same configuration I've been running from Day One, so unless something changed within xinetd, it's odd.

xinetd starts with no complaints:


May 5 21:32:09 nms xinetd[72610]: Exiting...

May 5 21:32:09 nms xinetd[72945]: xinetd Version 2.3.15 started with libwrap loadavg options compiled in.

May 5 21:32:09 nms xinetd[72945]: Started working: 2 available services


Does any of that look odd to you?

There is no other interesting logs being sent to Syslog.

Mark.

[attachment:"signature-2.asc"]

Trac comment by aknit44 on 2014-05-05T19:36:39Z

sraustein commented 10 years ago

signature-2.asc Added by email2trac Trac attachment by aknit44 on 2014-05-06T02:00:14Z

sraustein commented 10 years ago
     server_args = --server /var/rcynic/rpki-rtr

"--server" should be "server". This was a change from rtr-origin: rpki-rtr tries to follow the argparse rule that things that look like options should be optional.

Not sure why argparse reported the particular error it did: it was correct that the problem is the missing "server", but I would have expected it to say it didn't like "--server", not to say that it didn't like "/var/rcynic/rpki-rtr".

There is no other interesting logs being sent to Syslog.

Apparently xinetd's normal behavior is to bind stderr to the network connection. This seems completely wrong to me (and I'm not alone, given some of the comments I found), but I see no obvious way to fix it.

The fundamental issue here is that there's a chicken and egg problem in the relationship between argument parsing and logging: if logging starts before argument parsing, we can't use arguments to control logging; if argument parsing happens before we start logging, we can't use logging to report argument parsing errors. In both cases I'm oversimplifying slightly, and there are kludges we could try, but you get the idea.

Anyway, try "server" instead of "--server" and see if that helps.

Trac comment by sra on 2014-05-05T20:02:21Z

sraustein commented 10 years ago

On Monday, May 05, 2014 10:02:22 PM Trac Ticket System wrote:

"--server" should be "server". This was a change from rtr-origin: rpki- rtr tries to follow the argparse rule that things that look like options should be optional.

I should probably do a better job of reading update release notes :-).

Anyway, try "server" instead of "--server" and see if that helps.

Now we're cooking with gas :-).

It's all sorted now. We have a good TCP connection and validation data is now hitting the router.

Many thanks, Rob. Always a pleasure :-).

Cheers,

Mark.

[attachment:"signature-3.asc"]

Trac comment by aknit44 on 2014-05-05T20:32:35Z

sraustein commented 10 years ago

signature-3.asc Added by email2trac Trac attachment by aknit44 on 2014-05-06T02:00:14Z

sraustein commented 10 years ago

Whew!

I guess one of these days we could try doing a release to see if anybody reads the release notes. :)

Trac comment by sra on 2014-05-06T02:00:14Z

sraustein commented 10 years ago

Closed with resolution fixed