RIPE-NCC / bgpdump

Utility and C Library for parsing MRT files
Other
67 stars 10 forks source link

BGP4MP_ET parsing errors #9

Open maltalex opened 3 years ago

maltalex commented 3 years ago

Hi bgpdump maintainers,

I ran into some weird issues while parsing a RouteViews MRT file. The file in question is http://routeviews.org/route-views3/bgpdata/2021.04/UPDATES/updates.20210416.1515.bz2.

I'm using the latest bgpdump (commit bd890168028a7570af01ca02e6073aa0da6885b2) with the following switches: -v, -m, -p. It looks as if the file contains prefixes that don't make any sense. Here are a few examples:

BGP4MP_ET|7680|1618586116.561183|A|104.218.61.166|213262|250.0.0.0/208|213262 38008 6939 32098 13999 262916|IGP|104.218.61.166|0|0|38008:103 64600:103 64601:38008|AG|262916 187.245.92.32|
BGP4MP_ET|13162|1618586139.260903|A|104.218.61.166|213262|0.0.0.0/238|213262 38008 6939 64049 55836 17488|IGP|104.218.61.166|0|0|38008:103 64600:103 64601:38008 65521:20|NAG||
BGP4MP_ET|21837|1618586171.006855|W|104.218.61.166|213262|160.0.0.0/160
BGP4MP_ET|258040|1618586913.681759|A|104.218.61.166|213262|80.0.0.0/127|213262 38008 6939 7738 8167 28292|IGP|104.218.61.166|0|0|38008:103 64600:103 64601:38008 65521:20|NAG||

...and many more Note the corrupt NLRI - 250.0.0.0/208, 0.0.0.0/238, 160.0.0.0/160, 80.0.0.0/127

Is this a bgpdump bug? A corrupt file? Something else?

And just to make sure we're talking about the same file, here's its hash:

> sha256sum updates.20210416.1515.bz2                                                                                                                                       
5c693ec797d32efb1189bd522ce65fca419d914364c167c1158271f55a12ed83  updates.20210416.1515.bz2

> sha512sum updates.20210416.1515.bz2                                                                                                                                       
3624be266dbfe40947dc4754ec82fcf5f8a632bb90af4dcf704f7fcd961380d4baa559e0dedfb01ced70619b1a8d8220ba1e07b0c62c32b7b364cd9489e6c819  updates.20210416.1515.bz2
spakka commented 3 years ago

The issue is that these update messages were sent by an AddPath-enabled peer. The update messages are being wrapped in the wrong type code - their NLRI format changes due to the add-path capability, but they are not being marked as such by the MRT type codes. This throws off the NLRI parser code in bgpdump.

Looks like FRR behaviour (which is hard-coded to enable add-path RX, but doesn't support writing update files in the appropriate format).

I'm looking into what can be done about this.

maltalex commented 3 years ago

@spakka Thanks for looking into this.

dteach-rv commented 3 years ago

Hi,

RouteViews uses FRR as their BGP daemon. Unfortunately, the FRR BGP daemon supports BGP ADD-PATH (RFC 7911) but does not support add-path encodings in MRT (RFC 8050). Additionally, FRR does not provide a way to disable BGP add-path RX. So any peer that decides to negotiate add-path TX with us will start sending add-path encoded routes. I've submitted a PR to the project to address this (https://github.com/FRRouting/frr/pull/8066), which has been accepted and merged. However, this patch won't be available in an FRR release until 7.7 which should be available this summer.

I'll start an Issue on the FRR github project for disabling add-path RX.

maltalex commented 3 years ago

@dteach-rv

  1. Does that mean that in essence all (recent?) RouteViews files are malformed? If so, is there a way to still parse them? Even if the bug is fixed tomorrow, there will sill be countless un-parsable files in the RouteViews archive.

  2. Do you have contacts within RouteViews to deploy a fix if and when FRR releases it?

maltalex commented 2 years ago

Do the RIPE RIS daemons support AddPath? Anecdotally, I haven't seen any mrt files that have paths with path identifiers.

digizeph commented 2 years ago

@dteach-rv

  1. Does that mean that in essence all (recent?) RouteViews files are malformed? If so, is there a way to still parse them? Even if the bug is fixed tomorrow, there will sill be countless un-parsable files in the RouteViews archive.
  2. Do you have contacts within RouteViews to deploy a fix if and when FRR releases it?

I am sure the issue is still there for existing MRT files. I am investigating this issue for my Rust-based parser and hopefully can come up with a workaround soon (potentially hacky too, since it's not conforming to standards). You can track my progress here if you're interested: https://github.com/bgpkit/bgpkit-parser/issues/31

Do the RIPE RIS daemons support AddPath? Anecdotally, I haven't seen any mrt files that have paths with path identifiers.

I have not seen any of such MRT files either. It would be great to have some example MRT files so that we could test parsers against some real-world data.

@spakka I am wondering if you guys have figured out some solutions on this yet? Thanks in advance!

maltalex commented 2 years ago

I have not seen any of such MRT files either. It would be great to have some example MRT files so that we could test parsers against some real-world data.

I guess you mean valid files since we obviously have the malformed RouteViews files as an example.

Given an MRT record is there a way to make a good educated guess as to whether it contains an malformed AddPath record, then try to parse it as such? The invalid NLRI is a dead giveaway, but could it be valid but incorrect?

spakka commented 2 years ago

@spakka I am wondering if you guys have figured out some solutions on this yet? Thanks in advance!

So far what I'd seen is:

Given the inconsistencies seen here between the live data available, it does look like it may be worthwhile to implement more intelligent parsing in BGPdump, rather than just assuming the type code is correct.

It might be possible to guess this by doing maths using the total packet length and determine whether the NLRI length computes correctly in the add-path and in the non-add-path case, to determine the correct type.

The invalid NLRI is a dead giveaway, but could it be valid but incorrect?

Yes it can be valid but incorrect :)

The problem is that the add-path ID is placed before the existing NLRI; at that place, the parser would normally expect to find the prefix-length - which then defines how many bytes it should consume for the rest of the packet.

In IPv4, one can differentiate because the possible lengths of the NLRI are between 2 and 5 octets without add-path, and between 6 and 9 octets with add-path.

But one can construct an IPv6 prefix that could be encoded either way, because there is an overlap in the possible lengths of the NLRI - between 1 and 17 octets without add-path, and between 5 and 21 octets with add-path.

e.g. in the following IPv6 case:

Wire NLRI: 0x 24 00 00 00 08 01

--------------------------------------------------------
| Add-path ID    | Prefix Length | Prefix              |
|  (32 bits)     |    (8 bits)   | (variable)          |
|------------------------------------------------------|
| 0x 24 00 00 00 |    0x 08      | 0x 01               |
|                |    0x 24      | 0x 00 00 00 08 01   |
--------------------------------------------------------

This could be either: 0100::/8 (with path-ID 0x24000000) or 0000:0008:0100::/36 (without add-path)

One needs to know the original encoding intention of the BGP speaker (signalled using the MRT type codes) to understand which one is correct

digizeph commented 2 years ago

Thanks for the detailed reply! Your discussion on IPv4 and IPv6 is very enlightening!

On the topic of detecting add-path, I am now convinced that detecting IPv4 can be implemented for sure, and IPv6 remains tricky to handle just as you documented.

I am curious whether BGP Open messages expresses the add-path capability or not. I read from https://github.com/CAIDA/libparsebgp/issues/22 that at least in BMP the add-path capability is advertised in the Open message. If the same is also true for regular updates MRT, then I think it is theoretically possible to go back in time far enough to find the peer's corresponding Open message and make assumptions about the follow-up messages accordingly.

spakka commented 2 years ago

I am curious whether BGP Open messages expresses the add-path capability or not. I read from CAIDA/libparsebgp#22 that at least in BMP the add-path capability is advertised in the Open message. If the same is also true for regular updates MRT, then I think it is theoretically possible to go back in time far enough to find the peer's corresponding Open message and make assumptions about the follow-up messages accordingly.

You can try go back to look for the OPEN message as it contains the capabilities.

However, note that add-path is negotiated in both directions. For your peer to send you add-path, they must advertise capability add-path-tx and you must advertise capability add-path-rx.

It depends if the BGP implementation that produced the MRT file is either recording both the sent and the received OPEN message, or otherwise if the capabilities are known somehow else.

For example, with the corrupted files that started this discussion, we know that FRR was advertising add-path-rx (because it is hard-coded to do so). Therefore if the peer advertised capability add-path-tx, then you know the peers messages are add-path-encoded. This may be true just now, but @dteach-rv also submitted an Issue to allow add-path-rx to be disabled in FRR - so this assumption may not always be true.

RFC6396 (MRT format) allows for representing of the outgoing messages from the route collectors, using type BGP4MP_MESSAGE_AS4_LOCAL instead of BGP4MP_MESSAGE_AS4. I implemented support for parsing this in bgpdump but I am not aware of RouteViews or RIPE RIS implementing the collection/storage of the other OPEN message in the MRT file. I don't see an option to support it in FRR or Bird just now.

Also note that the add-path negotiation is per-session and also per-AFI/SAFI so one could make a multiproto session over IPv6 where the IPv4 updates are add-path and the IPv6 are not - and it may change the next time the session goes up or down. So you would need to build a session uptime database to know what capabilities were in use at any time.

(RIPE RIS's internal implementation used to do exactly this, for tracking AS2/AS4 capabilities on a per-session basis - but I don't know if it still works like that!)

dteach-rv commented 2 years ago

Hi there,

I'm pretty confident that the OPEN messages are not captured/dumped by the bgp_dump hook. The add-path-rx disable knob was introduced in FRR 8.1 IIRC. Either way, it's not a version of FRR that we are running quite yet. But it's on the backlog. As a rule, we only accept AFI/SAFI sessions that match the underlying transit. So no v6 AFI/SAFI over a v4 session, etc.

We've discussed a fixup script that would just remove the add-path encoded updates from the MRT updates files, but no progress has been made there. For now, we have an audit script that runs against the collectors to detect if add-path-tx has been negotiated by the peer. At that point, manual intervention is taken to shut down the session and alert the peer. It's a moving target, but with 900+ BGP peers, it's the best we can do until we upgrade FRR on the collectors :).

digizeph commented 2 years ago

In IPv4, one can differentiate because the possible lengths of the NLRI are between 2 and 5 octets without add-path, and between 6 and 9 octets with add-path.

This might be difficult to implement too. The NLRI block only has the length for all prefixes in the block, and for each block, one will have to read to know how many bytes it contains. As a result, the parser actually can't know how many bytes per-prefix entry inside an NLRI block has.

I have a workaround (see this PR) that could resolve this issue (mostly).

The parsing results with this patch is included here (the MRT file in question is also included): https://drive.google.com/drive/folders/1LcZ-ywazKuGiwbH_IC0WiohnVuWUN_l3

In summary, with this patch, I don't see any /0 prefixes and no error was encountered.

➜  bgpkit-parser git:(feature_bytes_in_error) ✗ gzcat test-results.txt.gz|grep \/0
➜  bgpkit-parser git:(feature_bytes_in_error) ✗ gzcat test-results.txt.gz|wc -l
 3679857
➜  bgpkit-parser git:(feature_bytes_in_error) ✗ gzcat test-results.txt.gz|head
A|1592418600.918129|2001:7f8:1::a504:2541:1|42541|2620:ad:8081::/48|42541 3356 30714 30714|IGP||0|0|3356:3 3356:22 3356:100 3356:123 3356:575 3356:2026 42541:82 lg:42541:1:1|AG|30714|208.67.128.5
A|1592418600.918129|2001:7f8:1::a504:2541:1|42541|2620:ad:8081::/48|42541 3356 30714 30714|IGP||0|0|3356:3 3356:22 3356:100 3356:123 3356:575 3356:2026 42541:82 lg:42541:1:1|AG|30714|208.67.128.5
A|1592418600.918366|2001:7f8:1::a504:2541:1|42541|2620:ad:8080::/48|42541 3356 6453 52127|IGP||0|0|3356:2 3356:86 3356:501 3356:601 3356:666 3356:2065 3356:10725 6453:2000 6453:2200 6453:2201 42541:82 lg:42541:1:1|AG|52127|195.219.198.142
A|1592418600.918366|2001:7f8:1::a504:2541:1|42541|2620:ad:8080::/48|42541 3356 6453 52127|IGP||0|0|3356:2 3356:86 3356:501 3356:601 3356:666 3356:2065 3356:10725 6453:2000 6453:2200 6453:2201 42541:82 lg:42541:1:1|AG|52127|195.219.198.142
A|1592418600.918469|2001:7f8:1::a504:2541:1|42541|2620:12c:1::/48|42541 3356 3257 19966|IGP||0|0|3257:3257 3356:2 3356:22 3356:86 3356:510 3356:601 3356:666 3356:2081 42541:82 lg:42541:1:1|NAG||
A|1592418600.918469|2001:7f8:1::a504:2541:1|42541|2620:12c:1::/48|42541 3356 3257 19966|IGP||0|0|3257:3257 3356:2 3356:22 3356:86 3356:510 3356:601 3356:666 3356:2081 42541:82 lg:42541:1:1|NAG||
A|1592418600.918469|2001:7f8:1::a504:2541:1|42541|2620:12c:2::/48|42541 3356 3257 19966|IGP||0|0|3257:3257 3356:2 3356:22 3356:86 3356:510 3356:601 3356:666 3356:2081 42541:82 lg:42541:1:1|NAG||
A|1592418600.918469|2001:7f8:1::a504:2541:1|42541|2620:12c:2::/48|42541 3356 3257 19966|IGP||0|0|3257:3257 3356:2 3356:22 3356:86 3356:510 3356:601 3356:666 3356:2081 42541:82 lg:42541:1:1|NAG||
A|1592418600.918469|2001:7f8:1::a504:2541:1|42541|2620:12c:3::/48|42541 3356 3257 19966|IGP||0|0|3257:3257 3356:2 3356:22 3356:86 3356:510 3356:601 3356:666 3356:2081 42541:82 lg:42541:1:1|NAG||
A|1592418600.918469|2001:7f8:1::a504:2541:1|42541|2620:12c:3::/48|42541 3356 3257 19966|IGP||0|0|3257:3257 3356:2 3356:22 3356:86 3356:510 3356:601 3356:666 3356:2081 42541:82 lg:42541:1:1|NAG||
➜  bgpkit-parser git:(feature_bytes_in_error) ✗