SichangHe / internet_route_verification

RPSLyzer: Parse Routing Policy Specification Language from IRR and compare BGP routes against it
MIT License

Parser stats #57

Closed SichangHe closed 10 months ago

SichangHe commented 1 year ago
These changes are then rolled back.

# Update after fixing `default` in https://github.com/SichangHe/internet_route_verification/commit/3c8f8f93be617164cc69299d03f1069d11565ede

```
Summary
    Parsed 78951 aut_nums, 59724 as_sets, 24652 route_sets, 342 peering_sets, 202 filter_sets, 87534 as_routes.
    29 skips during lexing, 765 syntax errors, 267 unknown path attributes, 23 invalid names parsing AS Sets, 22 invalid Route Set names, 1 invalid AS Route, 104 complex PeerAS.
```

Edit: no difference after treating `default` as import in 6ff97d6d01445d6f848d0facf7eda25f38bdcf9c.

(Earlier) Summary

29 skips during lexing, 545 syntax errors, 267 unknown path attributes, 23 invalid names parsing AS Sets, 22 invalid Route Set names, 1 invalid AS Route, 104 complex PeerAS.


Original

Summary
    Parsed 78951 aut_nums, 59724 as_sets, 24652 route_sets, 342 peering_sets, 202 filter_sets, 87534 as_routes.
    29 skips, 545 lexing errors, 376 parsing errors, 0 unknown errors.

0918parse.txt

SichangHe commented 1 year ago

@cunha, what extra details do we need?

cunha commented 1 year ago

I think these are good. We'll likely need the same numbers on a per AS basis.

I've also added some entries to the Git PDF repo.

SichangHe commented 1 year ago

New, detailed stats.

Summary
    Parsed 78951 aut_nums, 59724 as_sets, 24652 route_sets, 342 peering_sets, 202 filter_sets, 87534 as_routes.
    29 skips during lexing, 545 syntax errors, 267 unknown path attributes, 23 invalid names parsing AS Sets, 22 invalid Route Set names, 1 invalid AS Route, 104 complex PeerAS.
cunha commented 1 year ago

> 87534 as_routes

Interesting. Are these route and route6 objects? I would expect to have many more, as just the IPv4 routing table is about 900K prefixes these days.

SichangHe commented 1 year ago

> Interesting. Are these route and route6 objects? I would expect to have many more, as just the IPv4 routing table is about 900K prefixes these days.

No. These are the routes for each AS.

cunha commented 1 year ago

Ha! I came here to answer myself, but now I'm confused. It seems like as_routes are indeed route and route6 objects, parsed here.

SichangHe commented 1 year ago

> Ha! I came here to answer myself, but now I'm confused. It seems like as_routes are indeed route and route6 objects, parsed here.

They come from route and route6, but are grouped under the ASes they belong to. Sorry for the confusion.
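For anyone reading along, here is a minimal Python sketch of what "grouped under the ASes they belong to" could look like. It is not the project's actual code: the `(prefix, origin)` tuples, the `group_by_origin` helper, and the documentation-range prefixes and AS numbers are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical, simplified route/route6 objects: (prefix, origin AS number).
# Real IRR objects carry many more attributes; these values are not real data.
route_objects = [
    ("192.0.2.0/24", 64500),
    ("198.51.100.0/24", 64500),
    ("2001:db8::/32", 64501),
]


def group_by_origin(routes):
    """Collect prefixes under the AS named as the origin of each object."""
    as_routes = defaultdict(list)
    for prefix, origin in routes:
        as_routes[origin].append(prefix)
    return dict(as_routes)


as_routes = group_by_origin(route_objects)
# One entry per AS, so this count is much smaller than the number of prefixes.
print(len(as_routes), "as_routes,", sum(len(p) for p in as_routes.values()), "routes")
```

Under this reading, the `as_routes` figure counts ASes that have at least one route object, which would explain why it is far below the number of prefixes in the routing table.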

SichangHe commented 1 year ago

@cunha, are these good enough or do we need more stats for the parsing process?

cunha commented 1 year ago

I think these classes are fine; can't think of anything else right now.

SichangHe commented 10 months ago
Update: Removed non-major IRRs.

```sh
$ backup/ internet_route_verification/data/irrs  main
$ ll internet_route_verification/data/irrs/backup  main
Permissions Size User      Date Modified Git Name
.rw-------@ 9.6M sichanghe 15 Jun  2023   -I altdb.db
.rw-------@  40M sichanghe 15 Jun  2023   -I arin.db
.rw-------@ 551k sichanghe 15 Jun  2023   -I bboi.db
.rw-------@ 538k sichanghe 15 Jun  2023   -I canarie.db
.rw-------@ 4.3M sichanghe 15 Jun  2023   -I jpirr.db
.rw-------@ 2.3k sichanghe 15 Jun  2023   -I nestegg.db
.rw-------@ 194M sichanghe 15 Jun  2023   -I nttcom.db
.rw-------@  11k sichanghe 15 Jun  2023   -I panix.db
$ rm bboi.db canarie.db nestegg.db panix.db internet_route_verification/data/irrs/backup  main 67ms
$ ll internet_route_verification/data/irrs/backup  main
Permissions Size User      Date Modified Git Name
.rw-------@ 9.6M sichanghe 15 Jun  2023   -I altdb.db
.rw-------@  40M sichanghe 15 Jun  2023   -I arin.db
.rw-------@ 4.3M sichanghe 15 Jun  2023   -I jpirr.db
.rw-------@ 194M sichanghe 15 Jun  2023   -I nttcom.db
$ ../priority/ internet_route_verification/data/irrs/backup  main
$ ll internet_route_verification/data/irrs/priority  main
Permissions Size User      Date Modified Git Name
.rw-------@ 135M sichanghe 15 Jun  2023   -I afrinic.db
.rw-------@ 9.4M sichanghe 23 Jun  2023   -I altdb.db
.rw-------@ 127k sichanghe 15 Jun  2023   -I apnic.db.as-block
.rw-------@ 1.8M sichanghe 15 Jun  2023   -I apnic.db.as-set
.rw-------@  11M sichanghe 15 Jun  2023   -I apnic.db.aut-num
.rw-------@  90M sichanghe 15 Jun  2023   -I apnic.db.domain
.rw-------@ 5.1k sichanghe 15 Jun  2023   -I apnic.db.filter-set
.rw-------@ 6.8k sichanghe 15 Jun  2023   -I apnic.db.inet-rtr
.rw-------@  50M sichanghe 15 Jun  2023   -I apnic.db.inet6num
.rw-------@ 504M sichanghe 15 Jun  2023   -I apnic.db.inetnum
.rw-------@  10M sichanghe 15 Jun  2023   -I apnic.db.irt
.rw-------@ 741k sichanghe 15 Jun  2023   -I apnic.db.key-cert
.rw-------@ 1.9k sichanghe 15 Jun  2023   -I apnic.db.limerick
.rw-------@  13M sichanghe 15 Jun  2023   -I apnic.db.mntner
.rw-------@ 5.6M sichanghe 15 Jun  2023   -I apnic.db.organisation
.rw-------@  12k sichanghe 15 Jun  2023   -I apnic.db.peering-set
.rw-------@  15M sichanghe 15 Jun  2023   -I apnic.db.role
.rw-------@ 189M sichanghe 15 Jun  2023   -I apnic.db.route
.rw-------@ 177k sichanghe 15 Jun  2023   -I apnic.db.route-set
.rw-------@  85M sichanghe 15 Jun  2023   -I apnic.db.route6
.rw-------@ 1.6k sichanghe 15 Jun  2023   -I apnic.db.rtr-set
.rw-------@  35M sichanghe 23 Jun  2023   -I arin.db
.rw-------@ 507k sichanghe 23 Jun  2023   -I bboi.db
.rw-------@ 4.2M sichanghe 15 Jun  2023   -I bell.db
.rw-r--r--@ 535k sichanghe 23 Jun  2023   -I canarie.db
.rw-------@  895 sichanghe 15 Jun  2023   -I host.db
.rw-------@ 4.9M sichanghe 15 Jun  2023   -I idnic.db
.rw-------@ 3.9M sichanghe 23 Jun  2023   -I jpirr.db
.rw-------@  12M sichanghe 15 Jun  2023   -I lacnic.db
.rw-------@  70M sichanghe 15 Jun  2023   -I level3.db
.rw-------@ 2.3k sichanghe 23 Jun  2023   -I nestegg.db
.rw-------@ 185M sichanghe 23 Jun  2023   -I nttcom.db
.rw-------@ 9.0k sichanghe 15 Jun  2023   -I openface.db
.rw-------@  11k sichanghe 23 Jun  2023   -I panix.db
.rw-------@ 511M sichanghe 15 Jun  2023   -I radb.db
.rw-------@  16M sichanghe 15 Jun  2023   -I reach.db
.rw-------@  14k sichanghe 15 Jun  2023   -I rgnet.db
.rw-------@ 5.4G sichanghe 15 Jun  2023   -I ripe.db
.rw-------@  17M sichanghe 15 Jun  2023   -I tc.db
$ rm bboi.db bell.db canarie.db host.db nestegg.db openface.db panix.db rgnet.db internet_route_verification/data/irrs/priority  main
```

Full log: 0102parse_log.txt

Summary
    Parsed 78701 aut_nums, 59597 as_sets, 24460 route_sets, 342 peering_sets, 202 filter_sets, 87418 as_routes (1475331 routes).
    29 skips during lexing, 545 syntax errors, 267 unknown path attributes, 23 invalid names parsing AS Sets, 22 invalid Route Set names, 104 complex PeerAS.
SichangHe commented 10 months ago

Somehow, removing non-major IRRs lost us a lot of routes (2020379 → 1475331), @cunha.
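To put that drop in proportion, a quick check using only the two route counts quoted in this comment (the variable names are just for illustration):

```python
# Route counts quoted in the comment above: before and after removing non-major IRRs.
before, after = 2020379, 1475331

lost = before - after
share = lost / before  # fraction of the earlier total that disappeared

print(f"lost {lost} routes ({share:.1%} of the earlier total)")
```

So a bit more than a quarter of the routes went away with the removed databases.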

SichangHe commented 10 months ago
Update 2: Removed mirrors from RADB and applied priority to only authoritative registries.

```sh
$ lt internet_route_verification/data/irrs
.
├── backup
│   ├── altdb.db
│   ├── idnic.db
│   ├── jpirr.db
│   ├── level3.db
│   ├── nttcom.db
│   ├── radb.db
│   ├── reach.db
│   └── tc.db
└── priority
    ├── afrinic.db
    ├── apnic.db.as-block
    ├── apnic.db.as-set
    ├── apnic.db.aut-num
    ├── apnic.db.domain
    ├── apnic.db.filter-set
    ├── apnic.db.inet-rtr
    ├── apnic.db.inet6num
    ├── apnic.db.inetnum
    ├── apnic.db.irt
    ├── apnic.db.key-cert
    ├── apnic.db.limerick
    ├── apnic.db.mntner
    ├── apnic.db.organisation
    ├── apnic.db.peering-set
    ├── apnic.db.role
    ├── apnic.db.route
    ├── apnic.db.route-set
    ├── apnic.db.route6
    ├── apnic.db.rtr-set
    ├── arin.db
    ├── lacnic.db
    └── ripe.db
```

Full log: 0109parse_log.txt

Summary
    Parsed 78701 aut_nums, 59596 as_sets, 24459 route_sets, 342 peering_sets, 202 filter_sets, 87414 as_routes (1464705 routes).
    29 skips during lexing, 412 syntax errors, 251 unknown path attributes, 12 invalid names parsing AS Sets, 17 invalid Route Set names, 101 complex PeerAS.

Within the 29 skips, 7 are large objects, 22 are complex REFINE.
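As a rough illustration of the `priority`/`backup` split in Update 2, here is a minimal Python sketch. It assumes that an object found in a `priority` registry overrides a copy of the same object in a `backup` registry; the thread does not spell out the exact merge rule, and the dictionaries, object names, and `merge_with_priority` helper are hypothetical.

```python
# Hypothetical stand-ins for parsed IRR dumps: object name -> source registry.
priority = {"AS64500": "ripe", "AS-EXAMPLE": "arin"}
backup = {"AS64500": "radb", "AS64501": "altdb"}


def merge_with_priority(priority, backup):
    """Prefer the priority copy; use backup only for objects missing from it.

    Assumed rule for illustration, not confirmed by the thread.
    """
    merged = dict(backup)
    merged.update(priority)  # priority entries overwrite backup duplicates
    return merged


merged = merge_with_priority(priority, backup)
print(merged)
```

Under that assumption, moving a database between the two directories changes which copy of a duplicated object survives, which is one way the counts could shift between updates.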

SichangHe commented 10 months ago

PR: https://github.com/SichangHe/internet_route_verification/pull/120

Update 3.

```sh
$ lt internet_route_verification/data/irrs
.
├── backup
│   ├── altdb.db
│   ├── idnic.db
│   ├── jpirr.db
│   ├── level3.db
│   ├── nttcom.db
│   ├── reach.db
│   └── tc.db
├── priority
│   ├── afrinic.db
│   ├── apnic.db.as-block
│   ├── apnic.db.as-set
│   ├── apnic.db.aut-num
│   ├── apnic.db.domain
│   ├── apnic.db.filter-set
│   ├── apnic.db.inet-rtr
│   ├── apnic.db.inet6num
│   ├── apnic.db.inetnum
│   ├── apnic.db.irt
│   ├── apnic.db.key-cert
│   ├── apnic.db.limerick
│   ├── apnic.db.mntner
│   ├── apnic.db.organisation
│   ├── apnic.db.peering-set
│   ├── apnic.db.role
│   ├── apnic.db.route
│   ├── apnic.db.route-set
│   ├── apnic.db.route6
│   ├── apnic.db.rtr-set
│   ├── arin.db
│   ├── lacnic.db
│   └── ripe.db
└── second_priority
    └── radb.db
```

Summary
    Parsed 78701 aut_nums, 59596 as_sets, 24459 route_sets, 342 peering_sets, 202 filter_sets, 87414 as_routes (3367914 routes).
    29 skips during lexing, 412 syntax errors, 251 unknown path attributes, 12 invalid names parsing AS Sets, 17 invalid Route Set names, 101 complex PeerAS.

Only the number of routes changed (more than doubled).
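A quick way to double-check this is to parse the two summaries and diff the counts. This is a minimal Python sketch; the `parse_counts` helper and its regular expression are illustrative, not part of the parser, and the strings are the Update 2 and Update 3 summaries copied from this thread.

```python
import re


def parse_counts(summary: str) -> dict[str, int]:
    """Map each label in a summary to its count, e.g. 'aut_nums' -> 78701."""
    # Each entry looks like "<number> <label>", ended by a comma, period, or parenthesis.
    pairs = re.findall(r"(\d+) ([^,.()]+)", summary)
    return {label.strip(): int(number) for number, label in pairs}


errors = (
    "29 skips during lexing, 412 syntax errors, 251 unknown path attributes, "
    "12 invalid names parsing AS Sets, 17 invalid Route Set names, 101 complex PeerAS."
)
update2 = parse_counts(
    "Parsed 78701 aut_nums, 59596 as_sets, 24459 route_sets, 342 peering_sets, "
    "202 filter_sets, 87414 as_routes (1464705 routes). " + errors
)
update3 = parse_counts(
    "Parsed 78701 aut_nums, 59596 as_sets, 24459 route_sets, 342 peering_sets, "
    "202 filter_sets, 87414 as_routes (3367914 routes). " + errors
)

# Keep only the labels whose counts differ between the two runs.
changed = {k: (update2[k], update3[k]) for k in update2 if update2[k] != update3.get(k)}
print(changed)
```

The only differing entry is `routes`, going from 1464705 to 3367914, which is roughly a 2.3x increase.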

SichangHe commented 9 months ago
Update 4: got rid of all lexer skips.

Commit: https://github.com/SichangHe/internet_route_verification/commit/6645358939144d09ec4dc7a440b069129e0f7f0c

```sh
$ lt internet_route_verification/data/irrs  main
.
├── backup
│   ├── altdb.db
│   ├── idnic.db
│   ├── jpirr.db
│   ├── level3.db
│   ├── nttcom.db
│   ├── radb.db
│   ├── reach.db
│   └── tc.db
└── priority
    ├── afrinic.db
    ├── apnic.db.as-block
    ├── apnic.db.as-set
    ├── apnic.db.aut-num
    ├── apnic.db.domain
    ├── apnic.db.filter-set
    ├── apnic.db.inet-rtr
    ├── apnic.db.inet6num
    ├── apnic.db.inetnum
    ├── apnic.db.irt
    ├── apnic.db.key-cert
    ├── apnic.db.limerick
    ├── apnic.db.mntner
    ├── apnic.db.organisation
    ├── apnic.db.peering-set
    ├── apnic.db.role
    ├── apnic.db.route
    ├── apnic.db.route-set
    ├── apnic.db.route6
    ├── apnic.db.rtr-set
    ├── arin.db
    ├── lacnic.db
    └── ripe.db
```

[011715_parse_log.txt](https://github.com/SichangHe/internet_route_verification/files/13960334/011715_parse_log.txt)

Summary
    Parsed 78701 aut_nums, 59596 as_sets, 24460 route_sets, 342 peering_sets, 203 filter_sets, 87414 as_routes (3367914 routes).
    412 syntax errors, 251 unknown path attributes in filter, 12 invalid names parsing AS Sets, 17 invalid Route Set names, 104 complex PeerAS.
SichangHe commented 6 months ago

After moving PeerAS resolution to run time:

Parsed 78701 aut_nums, 59596 as_sets, 24460 route_sets, 342 peering_sets, 203 filter_sets, 87414 as_routes (3367914 routes).
412 syntax errors, 251 unknown path attributes in filter, 12 invalid names parsing AS Sets, 17 invalid Route Set names.