SichangHe commented 1 year ago

We have generated all the reports for rib.20230619.2200.bz2 (104 MiB) with the lowest verbosity.

These conclusions were simply wrong because I was only counting the first 10,000 routes.

The fraction of routes with errors among ~all the routes~ *the first 10,000 routes* is 64633 / 26447600 = 0.002443813427305313. The memory usage sits comfortably at 27GiB. Under these circumstances, I think we can start looking into the results and computing some statistics. However, the report data are doubly nested, so we would need to either flatten them somehow or loop through them; the latter option seems simpler.
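
A minimal sketch of the looping option in Python, written under several assumptions. The report layout is hypothetical: a route is a list of hops, and each hop is a list of check outcomes (`"ok"`, `"skip"`, or `"err"`). The classification rule is also assumed: any error makes a route bad, and any skip makes an error-free route neutral. The real report types in the repository differ.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical shape: a route is a list of hops, and each hop is a list of
# check outcomes, each one of "ok", "skip", or "err".
Route = List[List[str]]


def classify_route(route: Route) -> str:
    """Assumed rule: any "err" makes the route bad; else any "skip" makes it neutral."""
    outcomes = [outcome for hop in route for outcome in hop]  # walk both nesting levels
    if "err" in outcomes:
        return "bad"
    if "skip" in outcomes:
        return "neutral"
    return "good"


def tally(routes: Iterable[Route]) -> Counter:
    """Single pass over all routes, counting them by overall quality."""
    return Counter(classify_route(route) for route in routes)


routes = [
    [["ok", "ok"], ["ok"]],     # good
    [["ok", "skip"], ["ok"]],   # neutral
    [["ok"], ["err", "skip"]],  # bad
    [["err"]],                  # bad
]
counts = tally(routes)
print(counts["bad"], counts["neutral"], counts["good"])  # 2 1 1
```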

SichangHe commented 1 year ago

What we can get by looping through the data:

[x] Total number of routes: bad: 23724704, neutral: 2568654, good: 154242, total: 26447600.
[x] Number of erroneous/skipped/compliant imports/exports for each AS.
[ ] Number of routes of each skip/error type (update July 9).
- Number of import/export errors for each AS: import_export_err_per_as.csv
  
  31337253 import errors, 31157529 export errors

  *Dataframe 1* (one column per AS; row 0 = import errors, row 1 = export errors):

  ```
  ┌─────┬─────┬──────┬──────┬───┬──────────┬──────────┬──────────┬──────────┐
  │ AS4 ┆ AS8 ┆ AS12 ┆ AS18 ┆ … ┆ AS400810 ┆ AS400818 ┆ AS400856 ┆ AS401307 │
  │ --- ┆ --- ┆ ---  ┆ ---  ┆   ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
  │ i32 ┆ i32 ┆ i32  ┆ i32  ┆   ┆ i32      ┆ i32      ┆ i32      ┆ i32      │
  ╞═════╪═════╪══════╪══════╪═══╪══════════╪══════════╪══════════╪══════════╡
  │ 7   ┆ 0   ┆ 224  ┆ 0    ┆ … ┆ 0        ┆ 0        ┆ 0        ┆ 0        │
  │ 924 ┆ 203 ┆ 439  ┆ 231  ┆ … ┆ 252      ┆ 29       ┆ 56       ┆ 3720     │
  └─────┴─────┴──────┴──────┴───┴──────────┴──────────┴──────────┴──────────┘
  ```

  Description for *Dataframe 1*:

  ```
  ┌────────────┬────────────┬────────────┬────────────┬───┬────────────┬───────────┬──────────┬─────────────┐
  │ describe   ┆ AS4        ┆ AS8        ┆ AS12       ┆ … ┆ AS400810   ┆ AS400818  ┆ AS400856 ┆ AS401307    │
  │ ---        ┆ ---        ┆ ---        ┆ ---        ┆   ┆ ---        ┆ ---       ┆ ---      ┆ ---         │
  │ str        ┆ f64        ┆ f64        ┆ f64        ┆   ┆ f64        ┆ f64       ┆ f64      ┆ f64         │
  ╞════════════╪════════════╪════════════╪════════════╪═══╪════════════╪═══════════╪══════════╪═════════════╡
  │ count      ┆ 2.0        ┆ 2.0        ┆ 2.0        ┆ … ┆ 2.0        ┆ 2.0       ┆ 2.0      ┆ 2.0         │
  │ null_count ┆ 0.0        ┆ 0.0        ┆ 0.0        ┆ … ┆ 0.0        ┆ 0.0       ┆ 0.0      ┆ 0.0         │
  │ mean       ┆ 465.5      ┆ 101.5      ┆ 331.5      ┆ … ┆ 126.0      ┆ 14.5      ┆ 28.0     ┆ 1860.0      │
  │ std        ┆ 648.416918 ┆ 143.542677 ┆ 152.027958 ┆ … ┆ 178.190909 ┆ 20.506097 ┆ 39.59798 ┆ 2630.437226 │
  │ min        ┆ 7.0        ┆ 0.0        ┆ 224.0      ┆ … ┆ 0.0        ┆ 0.0       ┆ 0.0      ┆ 0.0         │
  │ 25%        ┆ 236.25     ┆ 50.75      ┆ 277.75     ┆ … ┆ 63.0       ┆ 7.25      ┆ 14.0     ┆ 930.0       │
  │ 50%        ┆ 465.5      ┆ 101.5      ┆ 331.5      ┆ … ┆ 126.0      ┆ 14.5      ┆ 28.0     ┆ 1860.0      │
  │ 75%        ┆ 694.75     ┆ 152.25     ┆ 385.25     ┆ … ┆ 189.0      ┆ 21.75     ┆ 42.0     ┆ 2790.0      │
  │ max        ┆ 924.0      ┆ 203.0      ┆ 439.0      ┆ … ┆ 252.0      ┆ 29.0      ┆ 56.0     ┆ 3720.0      │
  └────────────┴────────────┴────────────┴────────────┴───┴────────────┴───────────┴──────────┴─────────────┘
  ```

  Description for transposed *Dataframe 1* (per-AS distribution; `column_0` = import errors, `column_1` = export errors):

  ```
  ┌────────────┬──────────────┬──────────────┐
  │ describe   ┆ column_0     ┆ column_1     │
  │ ---        ┆ ---          ┆ ---          │
  │ str        ┆ f64          ┆ f64          │
  ╞════════════╪══════════════╪══════════════╡
  │ count      ┆ 23933.0      ┆ 23933.0      │
  │ null_count ┆ 0.0          ┆ 0.0          │
  │ mean       ┆ 1309.374211  ┆ 1301.864747  │
  │ std        ┆ 52564.363811 ┆ 48488.717931 │
  │ min        ┆ 0.0          ┆ 0.0          │
  │ 25%        ┆ 0.0          ┆ 28.0         │
  │ 50%        ┆ 0.0          ┆ 58.0         │
  │ 75%        ┆ 0.0          ┆ 168.0        │
  │ max        ┆ 5.152725e6   ┆ 5.43163e6    │
  └────────────┴──────────────┴──────────────┘
  ```
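
Dataframe 1 has one column per AS. Describing it therefore summarizes each AS over only two values (import, export). The transposed description is the one that characterizes the per-AS distribution.

Below is a stdlib sketch of that per-AS view. It assumes a hypothetical `{AS: (import_errors, export_errors)}` mapping rather than the actual CSV layout. As a sanity check on the real numbers, 31337253 / 23933 ≈ 1309.37, which matches the `column_0` mean above.

```python
from statistics import mean, median
from typing import Dict, Tuple

# Hypothetical input: AS number -> (import_errors, export_errors).
# One row per AS, i.e. the transposed layout of Dataframe 1.
ErrCounts = Dict[int, Tuple[int, int]]


def per_as_summary(errors: ErrCounts) -> dict:
    """Summarize how import and export errors are distributed across ASes."""
    imports = [imp for imp, _ in errors.values()]
    exports = [exp for _, exp in errors.values()]
    return {
        "count": len(errors),
        "import_total": sum(imports),
        "export_total": sum(exports),
        "import_mean": mean(imports),
        "export_mean": mean(exports),
        "import_median": median(imports),
        "export_median": median(exports),
    }


# The four left-most ASes of Dataframe 1, used as a toy input.
sample = {4: (7, 924), 8: (0, 203), 12: (224, 439), 18: (0, 231)}
summary = per_as_summary(sample)
print(summary["import_total"], summary["export_total"])  # 231 1797
```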

SichangHe commented 1 year ago

Strangely, the number of errors decreased after I improved the speed by ~30x. I am guessing this is because the previous implementation triggered more RecursionErrors.

cunha commented 1 year ago

On the import/export errors, it may also help to compute their relative fraction (i.e., divide the absolute number of errors by the number of paths).
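
A minimal sketch of that normalization: divide each AS's error count by the number of paths checked in the same direction. The dataclass is illustrative and not from the repository. Its field names follow the per-AS stats columns in this thread. The choice of denominator (ok + skip + err per direction) is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AsStats:
    """Per-AS counts, named after the columns of the per-AS stats table."""
    import_ok: int = 0
    export_ok: int = 0
    import_skip: int = 0
    export_skip: int = 0
    import_err: int = 0
    export_err: int = 0


def error_fractions(s: AsStats) -> Tuple[float, float]:
    """Return (import error fraction, export error fraction).

    Assumption: the number of paths checked in one direction is ok + skip + err.
    A direction with no checks yields 0.0 instead of dividing by zero.
    """
    imports = s.import_ok + s.import_skip + s.import_err
    exports = s.export_ok + s.export_skip + s.export_err
    return (
        s.import_err / imports if imports else 0.0,
        s.export_err / exports if exports else 0.0,
    )


# AS139609 in the per-AS stats: 953 import_ok and 1009 export_err, nothing else.
print(error_fractions(AsStats(import_ok=953, export_err=1009)))  # (0.0, 1.0)
```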

SichangHe commented 1 year ago

Stats per AS

Generated stats for 75377 AS in 2600168ms.
shape: (75_377, 7)
┌─────────┬───────────┬───────────┬─────────────┬─────────────┬────────────┬────────────┐
│ aut_num ┆ import_ok ┆ export_ok ┆ import_skip ┆ export_skip ┆ import_err ┆ export_err │
│ ---     ┆ ---       ┆ ---       ┆ ---         ┆ ---         ┆ ---        ┆ ---        │
│ u64     ┆ u32       ┆ u32       ┆ u32         ┆ u32         ┆ u32        ┆ u32        │
╞═════════╪═══════════╪═══════════╪═════════════╪═════════════╪════════════╪════════════╡
│ 200297  ┆ 0         ┆ 116       ┆ 0           ┆ 0           ┆ 0          ┆ 0          │
│ 40458   ┆ 0         ┆ 0         ┆ 0           ┆ 168         ┆ 0          ┆ 0          │
│ 399407  ┆ 0         ┆ 0         ┆ 0           ┆ 29          ┆ 0          ┆ 0          │
│ 212963  ┆ 0         ┆ 28        ┆ 0           ┆ 0           ┆ 0          ┆ 0          │
│ …       ┆ …         ┆ …         ┆ …           ┆ …           ┆ …          ┆ …          │
│ 393929  ┆ 0         ┆ 0         ┆ 0           ┆ 56          ┆ 0          ┆ 0          │
│ 9394    ┆ 0         ┆ 0         ┆ 8020        ┆ 31774       ┆ 0          ┆ 0          │
│ 271107  ┆ 0         ┆ 0         ┆ 0           ┆ 93          ┆ 0          ┆ 0          │
│ 139609  ┆ 953       ┆ 0         ┆ 0           ┆ 0           ┆ 0          ┆ 1009       │
└─────────┴───────────┴───────────┴─────────────┴─────────────┴────────────┴────────────┘
shape: (9, 8)
┌──────────┬──────────┬──────────────┬─────────────┬──────────────┬──────────────┬──────────────┬──────────────┐
│ describe ┆ aut_num  ┆ import_ok    ┆ export_ok   ┆ import_skip  ┆ export_skip  ┆ import_err   ┆ export_err   │
│ ---      ┆ ---      ┆ ---          ┆ ---         ┆ ---          ┆ ---          ┆ ---          ┆ ---          │
│ str      ┆ f64      ┆ f64          ┆ f64         ┆ f64          ┆ f64          ┆ f64          ┆ f64          │
╞══════════╪══════════╪══════════════╪═════════════╪══════════════╪══════════════╪══════════════╪══════════════╡
│ mean     ┆ 2.0236e6 ┆ 192.3249     ┆ 142.987145  ┆ 411.20168    ┆ 463.044284   ┆ 415.740252   ┆ 413.355918   │
│ std      ┆ 8.9239e7 ┆ 10470.690784 ┆ 6118.379825 ┆ 19042.364194 ┆ 15488.158847 ┆ 29624.860205 ┆ 27328.793867 │
│ min      ┆ 1.0      ┆ 0.0          ┆ 0.0         ┆ 0.0          ┆ 0.0          ┆ 0.0          ┆ 0.0          │
│ 25%      ┆ 33601.0  ┆ 0.0          ┆ 0.0         ┆ 0.0          ┆ 0.0          ┆ 0.0          ┆ 0.0          │
│ 50%      ┆ 60762.0  ┆ 0.0          ┆ 0.0         ┆ 0.0          ┆ 28.0         ┆ 0.0          ┆ 0.0          │
│ 75%      ┆ 207307.0 ┆ 0.0          ┆ 29.0        ┆ 0.0          ┆ 87.0         ┆ 0.0          ┆ 28.0         │
│ max      ┆ 4.2926e9 ┆ 1.53399e6    ┆ 1.078456e6  ┆ 2.222287e6   ┆ 2.80516e6    ┆ 5.152725e6   ┆ 5.43163e6    │
└──────────┴──────────┴──────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┘

as_stats.csv

cunha commented 1 year ago

Nice. Here is an example we could investigate to try and see what's up:

│ aut_num ┆ import_ok ┆ export_ok ┆ import_skip ┆ export_skip ┆ import_err ┆ export_err │
│ 139609  ┆ 953       ┆ 0         ┆ 0           ┆ 0           ┆ 0          ┆ 1009       │

(Sent by email in reply to Steven Hé (Sīchàng)'s "Stats per AS" comment of Thu, Jul 13, 2023, 4:52 AM.)


SichangHe commented 1 year ago

[x] Filter on AS to retrieve reports.

SichangHe commented 1 year ago

Stats on up/down-hill

shape: (24, 4)
┌─────────┬───────┬────────┬──────────┐
│ quality ┆ hill  ┆ port   ┆ value    │
│ ---     ┆ ---   ┆ ---    ┆ ---      │
│ str     ┆ str   ┆ str    ┆ u32      │
╞═════════╪═══════╪════════╪══════════╡
│ good    ┆ up    ┆ import ┆ 6073000  │
│ good    ┆ down  ┆ import ┆ 7330086  │
│ good    ┆ peer  ┆ import ┆ 1072993  │
│ good    ┆ other ┆ import ┆ 20795    │
│ good    ┆ up    ┆ export ┆ 6658732  │
│ good    ┆ down  ┆ export ┆ 3125541  │
│ good    ┆ peer  ┆ export ┆ 959414   │
│ good    ┆ other ┆ export ┆ 34255    │
│ neutral ┆ up    ┆ import ┆ 16783070 │
│ neutral ┆ down  ┆ import ┆ 6520591  │
│ neutral ┆ peer  ┆ import ┆ 7543136  │
│ neutral ┆ other ┆ import ┆ 148352   │
│ neutral ┆ up    ┆ export ┆ 20863157 │
│ neutral ┆ down  ┆ export ┆ 5883454  │
│ neutral ┆ peer  ┆ export ┆ 7879843  │
│ neutral ┆ other ┆ export ┆ 276435   │
│ bad     ┆ up    ┆ import ┆ 18039677 │
│ bad     ┆ down  ┆ import ┆ 2840992  │
│ bad     ┆ peer  ┆ import ┆ 9854886  │
│ bad     ┆ other ┆ import ┆ 601698   │
│ bad     ┆ up    ┆ export ┆ 13373858 │
│ bad     ┆ down  ┆ export ┆ 7682674  │
│ bad     ┆ peer  ┆ export ┆ 9631758  │
│ bad     ┆ other ┆ export ┆ 469239   │
└─────────┴───────┴────────┴──────────┘

up_down_hill_stats.csv

Struct debug print.

```
UpDownHillStats {
    good_up_import: 6073000,
    good_down_import: 7330086,
    good_peer_import: 1072993,
    good_other_import: 20795,
    good_up_export: 6658732,
    good_down_export: 3125541,
    good_peer_export: 959414,
    good_other_export: 34255,
    neutral_up_import: 16783070,
    neutral_down_import: 6520591,
    neutral_peer_import: 7543136,
    neutral_other_import: 148352,
    neutral_up_export: 20863157,
    neutral_down_export: 5883454,
    neutral_peer_export: 7879843,
    neutral_other_export: 276435,
    bad_up_import: 18039677,
    bad_down_import: 2840992,
    bad_peer_import: 9854886,
    bad_other_import: 601698,
    bad_up_export: 13373858,
    bad_down_export: 7682674,
    bad_peer_export: 9631758,
    bad_other_export: 469239,
}
```
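
The struct fields and the table rows carry the same 24 numbers. Below is a small sketch that reshapes the fields into (quality, hill, port, value) rows and sums one group. The field-name pattern is inferred from the debug print above; this is not code from the repository.

As a consistency check, the four `bad` import counts add up to 31,337,253 and the four `bad` export counts to 31,157,529. These are the same totals as the import/export error counts reported earlier in this issue.

```python
import re
from typing import Dict, List, Tuple

# Field names look like "<quality>_<hill>_<port>", e.g. "bad_up_import".
FIELD = re.compile(r"^(good|neutral|bad)_(up|down|peer|other)_(import|export)$")


def to_long(fields: Dict[str, int]) -> List[Tuple[str, str, str, int]]:
    """Turn flat struct fields into (quality, hill, port, value) rows."""
    rows = []
    for name, value in fields.items():
        match = FIELD.match(name)
        if match is None:
            raise ValueError(f"unexpected field name: {name}")
        quality, hill, port = match.groups()
        rows.append((quality, hill, port, value))
    return rows


def total(rows: List[Tuple[str, str, str, int]], quality: str, port: str) -> int:
    """Sum the values of one quality/port combination over all hill types."""
    return sum(v for q, _, p, v in rows if q == quality and p == port)


bad = {
    "bad_up_import": 18039677, "bad_down_import": 2840992,
    "bad_peer_import": 9854886, "bad_other_import": 601698,
    "bad_up_export": 13373858, "bad_down_export": 7682674,
    "bad_peer_export": 9631758, "bad_other_export": 469239,
}
rows = to_long(bad)
print(total(rows, "bad", "import"), total(rows, "bad", "export"))  # 31337253 31157529
```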

SichangHe commented 1 year ago

Percentage data about the above up/down-hill stats.

```python
    quality   hill    port       value  %total  %quality_hill  %quality_port  %hill_port  %quality  %hill  %port
0      good     up  import   6,073,000     4.0           24.0           24.0         7.4       4.0    7.4    7.9
1      good   down  import   7,330,086     4.8           29.0           29.0        22.0       4.8   22.0    9.5
2      good   peer  import   1,072,993     0.7            4.2            4.2         2.9       0.7    2.9    1.4
3      good  other  import      20,795     0.0            0.1            0.1         1.3       0.0    1.3    0.0
4      good     up  export   6,658,732     4.3           26.3           26.3         8.1       4.3    8.1    8.7
5      good   down  export   3,125,541     2.0           12.4           12.4         9.4       2.0    9.4    4.1
6      good   peer  export     959,414     0.6            3.8            3.8         2.6       0.6    2.6    1.2
7      good  other  export      34,255     0.0            0.1            0.1         2.2       0.0    2.2    0.0
8   neutral     up  import  16,783,070    10.9           25.5           25.5        20.5      10.9   20.5   21.8
9   neutral   down  import   6,520,591     4.2            9.9            9.9        19.5       4.2   19.5    8.5
10  neutral   peer  import   7,543,136     4.9           11.4           11.4        20.4       4.9   20.4    9.8
11  neutral  other  import     148,352     0.1            0.2            0.2         9.6       0.1    9.6    0.2
12  neutral     up  export  20,863,157    13.6           31.7           31.7        25.5      13.6   25.5   27.2
13  neutral   down  export   5,883,454     3.8            8.9            8.9        17.6       3.8   17.6    7.7
14  neutral   peer  export   7,879,843     5.1           12.0           12.0        21.3       5.1   21.3   10.3
15  neutral  other  export     276,435     0.2            0.4            0.4        17.8       0.2   17.8    0.4
16      bad     up  import  18,039,677    11.7           28.9           28.9        22.1      11.7   22.1   23.5
17      bad   down  import   2,840,992     1.8            4.5            4.5         8.5       1.8    8.5    3.7
18      bad   peer  import   9,854,886     6.4           15.8           15.8        26.7       6.4   26.7   12.8
19      bad  other  import     601,698     0.4            1.0            1.0        38.8       0.4   38.8    0.8
20      bad     up  export  13,373,858     8.7           21.4           21.4        16.4       8.7   16.4   17.4
21      bad   down  export   7,682,674     5.0           12.3           12.3        23.0       5.0   23.0   10.0
22      bad   peer  export   9,631,758     6.3           15.4           15.4        26.1       6.3   26.1   12.5
23      bad  other  export     469,239     0.3            0.8            0.8        30.3       0.3   30.3    0.6
```
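
Below is a sketch for recomputing these percentages from the raw counts, with the grouping made explicit through `group_by`. It is an illustrative helper, not the code that produced the table.

Recomputed this way, good/up/import is 24.0% of all good entries but 41.9% of good imports. The `%quality_port` column above shows 24.0, so that column appears to hold the per-quality share. This may be worth double-checking before drawing conclusions from it.

```python
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (quality, hill, port)


def shares(counts: Dict[Key, int], group_by: Tuple[int, ...]) -> Dict[Key, float]:
    """Percentage of each entry within the group sharing the key positions in group_by.

    group_by=() gives a share of the grand total, (0,) a share within the same
    quality, (0, 2) a share within the same quality and port, and so on.
    """
    totals: Dict[tuple, int] = defaultdict(int)
    for key, value in counts.items():
        totals[tuple(key[i] for i in group_by)] += value
    return {
        key: 100.0 * value / totals[tuple(key[i] for i in group_by)]
        for key, value in counts.items()
    }


good = {
    ("good", "up", "import"): 6073000,
    ("good", "down", "import"): 7330086,
    ("good", "peer", "import"): 1072993,
    ("good", "other", "import"): 20795,
    ("good", "up", "export"): 6658732,
    ("good", "down", "export"): 3125541,
    ("good", "peer", "export"): 959414,
    ("good", "other", "export"): 34255,
}
key = ("good", "up", "import")
print(round(shares(good, (0,))[key], 1), round(shares(good, (0, 2))[key], 1))  # 24.0 41.9
```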

SichangHe commented 1 year ago

There are clearly many more errors going uphill, judging by %quality_port (the percentage among values of the same quality and port).

The problem is that there are also many errors on flat (P2P) hops. Even setting those aside, there would still be about 10M errors going downhill across import and export combined (see the edit history of the previous comment for the totals).

cunha commented 1 year ago

What are the cases where hill == other? (For AS-pairs that are not in CAIDA's database, we could assume a P2P relationship.)

This is a neat analysis that helps us focus. One possible next step is to identify the AS pairs resulting in the bad cases, then sort the ASes by the number of pairs they appear in. ASes with many violations seem like good places to look for special cases.

SichangHe commented 1 year ago

What are the cases where hill == other? (For AS-pairs that are not in CAIDA's database, we could assume a P2P relationship.)

They are either pairs not in CAIDA's database, or "SingleExport" cases, where the AS path contains only one AS.

Source code location: https://github.com/SichangHe/internet_route_verification/blob/d0931bbb0b1e162849efc3dbbc8d04fafced6000/route_verification/bgp/src/stats.rs#L72
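
Below is a rough Python sketch of the labeling described above. It is not a port of `stats.rs` and makes three assumptions:

- It uses CAIDA's serial-1 as-rel line format (`<provider>|<customer>|-1`, `<peer>|<peer>|0`).
- "Up" means a customer exporting to its provider.
- A missing pair, or a single-AS path, falls into `other`.

The AS numbers in the usage lines come from the documentation range and are not real relationships.

```python
from typing import FrozenSet, Iterable, Optional, Set, Tuple

P2C = Set[Tuple[int, int]]  # (provider, customer)
P2P = Set[FrozenSet[int]]   # unordered peer pairs


def load_relationships(lines: Iterable[str]) -> Tuple[P2C, P2P]:
    """Parse CAIDA as-rel lines: "<provider>|<customer>|-1" or "<peer>|<peer>|0"."""
    p2c: P2C = set()
    p2p: P2P = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        a, b, rel = line.split("|")[:3]
        if rel == "-1":
            p2c.add((int(a), int(b)))
        elif rel == "0":
            p2p.add(frozenset((int(a), int(b))))
    return p2c, p2p


def hill(from_as: int, to_as: Optional[int], p2c: P2C, p2p: P2P) -> str:
    """Label a hop where `from_as` exports a route that `to_as` imports."""
    if to_as is None:  # single-AS path ("SingleExport"): no neighbor to compare
        return "other"
    if (to_as, from_as) in p2c:  # the neighbor is our provider: uphill
        return "up"
    if (from_as, to_as) in p2c:  # the neighbor is our customer: downhill
        return "down"
    if frozenset((from_as, to_as)) in p2p:
        return "peer"
    return "other"  # pair missing from the relationship data


p2c, p2p = load_relationships(["# toy data", "64500|64501|-1", "64501|64502|0"])
print(hill(64501, 64500, p2c, p2p))  # up
```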

SichangHe commented 1 year ago

One possible next step is to identify the AS-pairs resulting in the bad cases, then sort the ASes by the number of pairs they appear in. These ASes with many violations seem like good ones to look for special cases.

Using the stats per AS:

Top 20 ASes by import_err

```
       aut_num  import_ok  export_ok  import_skip  export_skip  import_err  export_err
30506     6939          0          6            0          229     5152725     4222643
73677     3356     287354    1078456      1323609            0     4877822     5431630
34955      174          0          0            0            0     2233104     2291696
1522      2914      30498          5          755          227     2041237     1168424
44630     3257     399658     986036       659426            0     1493424      679879
19807    23673       3329          0            0            0      953407        3851
11109    20130          0          3            0            0      939442          90
71453    22652     160791       1305        73354         2545      685243        4635
47585     9498         88          0            0            0      492179      575489
7660     57463        767         16          529            0      461814       32236
28045    37100     516087      18160            0        10749      426190           0
50958    18106     536889        373            0         1779      409554        1238
60929     1299    1075777        113      2222287      2805160      397270           0
74062     4755          0          0            0            0      338192      398635
49163     6762     142658     213486       154426       283519      329166      133401
31876    34224     616660       4035        19898         4450      312004        2976
49084     3130    1533990          8            0           78      299350          38
36780     9808        254       1120            0            0      275988      541657
22356    12956      78974      37925        29201       327558      256043           0
67329    12552     169146      14658            0       379379      225716        9754
```

Top 20 ASes by export_err

```
       aut_num  import_ok  export_ok  import_skip  export_skip  import_err  export_err
73677     3356     287354    1078456      1323609            0     4877822     5431630
30506     6939          0          6            0          229     5152725     4222643
34955      174          0          0            0            0     2233104     2291696
1522      2914      30498          5          755          227     2041237     1168424
44630     3257     399658     986036       659426            0     1493424      679879
45030    58453     519149          4            0            0       67523      588562
47585     9498         88          0            0            0      492179      575489
36780     9808        254       1120            0            0      275988      541657
74062     4755          0          0            0            0      338192      398635
36811    16509          0          0            0            0       24324      268318
53015    20940          0         75            0          840      119796      233824
42355     4766          0          0            0            0      150573      219008
35903    12479       2429       3091          227            0        5432      209431
17048     4837          0          0            0            0      171847      197067
21282    23520       2521          3         4775            0      186221      196463
10625    25019          0        140            0            0      143698      177343
54049     3786          2          0            0            0      123053      147528
42483    11164          0       1165       147733          460           0      146156
1407      9318        206        727            0            0      117561      136978
44952     7552         22          0            0            0       33307      134230
```

cunha commented 1 year ago

All networks in the top 5 are Tier-1s (and possibly others). This might make it easy for us to special-case them.

One possibly important difference between counting errors and counting the AS pairs in which an AS is involved is that the absolute number of errors depends on the number of paths. This may bias results toward "central" networks (like the Tier-1s).

SichangHe commented 1 year ago

One possibly important difference between counting the number of errors and the number of AS-pairs in which an AS is involved is that the absolute number of errors depend on the number of paths, which may bias results toward "central" networks (like the Tier-1s).

I'm not sure what you are talking about here in terms of "the number of errors."

The data above are about AS pairs.

SichangHe commented 1 year ago

All networks in the top5 are Tier-1s (and possibly others). This might make it easy for us to special case them.

Maybe. Unfortunately, having more routes through these Tier-1s naturally leads to more errors.

SichangHe commented 1 year ago

Ranking by the percentage of errors gives poor results, because some ASes simply have all of their routes bad.

Top 20 ASes by percentage of import_err.

```
       aut_num  import_ok  export_ok  import_skip  export_skip  import_err  export_err  %import_err  %export_err
11109    20130          0          3            0            0      939442          90    99.979683     0.009578
72983    53767          0        153            0            0      156474          41    99.828778     0.026158
19807    23673       3329          0            0            0      953407        3851    99.242492     0.400860
7660     57463        767         16          529            0      461814       32236    93.210763     6.506390
71453    22652     160791       1305        73354         2545      685243        4635    73.845171     0.499490
35972    11537       1475      11852            0            0       35325        1759    69.990878     3.485179
1522      2914      30498          5          755          227     2041237     1168424    62.976944    36.048618
30506     6939          0          6            0          229     5152725     4222643    54.958279    45.038148
53139   273686          0          0            0           28          28           0    49.990869     0.000000
37897   272724          0          0            0           58          58           0    49.990839     0.000000
25150   272707          0          0            0           44          44           0    49.990837     0.000000
46410   272618          0          0            0           28          28           0    49.990833     0.000000
55162   272221          0          0            0           29          29           0    49.990820     0.000000
75047   270338          0         30            0            0          30           0    49.990756     0.000000
22169   272688          0         29            0            0          30           1    49.990530     1.666351
32085   272559          0         29            0            0          30           1    49.990526     1.666351
30141   272451          0          0            0           28          29           1    49.990511     1.723811
28      271673          0          0            0           30          31           1    49.990505     1.612597
13910   272193          0         18            0          452         609         139    49.988772    11.409588
44449   272161          0          0            0           25          33           8    49.988593    12.118447
```

Top 20 ASes by percentage of export_err.

```
       aut_num  import_ok  export_ok  import_skip  export_skip  import_err  export_err  %import_err  %export_err
59012   400735          0          0            0            0           0         120          0.0    99.975060
47760   400818          0          0            0            0           0          29          0.0    99.975059
24956   400771          0          0            0            0           0          30          0.0    99.975056
30321   400757          0          0            0            0           0          30          0.0    99.975055
57343   400719          0          0            0            0           0          60          0.0    99.975055
111     400718          0          0            0            0           0          16          0.0    99.975052
68255   400642          0          0            0            0           0          90          0.0    99.975052
48938   400679          0          0            0            0           0          40          0.0    99.975051
37292   400587          0          0            0            0           0          60          0.0    99.975047
47963   400541          0          0            0            0           0          60          0.0    99.975044
66270   400404          0          0            0            0           0         197          0.0    99.975044
30442   400555          0          0            0            0           0          30          0.0    99.975043
25257   400554          0          0            0            0           0          28          0.0    99.975043
10688   400030          0          0            0            0           0         543          0.0    99.975042
58061   399007          0          0            0            0           0        1504          0.0    99.975038
15052   400474          0          0            0            0           0          30          0.0    99.975038
2545    400469          0          0            0            0           0          28          0.0    99.975037
16801   400443          0          0            0            0           0          30          0.0    99.975036
46506   400414          0          0            0            0           0          38          0.0    99.975034
478     399930          0          0            0            0           0         504          0.0    99.975033
```
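
Below is a sketch of the percentage ranking with one possible mitigation: a minimum-size filter (`min_total`, a hypothetical threshold). It keeps small ASes whose few routes are all bad from filling the top of the list. The share is assumed to be taken over all six counts. The percentages printed above differ slightly from this simple ratio, so the original computation may differ. The sample rows are copied from the tables above.

```python
from typing import Dict, List, Tuple

COLUMNS = ("import_ok", "export_ok", "import_skip", "export_skip", "import_err", "export_err")


def rank_by_error_share(
    stats: Dict[int, Dict[str, int]], column: str, min_total: int = 0, top: int = 20
) -> List[Tuple[int, float]]:
    """Rank ASes by 100 * stats[column] / (sum of the six counts), largest first.

    ASes with fewer than `min_total` checks in total are skipped, so ASes with
    a handful of routes, all of them bad, do not crowd out the large ones.
    """
    ranked = []
    for asn, row in stats.items():
        total = sum(row.get(c, 0) for c in COLUMNS)
        if total == 0 or total < min_total:
            continue
        ranked.append((asn, 100.0 * row.get(column, 0) / total))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top]


stats = {
    6939: {"export_ok": 6, "export_skip": 229, "import_err": 5152725, "export_err": 4222643},
    20130: {"export_ok": 3, "import_err": 939442, "export_err": 90},
    400735: {"export_err": 120},
}
print(rank_by_error_share(stats, "export_err")[0])                     # (400735, 100.0)
print(rank_by_error_share(stats, "export_err", min_total=1000)[0][0])  # 6939
```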

cunha commented 1 year ago

I'm not sure what you are talking about here in terms of "the number of errors."

The data above are about AS pairs.

Maybe I'm reading it wrong. For example, we have 5M import errors for 6939, but if we were counting the number of AS-pairs with import errors, then it couldn't be much larger than 80K as that is the number of ASes in the Internet.

What I have in mind is that we may have X paths with import/export errors at 6939. These X errors may involve Y neighboring ASes (either before or after 6939 on the AS-path), with Y likely much smaller than X.
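
To make the X-versus-Y distinction concrete, below is a sketch that takes hypothetical per-check error records `(AS, neighbor)` and reports both counts per AS. In the toy input, AS64500 has more erroneous checks (X = 6), but AS64510 involves more distinct neighbors (Y = 3), so the two rankings disagree. The AS numbers come from the documentation range.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def paths_vs_neighbors(errors: Iterable[Tuple[int, int]]) -> Dict[int, Tuple[int, int]]:
    """Map each AS to (X, Y): X erroneous checks on paths, Y distinct neighbors involved.

    Each input record is (AS where the check failed, neighboring AS on that path).
    """
    x: Dict[int, int] = defaultdict(int)
    neighbors: Dict[int, Set[int]] = defaultdict(set)
    for asn, neighbor in errors:
        x[asn] += 1
        neighbors[asn].add(neighbor)
    return {asn: (x[asn], len(neighbors[asn])) for asn in x}


def rank_by_neighbors(summary: Dict[int, Tuple[int, int]]) -> List[int]:
    """ASes ordered by Y (distinct erroring neighbors), largest first."""
    return sorted(summary, key=lambda asn: summary[asn][1], reverse=True)


records = [(64500, 64501)] * 5 + [(64500, 64502)] + [(64510, 64511), (64510, 64512), (64510, 64513)]
summary = paths_vs_neighbors(records)
print(summary)                     # {64500: (6, 2), 64510: (3, 3)}
print(rank_by_neighbors(summary))  # [64510, 64500]
```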

SichangHe commented 1 year ago

AS pair stats

Polars output in Evcxr

```
Generated stats of 176636 AS pairs in 1713732ms.
shape: (176_636, 9)
┌────────┬───────┬───────────┬───────────┬─────────────┬─────────────┬────────────┬────────────┬──────────────┐
│ from   ┆ to    ┆ import_ok ┆ export_ok ┆ import_skip ┆ export_skip ┆ import_err ┆ export_err ┆ relationship │
│ ---    ┆ ---   ┆ ---       ┆ ---       ┆ ---         ┆ ---         ┆ ---        ┆ ---        ┆ ---          │
│ u64    ┆ u64   ┆ u32       ┆ u32       ┆ u32         ┆ u32         ┆ u32        ┆ u32        ┆ str          │
╞════════╪═══════╪═══════════╪═══════════╪═════════════╪═════════════╪════════════╪════════════╪══════════════╡
│ 31631  ┆ 174   ┆ 0         ┆ 0         ┆ 0           ┆ 0           ┆ 59         ┆ 59         ┆ up           │
│ 14840  ┆ 31133 ┆ 0         ┆ 0         ┆ 0           ┆ 0           ┆ 690        ┆ 690        ┆ peer         │
│ 43927  ┆ 8708  ┆ 0         ┆ 0         ┆ 54          ┆ 0           ┆ 0          ┆ 54         ┆ up           │
│ 328293 ┆ 37288 ┆ 28        ┆ 0         ┆ 0           ┆ 28          ┆ 0          ┆ 0          ┆ up           │
│ …      ┆ …     ┆ …         ┆ …         ┆ …           ┆ …           ┆ …          ┆ …          ┆ …            │
│ 201967 ┆ 57388 ┆ 0         ┆ 30        ┆ 0           ┆ 0           ┆ 30         ┆ 0          ┆ up           │
│ 11260  ┆ 22652 ┆ 0         ┆ 0         ┆ 0           ┆ 0           ┆ 145        ┆ 145        ┆ peer         │
│ 51540  ┆ 61135 ┆ 662       ┆ 370       ┆ 0           ┆ 0           ┆ 0          ┆ 292        ┆ up           │
│ 15317  ┆ 3356  ┆ 0         ┆ 0         ┆ 0           ┆ 136         ┆ 136        ┆ 0          ┆ up           │
└────────┴───────┴───────────┴───────────┴─────────────┴─────────────┴────────────┴────────────┴──────────────┘
shape: (9, 10)
┌────────────┬───────────────┬───────────────┬─────────────┬─────────────┬─────────────┬─────────────┬─────────────┬─────────────┬──────────────┐
│ describe   ┆ from          ┆ to            ┆ import_ok   ┆ export_ok   ┆ import_skip ┆ export_skip ┆ import_err  ┆ export_err  ┆ relationship │
│ ---        ┆ ---           ┆ ---           ┆ ---         ┆ ---         ┆ ---         ┆ ---         ┆ ---         ┆ ---         ┆ ---          │
│ str        ┆ f64           ┆ f64           ┆ f64         ┆ f64         ┆ f64         ┆ f64         ┆ f64         ┆ f64         ┆ str          │
╞════════════╪═══════════════╪═══════════════╪═════════════╪═════════════╪═════════════╪═════════════╪═════════════╪═════════════╪══════════════╡
│ count      ┆ 176636.0      ┆ 176636.0      ┆ 176636.0    ┆ 176636.0    ┆ 176636.0    ┆ 176636.0    ┆ 176636.0    ┆ 176636.0    ┆ 176636       │
│ null_count ┆ 0.0           ┆ 0.0           ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ 0            │
│ mean       ┆ 967749.328868 ┆ 298711.471948 ┆ 82.072024   ┆ 61.007139   ┆ 175.474699  ┆ 197.55778   ┆ 177.411473  ┆ 176.393278  ┆ null         │
│ std        ┆ 5.9992e7      ┆ 3.2992e7      ┆ 4934.729273 ┆ 2670.044459 ┆ 6171.972071 ┆ 5402.60801  ┆ 4047.793217 ┆ 6539.543265 ┆ null         │
│ min        ┆ 1.0           ┆ 1.0           ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ down         │
│ 25%        ┆ 26575.75      ┆ 5466.0        ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ null         │
│ 50%        ┆ 52286.5       ┆ 12741.0       ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ 0.0         ┆ null         │
│ 75%        ┆ 202513.0      ┆ 37595.25      ┆ 0.0         ┆ 1.0         ┆ 28.0        ┆ 30.0        ┆ 29.0        ┆ 8.0         ┆ null         │
│ max        ┆ 4.2926e9      ┆ 4.2926e9      ┆ 1.533988e6  ┆ 749784.0    ┆ 1.794792e6  ┆ 1.533988e6  ┆ 816612.0    ┆ 1.794792e6  ┆ up           │
└────────────┴───────────────┴───────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴──────────────┘
```

as_pair_stats.csv

SichangHe commented 1 year ago

Analysis on AS pair stats

Here, only the number of pairs is considered: the numbers of successes/skips/errors across routes are normalized, i.e., collapsed to 1 for each pair.
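
Below is a sketch of that collapsing step. It assumes rows shaped like `as_pair_stats.csv` (`from`, `to`, and the six counts). Every nonzero count becomes 1 for its pair, and the indicators are then summed per `from` AS or per `to` AS. This is presumably how `df_from` and `df_to` below are built. The AS numbers come from the documentation range, and the per-pair counts are borrowed from sample rows of the AS pair table.

```python
from collections import defaultdict
from typing import Dict, List

COLUMNS = ("import_ok", "export_ok", "import_skip", "export_skip", "import_err", "export_err")


def pair_row(frm: int, to: int, **counts: int) -> dict:
    """Build one AS-pair row shaped like as_pair_stats.csv; missing counts are 0."""
    return {"from": frm, "to": to, **{c: counts.get(c, 0) for c in COLUMNS}}


def collapse_pairs(rows: List[dict], side: str) -> Dict[int, Dict[str, int]]:
    """Per AS on `side` ("from" or "to"), count pairs with a nonzero value in each column."""
    out: Dict[int, Dict[str, int]] = defaultdict(lambda: dict.fromkeys(COLUMNS, 0))
    for row in rows:
        per_as = out[row[side]]
        for column in COLUMNS:
            if row[column] > 0:  # any number of routes collapses to 1 for this pair
                per_as[column] += 1
    return dict(out)


rows = [
    pair_row(64500, 64510, import_err=59, export_err=59),
    pair_row(64501, 64510, export_skip=136, import_err=136),
    pair_row(64502, 64511, import_ok=662, export_ok=370, export_err=292),
]
df_to = collapse_pairs(rows, "to")
print(df_to[64510]["import_err"], df_to[64510]["export_err"])  # 2 1
```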

The ASes with the most errors.

```python
In [37]: df_from.sort_values(by='import_err', ascending=False).head(20)
Out[37]:
        import_ok  export_ok  import_skip  export_skip  import_err  export_err
from
13335          21          0          133          269         121           0
54994           5          0           29           90          59           0
20940          18          2           89            2          49         140
24429           1          0           20           70          49           0
21859           0          0           36            2          43          77
21433           1          0           23            0          43          67
136907          1          0           29           72          42           0
6939           22          0           41            0          41          94
138915          2          1           14            0          40          55
55256           6          0           28            0          37          65
45102           1          0           10            0          35          46
12654          13         43           15            0          33          18
54113           2          0           24            0          32          56
132203          2          0           19           51          32           0
199524         12          0           31           63          31           0
139341          2          0           12           45          31           0
396986          0          0           10            0          31          41
19551           4          0           29           60          30           0
31898           2          0           18           49          30           0
262663          0          2           44            0          28          70

In [38]: df_from.sort_values(by='export_err', ascending=False).head(20)
Out[38]:
        import_ok  export_ok  import_skip  export_skip  import_err  export_err
from
20940          18          2           89            2          49         140
6939           22          0           41            0          41          94
21859           0          0           36            2          43          77
174            25          0           33            0          21          76
262663          0          2           44            0          28          70
21433           1          0           23            0          43          67
55256           6          0           28            0          37          65
13150           3          2           28            0          28          57
54113           2          0           24            0          32          56
138915          2          1           14            0          40          55
3356           19         19           39            0          16          54
63293           2          0           30            0          22          54
20473           6          0           27            0          25          52
16509           2          0           26            0          22          50
45102           1          0           10            0          35          46
714             7          2           18            0          21          43
2914           14          0           24            0          10          42
396986          0          0           10            0          31          41
32934           9          0           21            0          19          41
42              6          0           19            0          20          40

In [39]: df_to.sort_values(by='import_err', ascending=False).head(20)
Out[39]:
       import_ok  export_ok  import_skip  export_skip  import_err  export_err
to
6939           0       1615            0         4055        9524        4552
174            0       1768            0         3686        6381        1682
3356         341       1179          718         4605        5642        1232
57463          2        103            1         1589        4484        2842
34224         34        163           15          333        2146        1754
18106          4         70            0          524        1936        1393
2914          29        400            1          845        1486         479
12552        362        294            0          340        1255        1105
37100          7        105            0          396         979         555
9498           1         32            0          772         959         167
31133        481        448           77          240         887         760
20764        199        162           76          166         870         796
3257        1381        489          423          899         843         747
4755           0         30            0          571         740         155
8492         193         70           64           60         686         759
3303         514        278          321          234         615         750
3320           0        429            0          109         607         126
20485        327        480           88          202         579         321
3216         794        577          165          208         563         691
22773          0          8            0          456         500          39

In [40]: df_to.sort_values(by='export_err', ascending=False).head(20)
Out[40]:
       import_ok  export_ok  import_skip  export_skip  import_err  export_err
to
6939           0       1615            0         4055        9524        4552
57463          2        103            1         1589        4484        2842
34224         34        163           15          333        2146        1754
174            0       1768            0         3686        6381        1682
1239           0         71         1891          391           0        1464
18106          4         70            0          524        1936        1393
3356         341       1179          718         4605        5642        1232
12552        362        294            0          340        1255        1105
1299        1928        908         1202         1006         242         854
20764        199        162           76          166         870         796
31133        481        448           77          240         887         760
8492         193         70           64           60         686         759
3303         514        278          321          234         615         750
3257        1381        489          423          899         843         747
3216         794        577          165          208         563         691
6461           0        261         2386         1646           0         615
37100          7        105            0          396         979         555
9002        1252        790          518          331          69         505
2914          29        400            1          845        1486         479
12389        939        672          145          136         174         421
```

Many ASes have a "perfect" 50% import_err or export_err rate. Filtered to those ASes, these are the ones with the most errors.

```python
In [66]: df_from[df_from['%import_err'] == 50.0][columns_to_sum].sort_values(by='import_err', ascending=False).head(20)
Out[66]:
        import_ok  export_ok  import_skip  export_skip  import_err  export_err
from
36902           0          0            0           11          11           0
394749          0          0            0           11          11           0
12680           0          0            0            0           9           9
51865           0          3            0            0           9           6
20830           0          0            0            0           9           9
20783           0          2            0            0           8           6
203380          0          0            0            0           8           8
17922           0          0            0            8           8           0
42965           0          1            0            0           8           7
51326           0          3            0            0           8           5
15894           0          3            0            0           8           5
62023           0          3            0            0           8           5
211945          0          1            0            0           7           6
58075           0          0            0            0           7           7
49463           0          3            0            0           7           4
36493           0          0            0            0           7           7
30925           0          0            0            0           7           7
56354           0          2            0            0           7           5
51273           0          3            0            0           7           4
35487           0          0            0            7           7           0

In [67]: df_from[df_from['%export_err'] == 50.0][columns_to_sum].sort_values(by='export_err', ascending=False).head(20)
Out[67]:
        import_ok  export_ok  import_skip  export_skip  import_err  export_err
from
21433           1          0           23            0          43          67
63293           2          0           30            0          22          54
16509           2          0           26            0          22          50
45102           1          0           10            0          35          46
396986          0          0           10            0          31          41
23154           0          0           20            0          19          39
15169           1          0           22            0          14          37
54119           0          0           24            0          12          36
12222           2          0           16            0          15          33
2635            0          0            8            0          22          30
40934           0          0           11            0          16          27
16552           0          0           11            0          16          27
45899           0          0           11            0          15          26
206804          0          0            6            0          20          26
35928           0          0           12            0          11          23
45352           1          0            9            0          12          22
41378           2          0            9            0          11          22
13414           0          0            9            0          12          21
38193           1          0            4            0          16          21
62597           0          0            7            0          13          20

In [68]: df_to[df_to['%import_err'] == 50.0][columns_to_sum].sort_values(by='import_err', ascending=False).head(20)
Out[68]:
        import_ok  export_ok  import_skip  export_skip  import_err  export_err
to
201053          0          4            0            3         210         203
5650            0          2            0          145         185          38
9730            0          0            0          146         159          13
9583            0          2            0          131         156          23
11664           0          0            0          112         115           3
4837            0          2            0           62          85          21
137491          0          0            0           75          82           7
55824           0          0            0           72          75           3
17762           0          1            0           69          74           4
45117           0          0            0           70          72           2
13767           0          1            0           60          68           7
19108           0          3            0           58          68           7
4764            0          1            0           56          65           8
17995           0          6            0            5          63          52
19009           0          2            0           43          62          17
12085           0          1            0           48          60          11
17665           0          0            0           57          59           2
135607          0          1            0           44          58          13
600             0          8            0           31          54          15
6181            0          2            0           35          52          15

In [69]: df_to[df_to['%export_err'] == 50.0][columns_to_sum].sort_values(by='export_err', ascending=False).head(20)
Out[69]:
        import_ok  export_ok  import_skip  export_skip  import_err  export_err
to
64302           1          0            0            0          26          27
141137          0          0            0            0          14          14
141898          0          0            0            0          12          12
210104          0          0            0            0          12          12
211526          0          0            0            0           9           9
7717            1          0            0            0           8           9
24521           0          0            0            0           8           8
24538           0          0            0            0           8           8
20021           0          0            7            0           0           7
51648           0          0            0            0           6           6
147137          0          0            0            0           6           6
64313           0          0            0            0           6           6
23962           0          0            0            0           6           6
141626          0          0            0            0           6           6
3316            0          0            0            0           6           6
38732           0          0            0            0           6           6
47734           0          0            6            0           0           6
38524           0          0            0            0           6           6
47273           2          0            0            0           3           5
138884          0          0            0            0           5           5
```

SichangHe commented 1 year ago

Moving to #24.

SichangHe / internet_route_verification

Data analysis for route verification reports #21
