I think you meant 2pm UTC and 3pm CET, which is also what the timestamp says.
Of course, fixed, timezones...
Hash: 28039075f34ef65f9faff36a5141ce35a9b322600a263d4cf4cbb5d3e45390fe
In case you need it: final_result.txt.gz (4.9 MB)
It's a match! :tada:
The SHA-256 hash of the result file is: 28039075f34ef65f9faff36a5141ce35a9b322600a263d4cf4cbb5d3e45390fe
Total runtime: 6:03:28.581414
Matched as well :triumph:
The SHA-256 hash of the result file is: 28039075f34ef65f9faff36a5141ce35a9b322600a263d4cf4cbb5d3e45390fe
Total runtime: 9:34:04.141037
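For anyone reproducing these checks: the posted hashes are just the SHA-256 of the result file; a minimal sketch in Python, assuming a local final_result.txt:

import hashlib

# hash the result file in chunks so large files don't need to fit in memory
h = hashlib.sha256()
with open("final_result.txt", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
print(h.hexdigest())  # compare against the hashes posted in this thread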
Not a match for me, but I'm very happy the fix for the previous bug now works and we are getting matches! This already looks much better than I expected a few weeks ago when I had the idea for this feature :)
My hash is b118079898b62383f13a81269367e45adc24a015e5f8a83799fc14517411c637
FWIW.
So, comparing the logs, it looks to me like there is probably a slight difference coming from the RPKI data. Compared to @Sjors, I found one duplicate less and 11 more invalids. That's probably where the diff is coming from, but I expect it to be small. I'm sharing my result here if someone wants to compare, but I would also like to receive the result from one of you so I can look into it.
Additionally, I would be happy if someone can share their out/rpki/rpki_raw.json, since there we can see what caused the actual validation differences. It's a pretty big file, so it probably can't be shared here directly, even after compression.
@fjahr WeTransfer: https://we.tl/t-iqJn05yzgb (34 MB)
I've noticed that the final_results.txt file contains AS37963, but also aS30844, and As51528, and as200890. Is the final_results.txt file relevant for next steps in the asmap process? Does it make sense to standardize this to uppercase (e.g., always "AS1234")?
Here's my final_result.zip @fjahr. (q: how are you uploading zip files to this repo directly?) (edited upload)
(q: how are you uploading zip files to this repo directly?)
You can just drag-and-drop it into the comment field here or use the paper clip icon on the top right.
I've noticed that the final_results.txt file contains AS37963, but also aS30844, and As51528, and as200890. Is the final_results.txt file relevant for next steps in the asmap process? Does it make sense to standardize this to uppercase (e.g., always "AS1234")?
Oh, very interesting. I didn't notice that yet; whenever I write it explicitly in kartograf I do 'AS', but I will look into it. In the back of my mind I always had the idea that writing AS explicitly was redundant anyway, so we should just drop it and save those bytes. I will test if that even breaks Sipa's compression code, and if not, I will simply do that instead :)
Keeping the AS prefix makes it marginally more human-readable and easier to copy-paste into Google. These extra bytes don't make it into the final Bitcoin binary, I assume, so it's probably fine to just keep them.
Looking through the file, I found it interesting that there are duplicate assignments of IP-net to ASN: e.g., 126 entries for 2001:503:eea3::30/128 with different ASNs. Not sure if that's to be expected. I probably need to look into the whole asmap process a bit more.
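A quick sketch for spotting such duplicates, assuming a local final_result.txt with one "<prefix> <ASN>" pair per line:

from collections import Counter

# count how often each prefix appears in the result file
with open("final_result.txt") as f:
    counts = Counter(line.split()[0] for line in f if line.strip())
duplicates = {prefix: n for prefix, n in counts.items() if n > 1}
print(len(duplicates), "prefixes appear more than once")
# e.g. counts["2001:503:eea3::30/128"] would be 126 in the file discussed here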
FWIW here's the diff -patch final_result_0xb10c.txt final_result_fjahr.txt between fjahr's and my file. No difference between Sjors' and my file - I guess that's to be expected when we get the same hash.
fjahr's file has entries for 45.132.[188-191].0/24 which I'm missing. 147.28.8.0/23 AS58367 and 147.28.8.0/24 AS58367 + 147.28.9.0/24 AS58367 are the same result, just written differently: a ..8.0/23 net is the same as a ..8.0/24 + a ..9.0/24 net. Not sure how to solve this? Maybe always trying to build the biggest supernet for an ASN? This might also reduce the file size a bit?
*** final_result_0xb10c.txt 2023-12-21 22:15:19 +0100
--- final_result_fjahr.txt 2023-12-21 21:03:35 +0100
***************
*** 113800,113806 ****
--- 113800,113827 ----
45.132.185.0/24 AS35830
45.132.186.0/24 AS35830
45.132.187.0/24 AS35830
+ 45.132.188.0/24 AS3970
+ 45.132.188.0/24 AS47065
+ 45.132.188.0/24 AS61574
+ 45.132.188.0/24 AS61575
+ 45.132.188.0/24 AS61576
+ 45.132.188.0/22 AS3130
45.132.188.0/22 AS3970
+ 45.132.189.0/24 AS3970
+ 45.132.189.0/24 AS47065
+ 45.132.189.0/24 AS61574
+ 45.132.189.0/24 AS61575
+ 45.132.189.0/24 AS61576
+ 45.132.190.0/24 AS3970
+ 45.132.190.0/24 AS47065
+ 45.132.190.0/24 AS61574
+ 45.132.190.0/24 AS61575
+ 45.132.190.0/24 AS61576
+ 45.132.191.0/24 AS3970
+ 45.132.191.0/24 AS47065
+ 45.132.191.0/24 AS61574
+ 45.132.191.0/24 AS61575
+ 45.132.191.0/24 AS61576
45.132.192.0/24 AS210636
45.132.193.0/24 AS39351
45.132.194.0/24 AS61272
***************
*** 560369,560375 ****
147.28.6.0/24 AS61575
147.28.6.0/24 AS61576
147.28.7.0/24 AS3130
! 147.28.8.0/23 AS58367
147.28.10.0/24 AS47065
147.28.10.0/24 AS61574
147.28.10.0/24 AS61575
--- 560390,560397 ----
147.28.6.0/24 AS61575
147.28.6.0/24 AS61576
147.28.7.0/24 AS3130
! 147.28.8.0/24 AS58367
! 147.28.9.0/24 AS58367
147.28.10.0/24 AS47065
147.28.10.0/24 AS61574
147.28.10.0/24 AS61575
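As a sketch of the supernet idea, Python's standard ipaddress module can already collapse the two /24s from the diff above into the /23:

import ipaddress

# 147.28.8.0/24 + 147.28.9.0/24 collapse to the single 147.28.8.0/23 entry
nets = [ipaddress.ip_network("147.28.8.0/24"), ipaddress.ip_network("147.28.9.0/24")]
print(list(ipaddress.collapse_addresses(nets)))  # [IPv4Network('147.28.8.0/23')]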
You could also have a two-round process. First round like we did above, where some people share more details to figure out the differences. Then in a second round, which skips the download step, everyone applies a patch to reconcile or get rid of the conflicting entries. And then re-run the steps.
Keeping the AS prefix makes it marginally more human-readable and easier to copy-paste into Google. These extra bytes don't make it into the final Bitcoin binary, I assume, so it's probably fine to just keep them.
I am keeping the AS in, but I now explicitly format it to be all-caps every time I write it.
Maybe always trying to build the biggest supernet for an ASN? This might also reduce the file size a bit?
I am currently not in favor of this, since it would make debugging/analyzing the result harder, and in terms of file size I think this doesn't have much impact in the compressed state, even if it's not resolved completely. It would potentially also mess with the coverage statistics of the network, depending on how it's implemented.
I found it interesting that there are duplicate assignments for IP-net to ASN
These shouldn't have been there. I found they are coming from IRR, and from my understanding this isn't normal usage. I checked some entries and it seems more like these are there to not block some experiments. Either way, I have resolved this the same way I am doing it in RPKI, where duplicates are very common.
Example:
route: 45.132.188.0/24
descr: RPKI Experiment
origin: AS61575
notify: rw@rg.net
mnt-by: MAINT-RGNET
created: 2020-11-28T19:56:21Z
last-modified: 2020-11-28T19:56:21Z
source: RIPE
I also found another issue where a handful of IPv6 prefixes had leading zeros, i.e. 2401:0001::/32. These only disappeared later in the process, when the IP is converted to an int, and so this influenced results.
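For reference, Python's ipaddress module normalizes such prefixes when parsing, which is one way to sidestep the leading-zero mismatch (a sketch, not necessarily how it is handled in kartograf):

import ipaddress

# "2401:0001::/32" and "2401:1::/32" denote the same network but differ as strings
a = ipaddress.ip_network("2401:0001::/32")
b = ipaddress.ip_network("2401:1::/32")
print(a == b)  # True
print(a)       # 2401:1::/32 -- canonical form without leading zeros
print(int(a.network_address) == int(b.network_address))  # True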
All the issues are resolved in master, and I will do some more testing before we do another test run. Thanks a lot again, everyone, for joining and testing!
You could also have a two-round process. First round like we did above, where some people share more details to figure out the differences. Then in a second round, which skips the download step, everyone applies a patch to reconcile or get rid of the conflicting entries. And then re-run the steps.
I think this is a bit too complicated for starting out. The regular process should be very quick by default so that we can be sure it isn't blocked. It seems pretty promising that we can do a collaborative launch with a group, and if the majority has the same result it's good to merge. If we don't get a majority, we simply try again.
Figuring out the differences may be easy sometimes, but often I think this could create a lot of research work, and/or the result may not be satisfactory for everyone. We would need some kind of definition of what is ok and not ok, and then come to a rough consensus on which way to go for each line of the diff. I am a bit afraid we may not have everyone's attention all of the time, and if a diff analysis gets drawn out, do we still have enough of the original runners engaged when it's resolved? Or would it be ok if someone who was not part of the original group comes in, reviews the diff, and says that it is alright? I see a lot of unanswered questions, so I would like us to learn a bit more along the way before we commit to figuring out the diffs every time. Of course, it's always encouraged to look at the diffs and figure out where they came from and what they mean, whether there is something to be concerned about or a sign of possible improvement. I just would not like to bank the whole process on that part right now.
It also makes the process a bit more vulnerable to attacks, I think. If we try to reconcile all differences, the process can be blocked by a single person submitting a bogus result that is impossible to resolve because it's not based on real data. With the "federated" approach above, at least a significant part of the group would have to submit such results, and it may be more obvious what is happening if 3-5 never-before-seen GitHub nyms join the process rather than just one.
Hi and happy new year!
I have tagged release 0.4.0, which has the fixes I mentioned previously and a few other nice usability improvements, like an added progress bar during RPKI validation, a countdown shown during the wait, printing the current Kartograf version, etc.
I want to do (yet another) test run with this version. The previous run was already a success IMO, since it showed that everyone except me agreed on the result, and the differences between me and the rest were in the "explainable" category. But doing another run would be great because we can't test too much, and the fixes I have introduced should make it easier to drill down into the diff.
I propose tomorrow 2:00 pm GMT (1704463200) and I hope it's not too short notice. Thanks again!
./run map -w=1704463200 -irr -rv
@brunoerg did you end up having a result hash for the previous run BTW?
@fjahr no, I had a problem on my laptop; fortunately I just bought a better one and will have faster runs
Ok, thanks for the heads-up!
I propose tomorrow 2:00 pm GMT (1704463200) and I hope it's not too short notice. Thanks again!
./run map -w=1704463200 -irr -rv
Feel free to use 0.4.1 for this one instead, where I have improved the formatting of the countdown, something I forgot to include in the previous release: https://github.com/fjahr/kartograf/releases/tag/0.4.1
I propose tomorrow 2:00 pm GMT (1704463200) and I hope it's not too short notice. Thanks again!
Reminder, this is in 1h and 30 min. I'll be joining.
I'll be joining!
The countdown is great, nice job!
My result hash is 4aba172195b8dc375c2bb1b39c3508f48c212cedc02983a7dc794a373820653b
I've got a match on 4aba172195b8dc375c2bb1b39c3508f48c212cedc02983a7dc794a373820653b!
edit: Here's an asmap map generated from final_results.txt (IPv4 only). Color is based on the AS.
I got 3d42c9c376ebcbe2e21ddf1ba9951e7940dbbf7e6e1f7df9bc182b57d760d768.
@jurraca can you upload your final_results.txt here?
Mine is here final_result.txt.gz
I've got a match on: 4aba172195b8dc375c2bb1b39c3508f48c212cedc02983a7dc794a373820653b
Total runtime: 14:56:59.428796 (slow i3 thin client)
I also got 4aba172195b8dc375c2bb1b39c3508f48c212cedc02983a7dc794a373820653b.
Total runtime: 2:38:01.512775, which is 30 minutes faster than last time.
@jurraca can you upload your final_results.txt here?
here it is: final_result.txt.gz
@jurraca can you upload your final_results.txt here?
here it is: final_result.txt.gz
Alright. Yeah, that's a bigger diff. Don't have the time to step through it in detail.
I've noticed that there are overlapping mappings to the same AS in my final_results.txt file. For example:
2.189.0.0/16 AS49666
2.189.1.0/24 AS49666
The /16 should already cover the /24 mapping.
I've noticed that there are overlapping mappings to the same AS in my final_results.txt file. For example:
2.189.0.0/16 AS49666
2.189.1.0/24 AS49666
The /16 should already cover the /24 mapping.
Yeah, this is an interesting observation. We can look into doing this as a final step, but we need to think about an algorithm to do this efficiently. One edge case we would need to handle is if the file had an additional entry that is from a different AS and sits between the two entries in terms of specificity; then we would not be able to do this, i.e. we would also need to check that no entry like 2.189.0.0/20 AS1337 exists. We can probably do this reasonably easily after the result file is sorted. I will check how often we see candidates to decide if this is worth the effort. Another small downside is that the new final result file would become a bit harder to debug.
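A minimal sketch of that check with Python's ipaddress module; the helper and its inputs are hypothetical, just to illustrate the edge case:

import ipaddress

def absorbable(specific, covering, other_prefixes):
    # True if `specific` can be dropped because `covering` (same ASN) contains
    # it and no prefix mapped to a different ASN sits between the two in
    # terms of specificity
    net_s = ipaddress.ip_network(specific)
    net_c = ipaddress.ip_network(covering)
    if not net_s.subnet_of(net_c):
        return False
    for prefix in other_prefixes:  # prefixes mapped to a different ASN
        net_o = ipaddress.ip_network(prefix)
        if net_s.subnet_of(net_o) and net_o.subnet_of(net_c) and net_o != net_c:
            return False  # an intervening entry blocks the merge
    return True

# 2.189.1.0/24 cannot be absorbed into 2.189.0.0/16 if 2.189.0.0/20 AS1337 exists:
print(absorbable("2.189.1.0/24", "2.189.0.0/16", ["2.189.0.0/20"]))  # False
print(absorbable("2.189.1.0/24", "2.189.0.0/16", []))                # True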
Alright, nice, so we had 4/5 for the 4aba172195b8dc375c2bb1b39c3508f48c212cedc02983a7dc794a373820653b hash. @brunoerg did you get a result?
@jurraca was the dissenting hash, and as @0xB10C said, it's a pretty big diff this time. I could see in the logs that there are certainly differences coming out of RPKI again. So this is the kind of situation I described above: we have a majority for one result, and I would personally be fine with recommending the 4/5 hash for people to run in their node based on this. It would still be a great idea to develop tools to check explicitly which input files caused the diff and what we can learn from that, but actually figuring out which of these 1000+ differing lines would be preferred will take unreasonably long with the manpower available to us.
I am still very happy with these results :) These are two runs in a row where we had just one dissenting result, which makes me more confident that this process is a viable solution. It was also not the same person that dissented, so it doesn't seem to be a systemic issue afaict.
I will spend some time documenting the findings now. I will also open a PR here to demo what it would look like to make a compressed result part of the repo. Please let me know if you have any further feedback on the tools, process etc.!
Here is the demo PR for the finalized asmap.dat file: https://github.com/fjahr/asmap-data/pull/6
It would be great to get some feedback from 1-2 of you on whether you receive the same result and whether the process makes sense like this. Thank you!
I've noticed that there are overlapping mappings to the same AS in my final_results.txt file. For example:
2.189.0.0/16 AS49666
2.189.1.0/24 AS49666
The /16 should already cover the /24 mapping.
Yeah, this is an interesting observation. We can look into doing this as a final step, but we need to think about an algorithm to do this efficiently. One edge case we would need to handle is if the file had an additional entry that is from a different AS and sits between the two entries in terms of specificity; then we would not be able to do this, i.e. we would also need to check that no entry like 2.189.0.0/20 AS1337 exists. We can probably do this reasonably easily after the result file is sorted. I will check how often we see candidates to decide if this is worth the effort. Another small downside is that the new final result file would become a bit harder to debug.
I took a quick look. Radix trees are commonly used data structures in network routing for use cases where you have entries with different network prefixes. There's a Python package py-radix implementing a radix tree for Python network addresses.
When building a radix tree from the addresses in final_results.txt, I check for each prefix whether there is already a prefix covering it. This seems to work well. However, I noticed there are a lot of cases where the smaller subnets have a different ASN than the subnet covering them.
import radix

with open("final_result.txt", "r") as f:
    rtree = radix.Radix()
    for line in f.readlines():
        ip_str, asn = line.strip().split(" ")
        # check if there is already a prefix covering the current one in the radix tree
        covering = rtree.search_covering(ip_str)
        for rnode in covering:
            if asn != rnode.data["asn"]:
                print(f"{rnode.prefix} ({rnode.data['asn']}) already covers {ip_str}, but {ip_str} has a different ASN of {asn}")
        # only insert prefixes that are not covered yet
        if len(covering) == 0:
            rnode = rtree.add(ip_str)
            rnode.data["asn"] = asn  # py-radix node data is a dict
This prints 293720 lines similar to the following. The final_results.txt contains 1256210 entries, so about 23% are overlapping in some way.
1.0.128.0/19 (AS9737) already covers 1.0.129.0/24 (AS23969), but 1.0.129.0/24 has a different ASN!
1.0.128.0/18 (AS9737) already covers 1.0.129.0/24 (AS23969), but 1.0.129.0/24 has a different ASN!
1.0.128.0/17 (AS9737) already covers 1.0.129.0/24 (AS23969), but 1.0.129.0/24 has a different ASN!
1.6.224.0/22 (AS9583) already covers 1.6.226.0/24 (AS132215), but 1.6.226.0/24 has a different ASN!
1.6.224.0/22 (AS9583) already covers 1.6.227.0/24 (AS132215), but 1.6.227.0/24 has a different ASN!
1.6.228.0/22 (AS9583) already covers 1.6.229.0/24 (AS4755), but 1.6.229.0/24 has a different ASN!
1.7.140.0/22 (AS9583) already covers 1.7.142.0/24 (AS132215), but 1.7.142.0/24 has a different ASN!
1.7.148.0/22 (AS9583) already covers 1.7.151.0/24 (AS132215), but 1.7.151.0/24 has a different ASN!
1.7.160.0/22 (AS9583) already covers 1.7.161.0/24 (AS132215), but 1.7.161.0/24 has a different ASN!
1.7.160.0/22 (AS9583) already covers 1.7.162.0/24 (AS132215), but 1.7.162.0/24 has a different ASN!
...
I don't know (as I'm still not familiar enough with asmap) if we want to prefer the larger prefixes or the smaller ones.
When building a radix tree from the addresses in final_results.txt, I check for each prefix whether there is already a prefix covering it. This seems to work well. However, I noticed there are a lot of cases where the smaller subnets have a different ASN than the subnet covering them.
AFAIK the more specific (smaller) subnet is always used for routing.
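A small py-radix sketch of that longest-prefix-match behavior, using the example prefixes from above:

import radix

# with both the covering /22 and the more specific /24 in the tree,
# a lookup for an address inside the /24 returns the /24
rtree = radix.Radix()
rtree.add("1.7.160.0/22").data["asn"] = "AS9583"
rtree.add("1.7.161.0/24").data["asn"] = "AS132215"
best = rtree.search_best("1.7.161.50")
print(best.prefix, best.data["asn"])  # 1.7.161.0/24 AS132215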
After taking a look at #6 and playing around with the asmap-tool, it seems that the tool takes care of it. After encoding and decoding the file, only the smaller (fewer IPs) prefixes seem to remain. E.g.:
... (from above)
1.7.160.0/22 (AS9583) already covers 1.7.161.0/24 (AS132215), but 1.7.161.0/24 has a different ASN!
1.7.160.0/22 (AS9583) already covers 1.7.162.0/24 (AS132215), but 1.7.162.0/24 has a different ASN!
# in asmap.txt (decoded asmap.dat)
1.7.161.0/24 AS132215
1.7.162.0/24 AS132215
I don't think you'd need to implement any deduplication or de-overlapping here then.
Alright, nice, so we had 4/5 for the 4aba172195b8dc375c2bb1b39c3508f48c212cedc02983a7dc794a373820653b hash. @brunoerg did you get a result?
Same hash: 4aba172195b8dc375c2bb1b39c3508f48c212cedc02983a7dc794a373820653b
Thanks everyone for participating! This has been a great success! The collaborative launch feature has worked far better than I expected and we found and fixed several issues along the way.
I am closing this now, but I would like to do another one tomorrow, in about 24 hours: https://github.com/fjahr/asmap-data/issues/7 I would be very happy if as many of you as possible could join again. My goal with this one is to have a "clean" demo that makes it easier to grasp what the process will look like in the future, when not as many issues arise as in this one. Upon success, I want to use it as an example to show what we have achieved here :)
@brunoerg @Sjors @0xB10C @Emzy @jurraca
In Kartograf 0.3.1 a new feature was added that allows users to start the mapping process at the exact same time, at a previously agreed timestamp. This has shown some promise of producing the same, or at least very similar, final mapping files across several participants. If multiple participants get the same result independently, trust in the ASMap file is further minimized, as the input data generation is effectively federated. From the point of creation of the input data, the process of arriving at the final ASMap file is reproducible with open-source tools and thus completely trustless.
As a demo of how this could work in the future, I propose the following timestamp to start a mapping process collaboratively:
1702994400. This is Tuesday, December 19, 2023, 2:00:00 PM GMT, the same time of day at which the Bitcoin Core IRC meeting happens on Thursdays.
What you need to do to participate:
Run ./run map once before setting the collaborative launch. If you have used Kartograf in the past, you will probably still need to upgrade to the latest version, 0.3.1 at least.
Run ./run map -w=1702994400 -irr -rv. The program runs and waits until the timestamp is hit; you just need to ensure that your computer is running until then and let it finish the process.
A few notes, especially if you have tried Kartograf previously:
Let me know if there are any questions. If you plan to participate, happy to see you confirm beforehand here as well.