asmap / asmap-data

Demo repository for how a similar repository could be used in Bitcoin Core
MIT License

Sporadic connection issues with `ftp.arin.net` #8

Open virtu opened 8 months ago

virtu commented 8 months ago

In preparation for today's collaborative run (#7), I ran kartograf twice yesterday. The first run failed because I got disconnected from the FTP server while downloading arin.db.gz; the second run succeeded.

To add some redundancy, I scheduled the collaborative run on two machines (one at home, another at Contabo). Unfortunately, both runs failed. Here's the output:

```
[virtu@gravity:~/kartograf]$ ./run map -w=1706191200 -irr -rv

--- Start Kartograf ---

Kartograf version: 0.4.2
Using rpki-client version 8.6.
Coordinated launch mode: Waiting until 1706191200 (2024-01-25 15:00:00 CET) to launch mapping process.
Countdown: 0 second(s)
Starting...
The epoch for this run is: 1706191200 (2024-01-25 14:00:00 UTC, local: 2024-01-25 15:00:00 CET)

--- Fetching RPKI ---

Downloaded TAL for AFRINIC to /home/virtu/kartograf/data/1706191200/rpki/tals/afrinic.tal
Downloaded TAL for APNIC to /home/virtu/kartograf/data/1706191200/rpki/tals/apnic.tal
Downloaded TAL for ARIN to /home/virtu/kartograf/data/1706191200/rpki/tals/arin.tal
Downloaded TAL for LACNIC to /home/virtu/kartograf/data/1706191200/rpki/tals/lacnic.tal
Downloaded TAL for RIPE to /home/virtu/kartograf/data/1706191200/rpki/tals/ripe.tal
Downloading RPKI Data
...finished in 0:07:36.020574

--- Fetching IRR ---

Downloading afrinic.db.gz
Downloading apnic.db.route.gz
Downloading apnic.db.route6.gz
Downloading arin.db.gz
Traceback (most recent call last):
  File "/home/virtu/kartograf/./run", line 93, in <module>
    Kartograf.map(args)
  File "/home/virtu/kartograf/kartograf/kartograf.py", line 63, in map
    fetch_irr(context)
  File "/home/virtu/kartograf/kartograf/timed.py", line 10, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/virtu/kartograf/kartograf/irr/fetch.py", line 28, in fetch_irr
    ftp = FTP(host)
          ^^^^^^^^^
  File "/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/ftplib.py", line 121, in __init__
    self.connect(host)
  File "/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/ftplib.py", line 162, in connect
    self.welcome = self.getresp()
                   ^^^^^^^^^^^^^^
  File "/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^^^^^^^^^^^^^^^^
  File "/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^^^^^^^^^^^
  File "/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/ftplib.py", line 218, in getline
    raise EOFError
EOFError
```

Looks like EOFError signals the server closed the connection.
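
To illustrate that reading, here is a minimal local sketch (not part of kartograf): a throwaway TCP listener on 127.0.0.1 accepts one connection and closes it without sending an FTP greeting. The helper names `server_closes_immediately` and `connect_raises_eof` are made up for this example. Under these assumptions, `ftplib` should raise `EOFError` while reading the welcome line, the same frame (`getline`) that appears in the traceback.

```python
import socket
import threading
from ftplib import FTP


def server_closes_immediately(listener: socket.socket) -> None:
    """Accept one connection and close it without sending an FTP greeting."""
    conn, _ = listener.accept()
    conn.close()


def connect_raises_eof() -> bool:
    """Return True if ftplib raises EOFError when the greeting never arrives."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    worker = threading.Thread(target=server_closes_immediately, args=(listener,))
    worker.start()
    try:
        ftp = FTP(timeout=5)
        # connect() reads the server's welcome line internally.
        ftp.connect("127.0.0.1", port)
    except EOFError:
        return True
    finally:
        worker.join()
        listener.close()
    return False
```

Calling `connect_raises_eof()` exercises only the local listener, so it says nothing about why `ftp.arin.net` would drop the connection.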

fjahr commented 8 months ago

@virtu Thanks for reporting this. Just to double check: You ran it once locally and it failed. Then you ran it a second time locally and it succeeded. Then you ran it a third time locally (with collaborative launch) and it succeeded again. In addition, you ran it once on Contabo (with collaborative launch as well) and it failed as well. Is that correct?

It sounds very strange to me. Nobody had issues with the FTP downloads during our previous tests: https://github.com/fjahr/asmap-data/issues/4

My best guess would also have been that the server closed the connection. But the question is why, and why only ARIN and seemingly none of the others. arin.db.gz isn't even the largest file, and ARIN is certainly not the registry with the weakest infrastructure. Were you using a VPN by any chance? And did you have a fast and stable connection at the time?

virtu commented 8 months ago

Yesterday, I tried twice from home (dialup, ~100Mbps, no VPN): First time failed, second time succeeded.

To add some redundancy, today I scheduled the collaborative run on both my home machine and a Contabo-hosted VPS. Both runs failed. After they failed, I immediately started new runs on both machines (in the hope that the results' checksums will match those of others), and both of these runs downloaded arin.db.gz just fine.

0xB10C commented 8 months ago

My run of #7 failed too, with the same error. Also using the nix flake.

0xB10C commented 8 months ago

I just retried it and downloading arin.db.gz worked.

fjahr commented 8 months ago

@0xB10C @virtu is it maybe a general issue with nix? I'm not sure where I could check that...

virtu commented 8 months ago

Might be an issue with a specific version of a library in the nixpkgs input locked in the flake. I'll update nixpkgs and do some trial runs to determine whether the issue persists.

0xB10C commented 8 months ago

> Looks like EOFError signals the server closed the connection.

As mentioned in https://github.com/fjahr/asmap-data/issues/7#issuecomment-1913087498, it could be that the data is updated with a cronjob e.g. hourly/every 10 minutes/etc. The FTP server might close the connection when the underlying file changes while the data is being downloaded.

0xB10C commented 8 months ago

While fetching the last-modified timestamps for https://github.com/fjahr/asmap-data/issues/7#issuecomment-1913112550, I noticed ftp.lacnic.net produces an EOF error frequently just when listing the directory contents...

0xB10C commented 8 months ago

lacnic.db.gz seems to be updated every 5 minutes. I get an error when they are updating the file.

@fjahr @virtu could you check if running https://gist.github.com/0xB10C/b06d3f3b5dba081de9eaf6b4bc340c36 multiple times over the course of ~10 minutes gives you the same error? If @virtu has the same problem and @fjahr doesn't, I think that's a sign that there's a problem with the nix version.
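
For readers without access to the gist, a timestamp check of this kind might look roughly like the sketch below. It is not the gist's code. It assumes the server supports the FTP `MDTM` command (which reports a modification time and may differ from whatever timestamp a directory listing shows), and the helper names `parse_mdtm` and `remote_mtime` are made up for this example.

```python
from datetime import datetime, timezone
from ftplib import FTP


def parse_mdtm(response: str) -> datetime:
    """Parse an MDTM reply such as '213 20240125140000' into a UTC datetime."""
    code, _, stamp = response.partition(" ")
    if code != "213":
        raise ValueError(f"unexpected MDTM reply: {response!r}")
    # Some servers append fractional seconds; keep only YYYYMMDDHHMMSS.
    parsed = datetime.strptime(stamp.strip()[:14], "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


def remote_mtime(host: str, path: str) -> datetime:
    """Ask an anonymous FTP server for the modification time of `path`."""
    with FTP(host, timeout=30) as ftp:
        ftp.login()  # anonymous login
        return parse_mdtm(ftp.sendcmd(f"MDTM {path}"))
```

Calling `remote_mtime("ftp.lacnic.net", path)` a few times over ~10 minutes, with `path` set to wherever lacnic.db.gz lives on that server, would show whether the reported time moves. Note that the connection itself can fail with `EOFError`, which is the behavior under investigation here.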

fjahr commented 8 months ago

@0xB10C Thanks, yeah, I could reproduce the EOF errors with the script against LACNIC. I also tried it with a loop of downloads and saw an EOF error there after a few minutes. I am still confused as to why this wasn't a problem until now. I have already implemented printing file hashes, and I will make the IRR downloads robust against this error as a next step.

fjahr commented 8 months ago

@0xB10C So, I did some further tests this morning by adapting your script. I changed it to download the lacnic file in a loop every 30 seconds, print the hash, and then delete the file again. It didn't take long to see the EOF errors this time, and I simply restarted the script a couple of times as soon as I saw the EOF error occurring. What this shows is that, while there are these errors, the underlying file doesn't seem to change. The file hash before and after EOF errors is identical each time. I then waited a few hours and checked again just now, and the file is still unchanged. So it seems that the EOF errors are not directly connected to file updates, and simply making kartograf more robust against these errors could be a good fix. I have done this in the latest version (0.4.4).

Multiple tries this morning

```
11:06:18: $ python3 irr-timestamps.py
Success. Deleted local file: /Users/FJ/Downloads/lacnic.db, hash: a05b6ab454327f8e77d51419ee4e805b22c73652ee7729e44705e9e7721e382f
Success. Deleted local file: /Users/FJ/Downloads/lacnic.db, hash: a05b6ab454327f8e77d51419ee4e805b22c73652ee7729e44705e9e7721e382f
Success. Deleted local file: /Users/FJ/Downloads/lacnic.db, hash: a05b6ab454327f8e77d51419ee4e805b22c73652ee7729e44705e9e7721e382f
Success. Deleted local file: /Users/FJ/Downloads/lacnic.db, hash: a05b6ab454327f8e77d51419ee4e805b22c73652ee7729e44705e9e7721e382f
Success. Deleted local file: /Users/FJ/Downloads/lacnic.db, hash: a05b6ab454327f8e77d51419ee4e805b22c73652ee7729e44705e9e7721e382f
Success. Deleted local file: /Users/FJ/Downloads/lacnic.db, hash: a05b6ab454327f8e77d51419ee4e805b22c73652ee7729e44705e9e7721e382f
Traceback (most recent call last):
  File "/Users/FJ/Downloads/irr-timestamps.py", line 28, in <module>
    ftp = FTP(source["server"])
          ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 121, in __init__
    self.connect(host)
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 162, in connect
    self.welcome = self.getresp()
                   ^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 218, in getline
    raise EOFError
EOFError
11:09:43: $ python3 irr-timestamps.py
Success. Deleted local file: /Users/FJ/Downloads/lacnic.db, hash: a05b6ab454327f8e77d51419ee4e805b22c73652ee7729e44705e9e7721e382f
Traceback (most recent call last):
  File "/Users/FJ/Downloads/irr-timestamps.py", line 28, in <module>
    ftp = FTP(source["server"])
          ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 121, in __init__
    self.connect(host)
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 162, in connect
    self.welcome = self.getresp()
                   ^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 218, in getline
    raise EOFError
EOFError
11:10:21: $ python3 irr-timestamps.py
Success. Deleted local file: /Users/FJ/Downloads/lacnic.db, hash: a05b6ab454327f8e77d51419ee4e805b22c73652ee7729e44705e9e7721e382f
Success. Deleted local file: /Users/FJ/Downloads/lacnic.db, hash: a05b6ab454327f8e77d51419ee4e805b22c73652ee7729e44705e9e7721e382f
Success. Deleted local file: /Users/FJ/Downloads/lacnic.db, hash: a05b6ab454327f8e77d51419ee4e805b22c73652ee7729e44705e9e7721e382f
Success. Deleted local file: /Users/FJ/Downloads/lacnic.db, hash: a05b6ab454327f8e77d51419ee4e805b22c73652ee7729e44705e9e7721e382f
Traceback (most recent call last):
  File "/Users/FJ/Downloads/irr-timestamps.py", line 28, in <module>
    ftp = FTP(source["server"])
          ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 121, in __init__
    self.connect(host)
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 162, in connect
    self.welcome = self.getresp()
                   ^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 218, in getline
    raise EOFError
EOFError
11:12:39: $ python3 irr-timestamps.py
Traceback (most recent call last):
  File "/Users/FJ/Downloads/irr-timestamps.py", line 28, in <module>
    ftp = FTP(source["server"])
          ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 121, in __init__
    self.connect(host)
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 162, in connect
    self.welcome = self.getresp()
                   ^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ftplib.py", line 218, in getline
    raise EOFError
EOFError
11:12:42: $ python3 irr-timestamps.py
Success. Deleted local file: /Users/FJ/Downloads/lacnic.db, hash: a05b6ab454327f8e77d51419ee4e805b22c73652ee7729e44705e9e7721e382f
```

This afternoon

```
15:18:53: $ python3 irr-timestamps.py
Success. Deleted local file: /Users/FJ/Downloads/lacnic.db, hash: a05b6ab454327f8e77d51419ee4e805b22c73652ee7729e44705e9e7721e382f
```

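
The download, hash, delete loop described in this comment could be sketched as follows. This is not the adapted script itself, which is not shown in the thread. It assumes SHA-256 (the logged digests are 64 hex characters, which is consistent with it), and the function names, the `rounds` parameter, and the remote path are placeholders chosen for this example.

```python
import hashlib
import os
import time
from ftplib import FTP


def sha256_of_file(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download(host: str, remote_path: str, local_path: str) -> None:
    """Fetch one file over anonymous FTP; EOFError is deliberately not caught."""
    with FTP(host, timeout=30) as ftp:
        ftp.login()
        with open(local_path, "wb") as handle:
            ftp.retrbinary(f"RETR {remote_path}", handle.write)


def poll(host: str, remote_path: str, local_path: str,
         rounds: int, pause: float = 30.0) -> list[str]:
    """Download, hash, and delete the file `rounds` times, pausing in between."""
    hashes = []
    for _ in range(rounds):
        download(host, remote_path, local_path)
        digest = sha256_of_file(local_path)
        os.remove(local_path)
        print(f"Success. Deleted local file: {local_path}, hash: {digest}")
        hashes.append(digest)
        time.sleep(pause)
    return hashes
```

If every entry in the list returned by `poll` is identical across restarts, the file content did not change between downloads, which is the observation reported above.
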
fjahr commented 8 months ago

@0xB10C @virtu I will give it another try tomorrow and I hope you can join: https://github.com/fjahr/asmap-data/issues/9

With v0.4.4, if an EOF error occurs, the download is retried a few times before the whole process is aborted. Based on the investigation above, I think this is a good next step, and the run will show whether the results come out as we would like. I'm keeping this issue open for now until we have confidence in the fix.
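
As an illustration of the retry idea only (this is not the code that shipped in kartograf v0.4.4), a generic wrapper could look like the sketch below. The name `retry_on_eof` and the defaults for `attempts` and `delay` are assumptions made for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_eof(action: Callable[[], T], attempts: int = 3, delay: float = 5.0) -> T:
    """Call `action`, retrying on EOFError up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except EOFError:
            if attempt == attempts:
                raise  # out of retries: let the caller abort the run
            print(f"EOF on attempt {attempt}/{attempts}, retrying in {delay}s")
            time.sleep(delay)
    raise AssertionError("unreachable")
```

A caller might wrap the connection step as `retry_on_eof(lambda: FTP(host))`, so that a sporadic disconnect during the greeting does not abort the whole mapping run.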

0xB10C commented 8 months ago

> What this shows is that, while there are these errors, the underlying file doesn't seem to change. The file hash before and after EOF errors is identical each time. I then waited a few hours and checked again just now, and the file is still unchanged. So it seems that the EOF errors are not directly connected to file updates, and simply making kartograf more robust against these errors could be a good fix. I have done this in the latest version (0.4.4).

Interesting! Well, the file-created timestamp on the FTP server changes every 5 min. I'm not sure what's really going on here. Maybe the file is recreated with the same contents?

fjahr commented 8 months ago

> I'm not sure what's really going on here. Maybe the file is recreated with the same contents?

Yeah, that would be my best guess as well.