Open virtu opened 9 months ago
@virtu Thanks for reporting this. Just to double check: You ran it once locally and it failed. Then you ran it a second time locally and it succeeded. Then you ran it a third time locally (with collaborative launch) and it succeeded again. In addition you ran it once on contabo (with collaborative launch as well) and it failed as well. Is that correct?
It sounds very strange to me. Nobody had issues with the FTP downloads during our previous tests: https://github.com/fjahr/asmap-data/issues/4
My best guess would have also been that the server closed the connection. But the question is why and why only ARIN and seemingly none of the others. arin.db.gz isn't even the largest file and certainly also not the one with the weakest infrastructure. Were you using a VPN by any chance? And did you have a fast + stable connection at the time?
Yesterday, I tried twice from home (dialup, ~100Mbps, no VPN): First time failed, second time succeeded.
To add some redundancy, today I scheduled the collaborative run on both my home machine and a Contabo-hosted VPS. Both runs failed. After they failed, I immediately started new runs on both machines (in the hope that the results' checksums would match those of others), and both of these runs downloaded arin.db.gz just fine.
My run of #7 failed too, with the same error. Also using the nix flake.
I just retried it and downloading arin.db.gz worked.
@0xB10C @virtu is it maybe a general issue with nix? I'm not sure where I could check that...
Might be an issue with a specific version of a library in the nixpkgs input locked in the flake. I'll update nixpkgs and do some trial runs to determine whether the issue persists.
Looks like EOFError signals the server closed the connection.
As mentioned in https://github.com/fjahr/asmap-data/issues/7#issuecomment-1913087498, it could be that the data is updated with a cronjob e.g. hourly/every 10 minutes/etc. The FTP server might close the connection when the underlying file changes while the data is being downloaded.
While fetching the last-modified timestamps for https://github.com/fjahr/asmap-data/issues/7#issuecomment-1913112550, I noticed ftp.lacnic.net produces an EOF error frequently just when listing the directory contents...
lacnic.db.gz seems to be updated every 5 minutes. I get an error when they are updating the file.
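For anyone who wants to watch the modification time themselves, here is a minimal sketch of how such a timestamp could be queried with Python's ftplib. It assumes the server supports the MDTM command (RFC 3659), which not every FTP server does; the helper names `parse_mdtm` and `ftp_last_modified` are made up for illustration, and the host and path are left as parameters because the exact directory layout is not given in this thread.

```python
import ftplib
from datetime import datetime, timezone


def parse_mdtm(response: str) -> datetime:
    """Parse an MDTM reply such as '213 20240126120500' into a UTC datetime."""
    code, _, stamp = response.partition(" ")
    if code != "213":
        raise ValueError(f"unexpected MDTM reply: {response!r}")
    # Some servers append fractional seconds ('.123'); drop them.
    stamp = stamp.strip().split(".")[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def ftp_last_modified(host: str, path: str, timeout: float = 30.0) -> datetime:
    """Ask the server for the last-modified time of `path` (anonymous login)."""
    with ftplib.FTP(host, timeout=timeout) as ftp:
        ftp.login()
        return parse_mdtm(ftp.sendcmd(f"MDTM {path}"))
```

Calling `ftp_last_modified` every minute or so and printing the result should make a 5-minute update cadence visible, provided the connection is not dropped with an EOFError in between.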
@fjahr @virtu could you check if running https://gist.github.com/0xB10C/b06d3f3b5dba081de9eaf6b4bc340c36 multiple times over the course of ~10 minutes gives you the same error? If @virtu has the same problem and @fjahr doesn't, I think that's a sign that there's a problem with the nix version.
@0xB10C Thanks, yeah, I could reproduce the EOF errors with the script at lacnic and I also tried it with a loop of downloads and saw an EOF error there after a few minutes. I am still confused as to why this wasn't a problem until now. I have already implemented printing file hashes and I will make the IRR downloads robust against this error as a next step.
@0xB10C So, I did some further tests this morning by adapting your script. I changed it to download the lacnic file in a loop every 30 seconds, print the hash and then delete it again. It didn't take long to see the EOF errors this time, and I simply restarted the script a couple of times as soon as I saw an EOF error occur. What this shows is that, while there are these errors, the underlying file doesn't seem to change. The file hash before and after the EOF errors is identical each time. I then waited a few hours and checked again just now, and the file is still unchanged. So it seems that the EOF errors are not directly connected to file updates, and simply making kartograf more robust against these errors could be a good fix. I have done this with the latest version (0.4.4).
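The test loop described above could look roughly like the following. This is a sketch under assumptions, not the adapted script itself: the function names are hypothetical, the host and path are left as parameters, and instead of writing the file to disk and deleting it, the download is streamed straight into a SHA-256 hash.

```python
import ftplib
import hashlib
import time
from typing import Callable, List, Optional


def ftp_sha256(host: str, path: str, timeout: float = 60.0) -> str:
    """Download `path` anonymously and return the SHA-256 of its contents."""
    digest = hashlib.sha256()
    with ftplib.FTP(host, timeout=timeout) as ftp:
        ftp.login()
        # Feed each received chunk into the hash; nothing is kept on disk.
        ftp.retrbinary(f"RETR {path}", digest.update)
    return digest.hexdigest()


def poll_hashes(
    fetch_hash: Callable[[], str],
    rounds: int,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Optional[str]]:
    """Call `fetch_hash` `rounds` times; record None when an EOFError occurs."""
    results: List[Optional[str]] = []
    for i in range(rounds):
        try:
            results.append(fetch_hash())
        except EOFError:
            results.append(None)
        if i < rounds - 1:
            sleep(interval)
    return results


def content_changed(results: List[Optional[str]]) -> bool:
    """True if the successful downloads produced more than one distinct hash."""
    return len({r for r in results if r is not None}) > 1
```

If `content_changed` stays false across a run that contains `None` entries, the EOF errors happened without the file contents changing, which is the observation reported here.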
@0xB10C @virtu I will give it another try tomorrow and I hope you can join: https://github.com/fjahr/asmap-data/issues/9
With v0.4.4, if an EOF error occurs, the download is retried a few times before the whole process is aborted. Based on the investigation above, I think this is a good next step to see whether the results come out as we would like. I'm keeping this issue open for now, though, until we have confidence in this.
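To illustrate the idea, a retry wrapper along these lines would do the job. This is not kartograf's actual code: `retry_on_eof`, its defaults, and the `download` helper are assumptions for the sake of the example.

```python
import ftplib
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_eof(
    fetch: Callable[[], T],
    attempts: int = 3,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `fetch`, retrying on EOFError; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except EOFError:
            if attempt == attempts:
                raise  # give up and let the caller abort the run
            sleep(delay)
    raise AssertionError("unreachable")


def download(host: str, path: str, dest: str) -> None:
    """Hypothetical helper: fetch `path` from `host` into local file `dest`."""
    # Opening with "wb" truncates the file, so a retry starts from scratch.
    with ftplib.FTP(host, timeout=60) as ftp, open(dest, "wb") as out:
        ftp.login()
        ftp.retrbinary(f"RETR {path}", out.write)
```

A call would then be wrapped as `retry_on_eof(lambda: download(host, path, dest))`. Each retry opens a fresh connection and rewrites the destination file, so a partial download from a dropped connection is never kept.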
> What this shows is that, while there are these errors, the underlying file doesn't seem to change. The file hash before and after the EOF errors is identical each time. I then waited a few hours and checked again just now, and the file is still unchanged. So it seems that the EOF errors are not directly connected to file updates, and simply making kartograf more robust against these errors could be a good fix. I have done this with the latest version (0.4.4).
Interesting! Well, the file-created timestamp on the FTP server changes every 5 min. I'm not sure what's really going on here. Maybe the file is recreated with the same contents?
> I'm not sure what's really going on here. Maybe the file is recreated with the same contents?
Yeah, that would be my best guess as well.
In preparation for today's collaborative run (#7), I ran kartograf twice yesterday. The first run failed because I got disconnected from the FTP server while downloading arin.db.gz; the second run succeeded.
To add some redundancy, I scheduled the collaborative run on two machines (one at home, another at Contabo). Unfortunately, both runs failed. Here's the output:
Looks like EOFError signals the server closed the connection.