cernopendata / opendata.cern.ch

Source code for the CERN Open Data portal
http://opendata.cern.ch/
GNU General Public License v2.0

improve download experience for slow network connections #3629

Closed tiborsimko closed 2 months ago

tiborsimko commented 2 months ago

On slow network connections, file downloads are often cut off before they complete. Here is how to reproduce the problem (thanks to @chrisburr).

Download works well without rate limiting:

$ for i in $(seq 1 10); do echo "==> Test $i of 10\n" && curl -LO https://opendata-dev.cern.ch/record/212/files/HEPTutorial_0.tar && echo ""; done

==> Test 1 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 32.1M  100 32.1M    0     0  23.6M      0  0:00:01  0:00:01 --:--:-- 23.6M

==> Test 2 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 32.1M  100 32.1M    0     0  25.1M      0  0:00:01  0:00:01 --:--:-- 25.1M

==> Test 3 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 32.1M  100 32.1M    0     0  25.6M      0  0:00:01  0:00:01 --:--:-- 25.7M

==> Test 4 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 32.1M  100 32.1M    0     0  25.6M      0  0:00:01  0:00:01 --:--:-- 25.6M

==> Test 5 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 32.1M  100 32.1M    0     0  27.3M      0  0:00:01  0:00:01 --:--:-- 27.3M

==> Test 6 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 32.1M  100 32.1M    0     0  28.5M      0  0:00:01  0:00:01 --:--:-- 28.5M

==> Test 7 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 32.1M  100 32.1M    0     0  26.4M      0  0:00:01  0:00:01 --:--:-- 26.4M

==> Test 8 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 32.1M  100 32.1M    0     0  28.7M      0  0:00:01  0:00:01 --:--:-- 28.7M

==> Test 9 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 32.1M  100 32.1M    0     0  28.5M      0  0:00:01  0:00:01 --:--:-- 28.5M

==> Test 10 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 32.1M  100 32.1M    0     0  26.0M      0  0:00:01  0:00:01 --:--:-- 26.0M

Download fails when the rate is limited (simulating a slow network connection):

$ for i in $(seq 1 10); do echo "==> Test $i of 10\n" && curl --limit-rate 200K -LO https://opendata-dev.cern.ch/record/212/files/HEPTutorial_0.tar && echo ""; done

==> Test 1 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 61 32.1M   61 19.7M    0     0   200k      0  0:02:44  0:01:40  0:01:04  203k
curl: (18) transfer closed with 13038483 bytes remaining to read
==> Test 2 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 58 32.1M   58 18.6M    0     0   201k      0  0:02:43  0:01:34  0:01:09  238k
curl: (18) transfer closed with 14135913 bytes remaining to read
==> Test 3 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 39 32.1M   39 12.5M    0     0   203k      0  0:02:42  0:01:03  0:01:39  244k
curl: (18) transfer closed with 20524772 bytes remaining to read
==> Test 4 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 42 32.1M   42 13.8M    0     0   202k      0  0:02:42  0:01:09  0:01:33  258k
curl: (18) transfer closed with 19245602 bytes remaining to read
==> Test 5 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 67 32.1M   67 21.6M    0     0   201k      0  0:02:43  0:01:50  0:00:53  234k
curl: (18) transfer closed with 10995904 bytes remaining to read
==> Test 6 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 82 32.1M   82 26.5M    0     0   200k      0  0:02:44  0:02:15  0:00:29  218k
curl: (18) transfer closed with 5912877 bytes remaining to read
==> Test 7 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 52 32.1M   52 16.8M    0     0   201k      0  0:02:43  0:01:25  0:01:18  243k
curl: (18) transfer closed with 16092579 bytes remaining to read
==> Test 8 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 54 32.1M   54 17.4M    0     0   200k      0  0:02:44  0:01:29  0:01:15  204k
curl: (18) transfer closed with 15476595 bytes remaining to read
==> Test 9 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 59 32.1M   59 19.1M    0     0   200k      0  0:02:44  0:01:37  0:01:07  204k
curl: (18) transfer closed with 13692646 bytes remaining to read
==> Test 10 of 10

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 62 32.1M   62 20.1M    0     0   200k      0  0:02:44  0:01:42  0:01:02  217k
curl: (18) transfer closed with 12586828 bytes remaining to read

Note that the test above used the DEV instance for illustration, and none of the ten download attempts succeeded. The same happens on the QA instance. At most, I've seen perhaps 1-2 tries out of 10 succeed.

Note also that the PROD instance behaves better, possibly because it has more resources, but it still shows the same download failures from time to time.
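To compare instances without reading ten progress meters by eye, the failure rate could be tallied automatically. The following is only a minimal sketch: `count_download_failures` is a hypothetical helper name, the defaults (10 attempts, 200K) simply mirror the test above, and the downloaded body is discarded rather than saved.

```shell
# Sketch: repeat a rate-limited download and count the failed attempts.
# count_download_failures is a hypothetical helper, not part of the portal.
count_download_failures() {
    local url="$1"              # file to download
    local attempts="${2:-10}"   # number of tries (assumed default)
    local rate="${3:-200K}"     # value passed to curl --limit-rate
    local failures=0
    local i status

    for i in $(seq 1 "$attempts"); do
        status=0
        # -s hides the progress meter, -L follows redirects,
        # -o /dev/null discards the body; only the exit status is used.
        curl -s -L --limit-rate "$rate" -o /dev/null "$url" || status=$?
        if [ "$status" -ne 0 ]; then
            failures=$((failures + 1))
            # Exit status 18 is curl's "partial file" error reported above.
            echo "attempt $i: curl exit status $status" >&2
        fi
    done

    echo "$failures/$attempts attempts failed"
}
```

For example, `count_download_failures https://opendata-dev.cern.ch/record/212/files/HEPTutorial_0.tar 10 200K` would repeat the DEV test above and print a one-line summary; pointing it at another instance's URL would give a comparable number.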

Can we look into improving the situation?