Slow download of generated files

d4nuu8 commented 1 week ago

We were investigating slow builds and found that one main cause is the very slow download of generated files via SOAP.

The get_file RPC call splits each file into 5MB parts.

Increasing the part size to 100MB almost halved our build time, from 55 min to 30 min.

Is this a change you would accept, or do you see any downsides?
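For readers unfamiliar with the part-based transfer, here is a minimal sketch of the loop as it looks from the client side. The client loop mirrors the get_file / 'EndOfFile' / base64 pattern in elbepack/soapclient.py; the server-side `get_file` function below is a hypothetical in-memory stand-in, not the real SOAP service, and the part sizes are scaled down so it runs quickly. It only shows how the part size changes the number of round trips and that every part travels as base64 text; it does not model network or SOAP overhead.

```python
import base64
import binascii
import io


def get_file(data, part, part_size):
    """Hypothetical stand-in for the server side of the get_file RPC."""
    chunk = data[part * part_size:(part + 1) * part_size]
    if not chunk:
        return 'EndOfFile'
    # Each part is sent as base64 text, which is about 4/3 the raw size.
    return base64.b64encode(chunk).decode('ascii')


def download_in_parts(data, part_size):
    """Fetch all parts in order.

    Returns the reassembled payload, the number of calls made
    (including the final 'EndOfFile' call) and the number of
    base64 characters transferred.
    """
    out = io.BytesIO()
    part = 0
    calls = 0
    encoded_bytes = 0
    while True:
        ret = get_file(data, part, part_size)
        calls += 1
        if ret == 'EndOfFile':
            return out.getvalue(), calls, encoded_bytes
        encoded_bytes += len(ret)
        out.write(binascii.a2b_base64(ret))
        part += 1


if __name__ == '__main__':
    payload = bytes(range(256)) * 4096  # 1 MiB of sample data
    for part_size in (64 * 1024, 512 * 1024):
        result, calls, encoded = download_in_parts(payload, part_size)
        assert result == payload
        print(f'part size {part_size}: {calls} calls, {encoded} base64 bytes')
```

Larger parts reduce the number of calls, but the amount of base64 data that has to be encoded, transferred and decoded stays roughly the same.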

t-8ch commented 4 days ago

I can't reproduce these results. For me the 100MiB parts are in fact a bit slower.

# build image
$ elbe initvm submit $XML --keep-files --skip-download

# src-cdrom-main.iso is 1.4 GiB large

# 5MiB
$ time ./elbe control get_file "/var/cache/elbe/019324de-41be-723c-bc55-51ecbc16d172" src-cdrom-main.iso --output x1
src-cdrom-main.iso saved

real    1m25.830s
user    0m12.012s
sys     0m5.475s

# 100MiB
$ time ./elbe control get_file "/var/cache/elbe/019324de-41be-723c-bc55-51ecbc16d172" src-cdrom-main.iso --output x1
src-cdrom-main.iso saved

real    1m42.205s
user    0m10.038s
sys     0m6.703s

Maybe we can get rid of the whole custom base64 chunking logic and use a plain binary stream instead.
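As a rough illustration of what a plain binary stream could look like on the client, here is a minimal sketch using only the Python standard library. It assumes the file is reachable through some URL that returns the raw bytes; the function name `stream_download` and the `bufsize` parameter are made up for this example and are not part of elbe.

```python
import shutil
from urllib.request import urlopen


def stream_download(url, dst_fname, bufsize=1024 * 1024):
    """Copy the response body of url into dst_fname.

    The body is read in blocks of bufsize bytes, so the whole file
    never has to fit in memory and no base64 step is involved.
    """
    with urlopen(url) as response, open(dst_fname, 'wb') as fp:
        shutil.copyfileobj(response, fp, bufsize)
```

With this approach the read size is only a local buffer size and does not add round trips, since the whole file arrives in a single response.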

t-8ch commented 4 days ago

Can you try this:

diff --git a/elbepack/soapclient.py b/elbepack/soapclient.py
index af5b302464c2..70e7333d72e8 100644
--- a/elbepack/soapclient.py
+++ b/elbepack/soapclient.py
@@ -13,6 +13,7 @@ import sys
 import time
 from http.client import BadStatusLine
 from urllib.error import URLError
+from urllib.request import urlretrieve

 from suds.client import Client

@@ -80,9 +81,6 @@ class ElbeSoapClient:
                    args.soaptimeout, retries=args.retries)

     def download_file(self, builddir, filename, dst_fname):
-        fp = open(dst_fname, 'wb')
-        part = 0
-
         # XXX the retry logic might get removed in the future, if the error
         # doesn't occur in real world. If it occurs, we should think about
         # the root cause instead of stupid retrying.
@@ -90,27 +88,20 @@ class ElbeSoapClient:

         while True:
             try:
-                ret = self.service.get_file(builddir, filename, part)
+                urlretrieve(f'http://{self.host}:{self.port}/repo/{builddir}/{filename}', dst_fname)
+                break
             except BadStatusLine as e:
                 retry = retry - 1

-                print(f'get_file part {part} failed, retry {retry} times',
+                print(f'get_file {filename} failed, retry {retry} times',
                       file=sys.stderr)
                 print(str(e), file=sys.stderr)
                 print(repr(e.line), file=sys.stderr)

                 if not retry:
-                    fp.close()
                     print('file transfer failed', file=sys.stderr)
                     sys.exit(170)

-            if ret == 'EndOfFile':
-                fp.close()
-                return
-
-            fp.write(binascii.a2b_base64(ret))
-            part = part + 1
-
     @staticmethod
     def _upload_file(append, build_dir, filename):
         size = 1024 * 1024

It's a bit faster: "real 0m8.649s"

t-8ch commented 4 days ago

Proper patch: https://lists.linutronix.de/pipermail/elbe-devel/2024-November/007563.html

d4nuu8 commented 3 days ago

This is even better than our hack 🚀

With our hack it takes about 3 minutes to download the artifacts after the build. Your patch takes 6 seconds. 👍

Linutronix / elbe

Slow download of generated files #418