conan-io / conan

Conan - The open-source C and C++ package manager
https://conan.io
MIT License
8.17k stars 974 forks source link

Improve Conan untar speed #5209

Open zlalanne opened 5 years ago

zlalanne commented 5 years ago

To help us debug your issue please explain:

Using Conan 1.15.1 on Ubuntu 18.04, it seems untar-ing conan packages downloaded from a repository is a rather slow process. Digging into it more, I think this is because Conan uses the build in python tarfile module for doing the tar decompression. We have some packages that have grown into 100 of MBs, and this can take quite a long time. From what I've read, the tarfile module is single threaded, and we are working on servers with several cores that we'd like to take advantage of.

I understand though it is nice to have a good cross-platform solution using the python tarfile module and perhaps that should be the default.

It would be ideal if the user could tell Conan to use the tar on the path if it exists. Some ideas:

  1. Add a extract_tool variable to the config file that takes in /path/to/tarball and extract_dir as the only two parameters. Users need to write their own tool that conan will call if this variable is set. This could be a little shell script wrapper for their environment that could call tar, pigz, etc... using the two parameters passed in.
  2. Add a tar_location variable to the config file/environment variable. User could set this to /usr/bin/tar (or wherever tar is). Conan would only use this if it was set and then launch a subprocess doing something like tar -xf conan_package.tar.gz
  3. Automatically search the user's ${PATH} to see if tar exists. Use that instead of the built in python module.
memsharded commented 5 years ago

This is very related, not to say practically the same as: https://github.com/conan-io/conan/issues/2170

The solution should be also the same, and generic enough for other purposes. We see it as the "plugins" system, another extension point, different to "hooks", that can be used to implement some funcionality at the user level.

memsharded commented 5 years ago

Adding this one to the "Plugins" project. To be clear, we really want to do these optimizations, and implement this plugins mechanism, but there is still quite a few more important things in the roadmap, so probably this has to wait at least until Conan 2.0.

zlalanne commented 5 years ago

Thanks for the quick response. Are there any documents on how the plugins sytem will work? Rough estimate of Conan 2.0?

Willing to help out on this one, but sounds like the correct plumbing to add this isn't defined yet.

memsharded commented 5 years ago

Hi @zlalanne

Not yet, you can probably get some idea having a look to the current "hooks" feature, it might be similar, but the extension point being a replacement or extension of internal conan functionality, instead of an orthogonal extension to recipes and package build flow (hooks). But yes, you are right, we didn't find enough time to define the plumbing for plugins yet, so seems too early to contribute to this (the tar functionality). However, it might be useful a PoC of such a plugin, for this functionality, to start learning about it, if you want to try it, that will be welcome:

The less important thing right now would be the actual tar functionality

There is no estimate for Conan 2.0 yet, but would be nice to have it before the end of the year. There are a few very big things going on, like the lockfiles and improving the cross-build model, which we need to finish and stabilize before going for 2.0.

cc /@sztomi

sztomi commented 5 years ago

Great timing. Since we are experimenting with .pyz builds of conan, I decided to apply some patches, and one of that was exactly aimed at improving the extraction performance. This isn't general purpose, because it doesn't model the keep_permissions parameter (it keeps them always, regardless of the value). Basically, it uses 7z if available and falls back to the python code if not.

--- a/conans/client/tools/files.py
+++ b/conans/client/tools/files.py
@@ -64,8 +64,16 @@
             filename.endswith(".tbz2") or filename.endswith(".tar.bz2") or
             filename.endswith(".tar")):
         return untargz(filename, destination)
-    import zipfile
     full_path = os.path.normpath(os.path.join(get_cwd(), destination))
+    from shutil import which
+    if which("7z") is not None:
+        import subprocess as sp
+        # 7z command line parser is very anal, it works best this way
+        cmd = f'7z x {filename} -o"{full_path}" -aoa'
+        print(f"Calling 7z: {cmd}")
+        sp.run(cmd, shell=True)
+        return
+    import zipfile

     if hasattr(sys.stdout, "isatty") and sys.stdout.isatty():
         def print_progress(the_size, uncomp_size):
@@ -111,6 +119,11 @@

 def untargz(filename, destination="."):
+    from shutil import which
+    if which("7z") is not None:
+        import subprocess as sp
+        sp.run(f"7z x {filename} -so | 7z x -si -ttar -o\"{destination}\"", shell=True)
+        return
     import tarfile
     with tarfile.TarFile.open(filename, 'r:*') as tarredgzippedFile:
         tarredgzippedFile.extractall(destination)

(I also added my copytree patch and have one more in the works: pycurl instead of requests).

zlalanne commented 5 years ago

Thanks for sharing, can you share any performance improvement benchmarks?

My original request was talking about the untar-ing happening when conan downloads packages from a repository and extracts them into the cache. I didn't think this uses tools.untargz, but I think something similar could be done there.

memsharded commented 5 years ago

No, it doesn't uses tools.untargz, but the solution could be extensible to both entry points.

sztomi commented 5 years ago

@zlalanne still experimenting (for example, I just found a bug in it), so no benchmarks yet, but it is noticably faster. I do have benchmarks for the copytree patch. I originally opened a PR for that here, but ultimately Conan decided to take a different approach. The numbers are in the PR: https://github.com/conan-io/conan/pull/2165