Open zlalanne opened 5 years ago
This is very related, not to say practically the same as: https://github.com/conan-io/conan/issues/2170
The solution should be also the same, and generic enough for other purposes. We see it as the "plugins" system, another extension point, different to "hooks", that can be used to implement some funcionality at the user level.
Adding this one to the "Plugins" project. To be clear, we really want to do these optimizations, and implement this plugins mechanism, but there is still quite a few more important things in the roadmap, so probably this has to wait at least until Conan 2.0.
Thanks for the quick response. Are there any documents on how the plugins sytem will work? Rough estimate of Conan 2.0?
Willing to help out on this one, but sounds like the correct plumbing to add this isn't defined yet.
Hi @zlalanne
Not yet, you can probably get some idea having a look to the current "hooks" feature, it might be similar, but the extension point being a replacement or extension of internal conan functionality, instead of an orthogonal extension to recipes and package build flow (hooks). But yes, you are right, we didn't find enough time to define the plumbing for plugins yet, so seems too early to contribute to this (the tar functionality). However, it might be useful a PoC of such a plugin, for this functionality, to start learning about it, if you want to try it, that will be welcome:
The less important thing right now would be the actual tar functionality
There is no estimate for Conan 2.0 yet, but would be nice to have it before the end of the year. There are a few very big things going on, like the lockfiles and improving the cross-build model, which we need to finish and stabilize before going for 2.0.
cc /@sztomi
Great timing. Since we are experimenting with .pyz builds of conan, I decided to apply some patches, and one of that was exactly aimed at improving the extraction performance. This isn't general purpose, because it doesn't model the keep_permissions parameter (it keeps them always, regardless of the value). Basically, it uses 7z if available and falls back to the python code if not.
--- a/conans/client/tools/files.py
+++ b/conans/client/tools/files.py
@@ -64,8 +64,16 @@
filename.endswith(".tbz2") or filename.endswith(".tar.bz2") or
filename.endswith(".tar")):
return untargz(filename, destination)
- import zipfile
full_path = os.path.normpath(os.path.join(get_cwd(), destination))
+ from shutil import which
+ if which("7z") is not None:
+ import subprocess as sp
+ # 7z command line parser is very anal, it works best this way
+ cmd = f'7z x {filename} -o"{full_path}" -aoa'
+ print(f"Calling 7z: {cmd}")
+ sp.run(cmd, shell=True)
+ return
+ import zipfile
if hasattr(sys.stdout, "isatty") and sys.stdout.isatty():
def print_progress(the_size, uncomp_size):
@@ -111,6 +119,11 @@
def untargz(filename, destination="."):
+ from shutil import which
+ if which("7z") is not None:
+ import subprocess as sp
+ sp.run(f"7z x {filename} -so | 7z x -si -ttar -o\"{destination}\"", shell=True)
+ return
import tarfile
with tarfile.TarFile.open(filename, 'r:*') as tarredgzippedFile:
tarredgzippedFile.extractall(destination)
(I also added my copytree patch and have one more in the works: pycurl instead of requests).
Thanks for sharing, can you share any performance improvement benchmarks?
My original request was talking about the untar-ing happening when conan downloads packages from a repository and extracts them into the cache. I didn't think this uses tools.untargz, but I think something similar could be done there.
No, it doesn't uses tools.untargz, but the solution could be extensible to both entry points.
@zlalanne still experimenting (for example, I just found a bug in it), so no benchmarks yet, but it is noticably faster. I do have benchmarks for the copytree patch. I originally opened a PR for that here, but ultimately Conan decided to take a different approach. The numbers are in the PR: https://github.com/conan-io/conan/pull/2165
To help us debug your issue please explain:
Using Conan 1.15.1 on Ubuntu 18.04, it seems untar-ing conan packages downloaded from a repository is a rather slow process. Digging into it more, I think this is because Conan uses the build in python tarfile module for doing the tar decompression. We have some packages that have grown into 100 of MBs, and this can take quite a long time. From what I've read, the tarfile module is single threaded, and we are working on servers with several cores that we'd like to take advantage of.
I understand though it is nice to have a good cross-platform solution using the python tarfile module and perhaps that should be the default.
It would be ideal if the user could tell Conan to use the tar on the path if it exists. Some ideas:
extract_tool
variable to the config file that takes in/path/to/tarball
andextract_dir
as the only two parameters. Users need to write their own tool that conan will call if this variable is set. This could be a little shell script wrapper for their environment that could call tar, pigz, etc... using the two parameters passed in.tar_location
variable to the config file/environment variable. User could set this to/usr/bin/tar
(or wherever tar is). Conan would only use this if it was set and then launch a subprocess doing something liketar -xf conan_package.tar.gz