Closed erhan- closed 2 years ago
I'll take a look!
After much digging I found a file where this occurs. When I run dtrx -rn on it, it silently gets stuck until you kill the cpio subprocess it runs. Then you will see errors like:
cpio: Malformed number |#x;
cpio: Malformed number |#x;`
cpio: Malformed number x|#x;`
cpio: Malformed number |#x;`
cpio: Malformed number #x;`
cpio: Malformed number #x;`
cpio: Malformed number x;`
!$ file cpio
cpio: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), dynamically linked, interpreter /lib/ld.so.1, for GNU/Linux 2.6.4, stripped
I am not sure how to upload this file.
So I renamed the file and ran dtrx -rn again, and look:
dtrx -rn testoo
dtrx: ERROR: could not handle testoo
dtrx: ERROR: not a known archive type
Okay, let's rename it back to "cpio": Aaaand stuck again :)
So let's look at the function try_by_extension():
https://github.com/dtrx-py/dtrx/blob/5aee09c12de0d57c2f77ee6b04a19ca368792b12/scripts/dtrx#L1347
So the problem basically occurs when the filename has no dots and is identical to a known extension.
>>> filename = "cpio"
>>> parts = filename.split(".")[-2:]
>>> parts
['cpio']
>>> filename = "blabla.cpio"
>>> parts = filename.split(".")[-2:]
>>> parts
['blabla', 'cpio']
>>> filename = "blabla.jdksj.dsjkj.tar.gz"
>>> parts = filename.split(".")[-2:]
>>> parts
['tar', 'gz']
This means the function should only add to the results if len(parts) > 1.
There are many ways to implement this, for example:
def try_by_extension(cls, filename):
    parts = filename.split('.')[-2:]
    results = []
    if len(parts) == 1:
        return results
    while parts:
        results.extend(cls.extension_map.get('.'.join(parts), []))
        del parts[0]
    return results
I just wrote it like this, but you can do it any other way.
And let's test it:
!$ dtrx -rn cpio
dtrx: ERROR: could not handle cpio
dtrx: ERROR: not a known archive type
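The len(parts) guard can also be checked in isolation, without running dtrx. The following is only a sketch: ToyBuilder and the contents of its extension_map are hypothetical stand-ins for illustration and do not mirror the real dtrx class or its map.

```python
class ToyBuilder:
    # Hypothetical map from extension to extractor names (illustrative only).
    extension_map = {
        "cpio": ["cpio"],
        "tar.gz": ["tar"],
        "gz": ["gz"],
    }

    @classmethod
    def try_by_extension(cls, filename):
        parts = filename.split(".")[-2:]
        results = []
        # A name without any dot has no extension, so nothing can match.
        if len(parts) == 1:
            return results
        # Try the two-part extension first ("tar.gz"), then the last part ("gz").
        while parts:
            results.extend(cls.extension_map.get(".".join(parts), []))
            del parts[0]
        return results


print(ToyBuilder.try_by_extension("cpio"))                       # -> []
print(ToyBuilder.try_by_extension("blabla.cpio"))                # -> ['cpio']
print(ToyBuilder.try_by_extension("blabla.jdksj.dsjkj.tar.gz"))  # -> ['tar', 'gz']
```

With this guard, a file named exactly "cpio" no longer matches by extension, while "blabla.cpio" still does.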
I will create an MR at home.
AH! The problem is that cpio does not verify the magic number before attempting extraction, so it crashes or hangs (depending on the particular cpio binary it's attempting to extract).
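For illustration, a magic number check of the kind mentioned above could look roughly like this. It is a sketch, not what dtrx or cpio actually does: looks_like_cpio is a hypothetical helper, and it only knows the usual cpio header magics (ASCII "070707" for odc, "070701" for newc, "070702" for newc with CRC, and octal 070707 stored as a 16-bit word in either byte order for the old binary format).

```python
import io

# ASCII header magics: odc, newc, newc with checksum.
ASCII_MAGICS = (b"070707", b"070701", b"070702")
# Old binary format: 0o070707 (0x71C7) as a 16-bit word, little- or big-endian.
BINARY_MAGICS = (b"\xc7\x71", b"\x71\xc7")


def looks_like_cpio(stream):
    """Return True if the first bytes of a binary stream match a cpio magic."""
    header = stream.read(6)
    return header in ASCII_MAGICS or header[:2] in BINARY_MAGICS


# An ELF executable that merely happens to be named "cpio" is rejected.
print(looks_like_cpio(io.BytesIO(b"\x7fELF\x01\x02\x01")))  # -> False
print(looks_like_cpio(io.BytesIO(b"070701" + b"0" * 104)))  # -> True
```

A caller could run such a check on the opened file before handing it to cpio, so that a non-archive is skipped instead of being passed to the extractor.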
Reproducing is quite easy:
# this image contains the cpio binary that causes the extraction to hang
❯ docker pull alpine:3.13.6
❯ docker image save alpine:3.13.6 -o alpine.tar.gz
❯ dtrx -rn alpine.tar.gz
The fundamental command that hangs is:
# note: this is the extracted image from above
❯ cpio -i --make-directories --quiet --no-absolute-filenames --file alpine/f7055e235a8665ac2ae79f29bd773c7a40b409e9c5d71905fb6bcb6458d9b66a/layer/usr/bin/cpio
Your fix looks good! I've put a PR up at #15.
I have a wrapper for dtrx and use it in a batch process.
When a Docker image is saved as described in the documentation and dtrx is run on the exported tar.gz, it extracts everything and at some point runs cpio. That command gets stuck and shows errors about malformed content. I have to kill the cpio command at that moment so that my batch process can continue. Has anyone extracted a Docker image with dtrx before?
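Until the underlying problem is fixed, a batch wrapper could avoid the manual kill by putting a time limit on each extraction. The sketch below is one possible approach, assuming a POSIX system; run_with_timeout is a hypothetical helper and the timeout value is up to the caller. It starts the command in its own session so that child processes (such as a stuck cpio) are killed together with it.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd; return its exit code, or None if it had to be killed."""
    # start_new_session=True puts the child in a new process group whose
    # id equals the child's pid, so the whole group can be killed at once.
    proc = subprocess.Popen(
        cmd, stdin=subprocess.DEVNULL, start_new_session=True
    )
    try:
        return proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Kill the command and every process it spawned, then reap it.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return None
```

In a batch job this could be called as run_with_timeout(["dtrx", "-rn", "alpine.tar.gz"], 600), treating a None result as "extraction got stuck" and moving on to the next file.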