bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Truncated TAR archive error during decompressing tar file #20269

Closed meteorcloudy closed 4 months ago

meteorcloudy commented 9 months ago

Description of the bug:

Context: https://github.com/bazelbuild/bazel/issues/20090#issuecomment-1819279500

Which category does this issue belong to?

External Dependency

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Can be reproduced on macOS with the same repo as https://github.com/bazelbuild/bazel/issues/20090#issue-1982707352

Which operating system are you running Bazel on?

macOS

What is the output of bazel info release?

No response

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

meteorcloudy commented 9 months ago

@bazel-io fork 7.0.0

meteorcloudy commented 9 months ago

I can confirm this still happens even after upgrading commons-compress to the latest version (1.25.0).

meteorcloudy commented 9 months ago

/cc @tjgq @Wyverald

meteorcloudy commented 9 months ago

The error comes from commons-compress: https://github.com/search?q=repo%3Aapache%2Fcommons-compress+%22Truncated+TAR+archive%22&type=code. Could there be an actual problem with the tar file?

Wyverald commented 9 months ago

I can confirm this is an issue, but having spent a fair chunk of time trying to understand the TAR format, I can only deduce that the issue stems from somewhere within the Apache Commons Compress library. In any case, this wouldn't be a 7.0.0 regression; I'm pretty sure that we never supported sparse TARs. So I'm inclined to treat this as a "soft blocker" -- that is, if all non-soft blockers are resolved, we should release 7.0.0 and look to maybe resolve this in a patch release.

> could there be an actual problem with the tar file?

GNU tar extracts the file just fine, so I'd say this is some feature disparity in the Java library.
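
One way to check whether a given archive actually contains sparse entries, independent of commons-compress, is to inspect its headers with a second reader. The following is only a minimal sketch using Python's standard `tarfile` module and its `TarInfo.issparse()` method; the helper name `sparse_members` is hypothetical, and the result reflects only what `tarfile` itself recognizes as sparse, which may not match how other tar implementations classify the same entries.

```python
import sys
import tarfile


def sparse_members(path):
    """Return the names of entries that Python's tarfile reports as sparse.

    Hypothetical helper for illustration; it reads headers only and does
    not extract anything.
    """
    # Mode "r" (the default) lets tarfile auto-detect gzip/bz2/xz compression.
    with tarfile.open(path) as archive:
        return [member.name for member in archive if member.issparse()]


if __name__ == "__main__":
    # Usage: python check_sparse.py some-archive.tar [more.tar.gz ...]
    for archive_path in sys.argv[1:]:
        print(archive_path, sparse_members(archive_path))
```

An empty list means `tarfile` saw no sparse entries; a non-empty list names the entries worth testing against other extractors.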

meteorcloudy commented 9 months ago

@FrancoisPoinsot since the root cause lies in commons-compress, there is little we can do in Bazel without an upstream fix. I'll have to downgrade this to P2 and remove it as a release blocker for 7.0.

alexeagle commented 9 months ago

@meteorcloudy is there an issue filed on commons-compress for this? Do you need community help to file that issue with a minimal repro? I'd really like to see the upstream maintainers' response to this.

As this was bumped from Bazel 7 I'm now going to be forced to add repository rules to call BSD tar to replace Bazel's extract logic, which will be some sad, long-lived tech debt :(

meteorcloudy commented 9 months ago

> is there an issue filed on commons-compress for this?

I tried, but didn't find any relevant issue.

> Do you need community help to file that issue with a minimal repro? I'd really like to see the upstream maintainers' response to this.

Yes, that would be very helpful! I'm currently stressed by some CI issues, unfortunately.

FrancoisPoinsot commented 9 months ago

> @FrancoisPoinsot since the root cause lies in commons-compress, there is little we can do in Bazel without an upstream fix. I'll have to downgrade this to P2 and remove it as a release blocker for 7.0

As far as I know, the problem is not new to 7.0.0. I can confirm it was also present in 6.x.

FrancoisPoinsot commented 9 months ago

My current workaround is to extract the file using the tar command and reference the extracted file using an http_file rule.

alexeagle commented 9 months ago

Repro is trivial:

#!/usr/bin/env bash

set -o errexit -o nounset

echo "Downloading commons-compress"
wget https://repo1.maven.org/maven2/org/apache/commons/commons-compress/1.25.0/commons-compress-1.25.0.jar
echo "Downloading sample sparse archive"
wget https://github.com/astral-sh/ruff/releases/download/v0.1.6/ruff-aarch64-apple-darwin.tar.gz
gunzip ruff-aarch64-apple-darwin.tar.gz

echo "Testing with system tar"
tar -tf ruff-aarch64-apple-darwin.tar
echo "Testing with commons-compress"
java -jar commons-compress-1.25.0.jar ruff-aarch64-apple-darwin.tar

->

Testing with system tar
ruff
Testing with commons-compress
Analysing ruff-aarch64-apple-darwin.tar
Created org.apache.commons.compress.archivers.tar.TarArchiveInputStream@17f052a3
ruff
Exception in thread "main" java.io.IOException: Truncated TAR archive
        at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.read(TarArchiveInputStream.java:694)
        at org.apache.commons.compress.utils.IOUtils.readFully(IOUtils.java:244)
        at org.apache.commons.compress.utils.IOUtils.skip(IOUtils.java:355)
        at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:451)
        at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextEntry(TarArchiveInputStream.java:426)
        at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextEntry(TarArchiveInputStream.java:50)
        at org.apache.commons.compress.archivers.Lister.listStream(Lister.java:79)
        at org.apache.commons.compress.archivers.Lister.main(Lister.java:133)

The hard part is getting into my Jira account with the Apache Software Foundation to file it. @tjgq do you have an account there to file it at https://issues.apache.org/jira/projects/COMPRESS/issues/COMPRESS-598?filter=allopenissues ? You're probably the better reporter as you've been doing the coding.

rbtcollins commented 9 months ago

https://issues.apache.org/jira/browse/COMPRESS-124 seems relevant

Wyverald commented 9 months ago

> https://issues.apache.org/jira/browse/COMPRESS-124 seems relevant

This seems to be about the originally missing support for sparse tarballs altogether. Our issue is more about the newly added support potentially having bugs.


I tried to sign up for a Jira account, which apparently requires human review and could take a few days. In the meantime, I sent an email to the mailing list (user@commons.apache.org); let's see if anyone picks it up.

Wyverald commented 9 months ago

Filed https://issues.apache.org/jira/projects/COMPRESS/issues/COMPRESS-654

keith commented 9 months ago

As a workaround you can do:

http_file(
    name = "ruff_macos",
    sha256 = "263d8ec3fd317b47dfefeae84d96e1894f87526f788394df59a0c6b013dac5d7",
    url = "https://github.com/astral-sh/ruff/releases/download/v0.1.8/ruff-0.1.8-x86_64-apple-darwin.tar.gz",
)

and then:

genrule(
    name = "ruff_bin",
    srcs = ["@ruff_macos//file"],
    outs = ["ruff-bin"],
    cmd = "tar -xvf $< && mv ruff $@",
)

since macOS tar handles this fine

alexeagle commented 9 months ago

Thanks Keith, I should have commented here that I worked around it in rules_lint in the same way: https://github.com/aspect-build/rules_lint/pull/66/files#diff-88872655967d360b7907682cbc2461f815c86c2940469330183be99e6f1b3ec2R129-R137

iancha1992 commented 4 months ago

A fix for this issue has been included in Bazel 7.2.0 RC1. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.2.0rc1. Thanks!