carvel-dev / vendir

Easy way to vendor portions of git repos, github releases, helm charts, docker image contents, etc. declaratively
https://carvel.dev/vendir
Apache License 2.0

GitHubRelease Unpack Archive Cannot Unpack Archives with Symlinks #395

Open nebhale opened 1 month ago

nebhale commented 1 month ago

What steps did you take: Attempt to sync the following vendir.yml

apiVersion: vendir.k14s.io/v1alpha1
kind: Config
directories:
- path: vendor
  contents:
  - path: .
    githubRelease:
      slug: ollama/ollama
      latest: true
      disableAutoChecksumValidation: true
      assetNames:
      - ollama-linux-amd64.tgz
      unpackArchive:
        path: ollama-linux-amd64.tgz

What happened:

➜  vendir sync                                                                                 
Fetching: vendor + . (github release ollama/ollama@latest)

vendir: Error: Syncing directory 'vendor':
  Syncing directory '.' with github release contents:
    Expected known archive type (zip, tgz, tar)

What did you expect: I expected the tarball to be expanded to the vendor directory.

Anything else you would like to add: It's almost certainly because this code doesn't know what a symlink is. The same problem probably exists in Zip files.

Environment:


Vote on this request

This is an invitation to the community to vote on issues, to help us prioritize our backlog. Use the "smiley face" up to the right of this comment to vote.

👍 "I would like to see this addressed as soon as possible" 👎 "There are other more important things to focus on right now"

We are also happy to receive and review Pull Requests if you want to help working on this issue.

praveenrewar commented 3 weeks ago

I agree that the TypeSymlink header is not supported (nor are any of the header-only flags). I am not sure whether this is intended behaviour or not. @joaopapereira, @Zebradil, do you have any context/thoughts on this?
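
For illustration, here is a minimal Go sketch of what "header not supported" can look like. It assumes an extractor that only acts on `tar.TypeReg` and `tar.TypeDir`; that assumption, and the helper names `buildSampleTar` and `unhandledEntries`, are hypothetical and not taken from vendir's source. It builds a two-entry archive in memory (one regular file, one symlink) and lists the entries such an extractor would have no case for.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// buildSampleTar (hypothetical helper) returns an in-memory tar holding one
// regular file and one symlink, loosely modelled on the ollama archive.
func buildSampleTar() (*bytes.Buffer, error) {
	buf := &bytes.Buffer{}
	tw := tar.NewWriter(buf)
	body := []byte("stub")
	hdrs := []*tar.Header{
		{Name: "lib/libcublas.so.12.4.2.65", Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(body))},
		{Name: "lib/libcublas.so.12", Typeflag: tar.TypeSymlink, Linkname: "./libcublas.so.12.4.2.65"},
	}
	for _, h := range hdrs {
		if err := tw.WriteHeader(h); err != nil {
			return nil, err
		}
		// Only regular files carry a body; a symlink is a header-only entry.
		if h.Typeflag == tar.TypeReg {
			if _, err := tw.Write(body); err != nil {
				return nil, err
			}
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf, nil
}

// unhandledEntries (hypothetical helper) returns the names of entries that an
// extractor supporting only regular files and directories would not handle.
func unhandledEntries(r io.Reader) ([]string, error) {
	var names []string
	tr := tar.NewReader(r)
	for {
		h, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		switch h.Typeflag {
		case tar.TypeReg, tar.TypeDir:
			// These are the only types the assumed extractor knows about.
		default:
			names = append(names, h.Name)
		}
	}
}

func main() {
	buf, err := buildSampleTar()
	if err != nil {
		panic(err)
	}
	names, err := unhandledEntries(buf)
	if err != nil {
		panic(err)
	}
	fmt.Println("entries without a handler:", names)
}
```

In this sketch the symlink entry is the only one reported; whether an extractor then errors out or silently drops it is a separate design decision.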

joaopapereira commented 3 weeks ago

Symlinks are complicated in the general sense because they can take you to places in the OS that you did not intend to go. What happens if you are downloading a symlink that points to a file that has not been downloaded? Or would it point to a random directory on your laptop?

Another thing that I am not sure will work, even if we gave symlinks these travel powers, is that you are limiting the assets that are retrieved:

      assetNames:
      - ollama-linux-amd64.tgz

So vendir will only download that one particular file.

Is there any particular reason for you to download a symlink instead of the real tgz you want to unpack?

Zebradil commented 3 weeks ago

> Is there any particular reason for you to download a symlink instead of the real tgz you want to unpack?

ollama-linux-amd64.tgz is actually an archive which contains symlinks. Removing the unpackArchive key from the vendir configuration allows vendir to sync successfully, but the downloaded archive is not unpacked. Here is the result of syncing and manual unpacking:

.
├── bin
│   └── ollama
├── lib
│   └── ollama
│       ├── libcublas.so -> libcublas.so.12
│       ├── libcublas.so.11 -> libcublas.so.11.5.1.109
│       ├── libcublas.so.11.5.1.109
│       ├── libcublas.so.12 -> ./libcublas.so.12.4.2.65
│       ├── libcublas.so.12.4.2.65
│       ├── libcublasLt.so -> libcublasLt.so.12
│       ├── libcublasLt.so.11 -> libcublasLt.so.11.5.1.109
│       ├── libcublasLt.so.11.5.1.109
│       ├── libcublasLt.so.12 -> ./libcublasLt.so.12.4.2.65
│       ├── libcublasLt.so.12.4.2.65
│       ├── libcudart.so -> libcudart.so.12
│       ├── libcudart.so.11.0 -> libcudart.so.11.3.109
│       ├── libcudart.so.11.3.109
│       ├── libcudart.so.12 -> libcudart.so.12.4.99
│       └── libcudart.so.12.4.99
└── ollama-linux-amd64.tgz

It seems like a valid use case to me.

What I'd do is implement support for symlinks in archives, but with checks that the symlinks don't point outside the working directory.
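
A minimal sketch of such a check in Go, under stated assumptions: the function name `linkStaysInside` is hypothetical, the check is purely lexical (it does not resolve symlinks that already exist on disk), and absolute link targets are always treated as external. It resolves the link target relative to the directory of the entry, then asks whether the result is still below the extraction root.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// linkStaysInside (hypothetical helper) reports whether a symlink entry named
// entryName with target linkName, extracted under root, still points inside
// root. The comparison is lexical only.
func linkStaysInside(root, entryName, linkName string) bool {
	if filepath.IsAbs(linkName) {
		return false
	}
	// A relative link is interpreted from the directory that holds the link.
	resolved := filepath.Join(root, filepath.Dir(entryName), linkName)
	rel, err := filepath.Rel(root, resolved)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	cases := [][2]string{
		{"lib/ollama/libcublas.so.12", "./libcublas.so.12.4.2.65"},
		{"lib/ollama/escape", "../../../etc/passwd"},
		{"lib/ollama/absolute", "/etc/passwd"},
	}
	for _, c := range cases {
		fmt.Printf("%s -> %s inside=%v\n", c[0], c[1], linkStaysInside("vendor", c[0], c[1]))
	}
}
```

A lexical check like this is only a first line of defence; a hardened implementation would likely also need to consider chains of links created earlier in the same extraction.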

nebhale commented 3 weeks ago

@Zebradil exactly the problem (it's the symlinks inside the archive). The one point I'd make is that I'd recommend hewing close to how tar handles this, which for better or worse, doesn't police that the symlinks all point to files within the archive itself. I'm completely sympathetic to your desire and the security implications around this, but I think most users will expect POSIX-tar-compatible behavior.

joaopapereira commented 3 weeks ago

Sorry, I completely misread what you said. I do share @Zebradil's concerns about symlinks to outside folders. I'm trying to remember where we made a similar change; we did in fact fail if you tried to unpack a tar that pointed to an outside folder. I'm not sure if it was in vendir or somewhere in kapp-controller. I also understand what you mean about expecting the same behavior as POSIX tar. Personally, I would be more comfortable if we ensured that we only extract "internal" links and skip "external" links, without erroring out. If this is something that people really want, maybe we can add a flag that removes this restriction.

nebhale commented 3 weeks ago

My opinions are very weak since my current problem is solved by your more conservative case.

joaopapereira commented 2 weeks ago

Cool, so let us implement some guardrails here for now, and we will see in the future if someone needs the extra feature. For this story the acceptance criteria would be something like:

Given I have the following vendir configuration

apiVersion: vendir.k14s.io/v1alpha1
kind: Config
directories:
- path: vendor
  contents:
  - path: .
    githubRelease:
      slug: ollama/ollama
      latest: true
      disableAutoChecksumValidation: true
      assetNames:
      - ollama-linux-amd64.tgz
      unpackArchive:
        path: ollama-linux-amd64.tgz

When I execute vendir sync
Then the release will get downloaded and extracted to disk

Given I have a github release that contains a tar archive
And that archive contains a symlink to an outside folder of the archive
When I execute vendir sync
Then the release will get downloaded and extracted to disk
And the symlink is not extracted
And I see the message Symlink <name of symlink> was not extracted because it was linking to the outside of the archive <path pointing to>
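
The second scenario could be sketched in Go roughly as follows. This is not vendir's implementation: `extractTar`, `inside` and `runDemo` are hypothetical names, the containment check is lexical only, and regular files are buffered in memory for brevity. The sketch extracts files, directories and internal symlinks, skips symlinks that point outside the extraction root, and collects one message per skipped link in the wording proposed above.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// inside reports whether p stays lexically within root (no symlink resolution).
func inside(root, p string) bool {
	rel, err := filepath.Rel(root, p)
	return err == nil && rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// extractTar (hypothetical name) unpacks r into root. Symlinks that point
// outside root are skipped and reported instead of failing the extraction.
func extractTar(r io.Reader, root string) ([]string, error) {
	var skipped []string
	tr := tar.NewReader(r)
	for {
		h, err := tr.Next()
		if err == io.EOF {
			return skipped, nil
		}
		if err != nil {
			return nil, err
		}
		dst := filepath.Join(root, h.Name)
		if !inside(root, dst) {
			return nil, fmt.Errorf("entry %q escapes %q", h.Name, root)
		}
		if err := os.MkdirAll(filepath.Dir(dst), 0o755); err != nil {
			return nil, err
		}
		switch h.Typeflag {
		case tar.TypeDir:
			err = os.MkdirAll(dst, 0o755)
		case tar.TypeReg:
			// Buffers the whole entry in memory; acceptable for a sketch only.
			var data []byte
			if data, err = io.ReadAll(tr); err == nil {
				err = os.WriteFile(dst, data, h.FileInfo().Mode().Perm())
			}
		case tar.TypeSymlink:
			target := filepath.Join(filepath.Dir(dst), h.Linkname)
			if filepath.IsAbs(h.Linkname) || !inside(root, target) {
				skipped = append(skipped, fmt.Sprintf(
					"Symlink %s was not extracted because it was linking to the outside of the archive %s",
					h.Name, h.Linkname))
				continue
			}
			err = os.Symlink(h.Linkname, dst)
		}
		if err != nil {
			return nil, err
		}
	}
}

// runDemo extracts a tiny in-memory archive into a temp dir.
func runDemo() ([]string, string, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	body := []byte("stub")
	hdrs := []*tar.Header{
		{Name: "lib/libcudart.so.12.4.99", Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(body))},
		{Name: "lib/libcudart.so.12", Typeflag: tar.TypeSymlink, Linkname: "libcudart.so.12.4.99"},
		{Name: "lib/escape", Typeflag: tar.TypeSymlink, Linkname: "../../outside"},
	}
	for _, h := range hdrs {
		if err := tw.WriteHeader(h); err != nil {
			return nil, "", err
		}
		if _, err := tw.Write(body[:h.Size]); err != nil {
			return nil, "", err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, "", err
	}
	root, err := os.MkdirTemp("", "vendir-sketch")
	if err != nil {
		return nil, "", err
	}
	defer os.RemoveAll(root)
	skipped, err := extractTar(&buf, root)
	if err != nil {
		return nil, "", err
	}
	kept, err := os.Readlink(filepath.Join(root, "lib", "libcudart.so.12"))
	return skipped, kept, err
}

func main() { fmt.Println(runDemo()) }
```

Returning the messages rather than printing them inside the loop leaves it to the caller to decide how they are surfaced, for example as warnings in the sync output. Entries whose own path escapes the root are treated as a hard error here, which goes beyond the acceptance criteria and is only one possible choice.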

I will mark this issue as accepted and ready to be implemented. @nebhale how much would you say this is currently impacting you? I'm trying to understand how high this should be in the priority queue.

We are also open to reviewing a PR if anyone in the community is interested in fixing this issue.

nebhale commented 2 weeks ago

I have a workaround, so this is not the highest priority, but we'd like to see it in the next couple of months.