fluxcd / pkg

Toolkit common packages
https://pkg.go.dev/github.com/fluxcd/pkg
Apache License 2.0
48 stars 85 forks source link

Add new FetchWithContext() API to ArchiveFetcher and option for downloading without extracting #737

Closed matheuscscp closed 7 months ago

matheuscscp commented 7 months ago

Fixes #736

:warning: Deprecates NewArchiveFetcher() and NewArchiveFetcherWithLogger() for New().

souleb commented 7 months ago

Is it possible to instead make the Fetch() optionally skip untar. My reasoning is that the new Download function is not used at the moment in the controllers and I think it's good to avoid exporting a new Method here for only one consumer.

darkowlzz commented 7 months ago

Is it possible to instead make the Fetch() optionally skip untar.

I think we will have a use case for this in the future. In https://github.com/fluxcd/pkg/pull/675 and https://github.com/fluxcd/pkg/pull/684, we added support for pushing and pulling static OCI artifacts which aren't compressed. I believe it was for the use case of downloaded large artifacts, like machine learning models, which took a long time and memory to uncompress if they are compressed. The OCI client package is not directly used in the controllers but only in the flux CLI at present. Using the same ArchiveFetcher to also fetch such static artifacts may be better.

Since the ArchiveFetcher constructors don't accept optional arguments, I think we'll have to create yet another constructor. Instead of creating another purpose specific constructor, it may be better to create a new constructor that accepts optional argument so that we don't need to introduce any more ArchiveFetcher constructor in the future when we have need for more configuration options.

stefanprodan commented 7 months ago

We need this for helm-controller also, as it uses the targz directly for the Helm operations.

souleb commented 7 months ago

As long as we use github.com/google/go-containerregistry I don't think we can use this for OCI artifacts.

Fetch and Download do the same thing (mostly), and introducing options to specify how to handle the artifact is more evolutive that adding an exported function.

If we do something like

type Options func(opts *options)

type options struct {
  skipUntar bool
}

func NewArchiveFetcher(retries, maxDownloadSize, maxUntarSize int, hostnameOverwrite string, opts ...Option) *ArchiveFetcher {
  return NewArchiveFetcherWithLogger(retries, maxDownloadSize, maxUntarSize, hostnameOverwrite, nil)
}

We can be backward compatible, allow for the new use case here and be open for future extensions.

darkowlzz commented 7 months ago

As long as we use github.com/google/go-containerregistry I don't think we can use this for OCI artifacts.

I believe ArchiveFetcher is for downloading source-controller artifact, not to directly downloading from a container registry. Source-controller will continue to use go-containerregistry, but consumers of SC artifacts like KC, HC, etc can/will use ArchiveFetcher to download and verify them with the given digest. Please let me know if I misunderstood your point.

We can be backward compatible, allow for the new use case here and be open for future extensions.

I thought it's less about backwards compatibility, not suggesting to break anything, but more about the purpose of the constructors based on their arguments. The presence of the required argument maxUntarSize makes untar seem to be a mandatory function of the obtained ArchiveFetcher. Setting an untar size and also configuring to skip untar may create more confusion. We can continue to keep the existing constructors as they are and introduce a new constructor which can accept all the existing arguments as optional, making it fit for multiple needs. The existing constructors can call the new constructor which can do everything they did and more with a more flexible API. Also, looking at how it's used in KC, it may also be better to add some internal defaults for retries, max download size and max untar size. Can just be unlimited sizes by default. hostnameOverwrite already is optional by its name. When not provided, no overwrite needs to be performed.

While we are discussing about improving this, I see a need to also improve the Fetch() method maybe for good. Fetching could take time and the absence of context seems strange. We can use retryablehttp.NewRequestWithContext() instead, but that'd require the Fetch() function signature to change to accept a context. And if we can pass a context to it, we don't even need to pass the logger in the constructor. The logger can be extracted from the passed context while fetching. But maybe that's not the intended use at present. It's only for setting the http client logger. It can still be a good improvement to be able to cancel the operation, but at the cost of a new method. We can continue keeping the existing method and add the improvement in the new method, which can be called Download() :smile: . Unless we have some other name. Just some ideas, don't need to implement all of it just yet.

Since the nice name NewArchiveFetcher is already in use, the package being fetch, the new constructor could be NewFetcher() or even just New(), as it's a very narrow purpose package. Just leaving some names I thought of while writing this. They could be used if we go with what I suggested or could just be ignored.

souleb commented 7 months ago

The presence of the required argument maxUntarSize makes untar seem to be a mandatory function of the obtained ArchiveFetcher. Setting an untar size and also configuring to skip untar may create more confusion.

Yes that would be really confusing I agree with you.

We can continue to keep the existing constructors as they are and introduce a new constructor which can accept all the existing arguments as optional, making it fit for multiple needs. The existing constructors can call the new constructor which can do everything they did and more with a more flexible API.

That seems the most reasonable solution.

All the other refactoring that you are talking about can be extracted in a new issue I think. Let's make sure we get this one right first.

matheuscscp commented 7 months ago

So to summarize I think the best suggestions here are:

I'm gonna submit a patch addressing all that :ok_hand:

matheuscscp commented 7 months ago

Should we also deprecate the old constructor and API?