go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

Support proxy registries for each package type #21223

Open OverkillGuy opened 1 year ago

OverkillGuy commented 1 year ago

Feature Description

Spinning off https://github.com/go-gitea/gitea/issues/19270#issuecomment-1200472188 into its own ticket as recommended

I wish Gitea supported "remote", or "proxy" repositories.

These are package repositories that proxy an external source of packages, hence configured with a proxy URL, but are otherwise the same as local package repositories: they can be pulled from as usual.

Example: a local Pypi.org proxy. The local build system would be configured to use the private package registry for "internal" (private) packages, while also fetching dependencies from Pypi.org through the local Gitea.

Advantages:

For Docker registries, this feature would remove any need for a Docker Hub mirror in ECR, which many have to set up to avoid Docker Hub's recent rate limiting.

The canonical example of the feature is in JFrog's Artifactory.

Effectively, for these proxy repositories, Gitea would become a local package cache. The biggest technical decision is when to invalidate that cache (a Docker image's "latest" tag moves quickly, but if you already have a local copy, do you serve it as-is, even if you fetched it two years ago?).

Pushing this feature to its extreme, Artifactory provides Virtual Repositories that aggregate both remote (public proxies) and local (private to org) repositories into one place.

I understand this feature can be a big investment, and acknowledge that there may be no particular need for it. I mostly envy the feature, and wish for Gitea to succeed by out-executing Artifactory, given that the new Package Registry is already encroaching on that territory a bit.

Screenshots

[Screenshot: Artifactory remote repository]
[Screenshot: Artifactory cache advanced settings]

OverkillGuy commented 1 year ago

Suggesting applying the label theme/package-registry, but I can't apply that on my own.

lunny commented 1 year ago

What should the proxy look like? A proxy package should have the same URL and database structure as an original one, but with a mirror column, just like repositories vs. mirror repositories. So the package would be read-only for users, and there would be an internal interval for fetching from the remote?

springeye commented 1 year ago

Yes, I really need this feature.

kvaster commented 1 year ago

The proxy should also cache data from the remote. That way you can be sure you'll be able to build your project even if the data is removed from the remote.

yekanchi commented 1 year ago

Is this going to be something like Sonatype Nexus or JFrog Artifactory?

TimberBro commented 12 months ago

A proxy package should have the same url and database structures as an original one but with a mirror column just like repositories and mirror repositories.

Wouldn't it be superfluous to keep a link to a remote repository for each package?

@lunny How do you feel about the idea of having mirror settings at the organization level? For example, for any type of registry, the owner could mark the registry as a mirror and, if it is, set the remote URL.

yekanchi commented 12 months ago

A proxy package should have the same url and database structures as an original one but with a mirror column just like repositories and mirror repositories.

Wouldn't it be superfluous to keep a link to a remote repository for each package?

@lunny How do you feel about the idea of having mirror settings at the organization level? For example, for any type of registry, the owner could mark the registry as a mirror and, if it is, set the remote URL.

I think we can merge both.

That way there would be no need to specify an upstream source for every package.

PatrickHuetter commented 11 months ago

This feature would be awesome. We have been running a Nexus repository server for a few years and migrated from GitLab to Gitea. With this feature we could also get rid of Nexus and have a more all-in-one experience in our development tasks.

KarenArzumanyan commented 5 months ago

This is a highly requested feature. We also use Nexus now, which is very slow and has limitations in the OSS edition.

Characteristics of a registry operating in proxy mode:

1) Cache received packages.
2) If a package is not found in the cache, request it from the external registry.
3) Periodically clean out old packages according to conditions, e.g. if they have not been requested for a given number of days.
4) Enforce a cache size limit (when it is reached, the oldest packages are deleted).

We really hope for this feature. Thanks.

lunny commented 5 months ago

I think we can have two types of proxies. One is a feature of Gitea itself, which can connect to the source registry directly and pull packages. The other is an external proxy, which could be deployed in a DMZ, pull packages from outside the network, and then push them to Gitea.

KarenArzumanyan commented 5 months ago

Yes, a good option. It is important that the registry proxy has a cache to speed up package retrieval, so it doesn't have to request packages from the outside each time.

josh-hemphill commented 3 months ago

I've been tracking the same thing in GitLab, and just found this issue. I didn't see it mentioned, so I thought I'd link to their current implementation: https://docs.gitlab.com/ee/user/packages/package_registry/dependency_proxy/ They've only released it for Maven packages, in beta; in the issue threads they've been running into lots of problems pulling it off, and it has been pushed back several times. If it gets added to Gitea, hopefully the issues GitLab has run into can be avoided here.

uvulpos commented 2 months ago

I would not enable this feature by default, so that the original URL is not part of the Gitea pull URL. Rather, as an administrator you could configure upstreams like Docker Hub or PyPI, so the URL would be something like gitea.yourcompany.com/packages/dockerhub/docker/nginx/1.0.0.

Also, for security reasons, you could define which images are approved for pulling and which are not (maybe also via wildcards or something?). That would improve security and compliance.

One thing I want to point out: in the past there were projects that just disappeared overnight, and our software relied on them, so it wasn't buildable anymore (or our infrastructure wasn't even deployable anymore; we had admins with strict system requirements ruling Harbor). Harbor also has a caching mechanism, but according to its documentation it deletes the cached versions as well if the main resource is no longer available.

I would disagree with that behavior. In rare cases you still want to use that software, so I would like the option to set custom invalidation durations (e.g. "not pulled for 6 months") and flags for specific packages or package versions that must instead be deleted manually.

You just have to google for incidents; you'll find enough of them 🙁 https://www.darkreading.com/application-security/recent-code-sabotage-incident-latest-to-highlight-code-dependency-risks