dzikoysk / reposilite

Lightweight and easy-to-use repository management software dedicated for the Maven based artifacts in the JVM ecosystem 📦
https://reposilite.com
Apache License 2.0
1.29k stars, 169 forks

Reposilite availability unpredictable during upload #2121

Closed: malliaridis closed this issue 1 month ago

malliaridis commented 1 month ago

What happened?

Bug Description

When uploading an artifact to a Kubernetes deployment of Reposilite through a port-forward, Reposilite becomes unavailable most of the time and the upload rarely completes successfully.

Environment

Steps to reproduce

  1. Deploy a simple Reposilite instance on Kubernetes according to the guide
  2. Port-forward to a local port to make Reposilite available on the host machine
  3. Generate a token
  4. Try to upload an artifact with maven-publish
  5. Check the web page (it should be unreachable) and the port-forwarding output (it should report a timeout at some point)
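
Step 5 can also be checked with a small probe that polls the forwarded port while the upload runs, instead of reloading the page by hand. The following is only a sketch: the address `http://localhost:8083/` is assumed from the local port shown in the port-forward output, and `probe`, `availability`, and `watch` are hypothetical helper names, not part of Reposilite or kubectl.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the server sends any HTTP response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (e.g. 401 or 404),
        # so it is still reachable.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, reset, malformed response, or timed out.
        return False


def availability(results: list[bool]) -> float:
    """Fraction of successful probes; 0.0 for an empty list."""
    return sum(results) / len(results) if results else 0.0


def watch(url: str, attempts: int = 30, interval: float = 1.0) -> list[bool]:
    """Probe the URL repeatedly and print one timestamped line per attempt."""
    results = []
    for _ in range(attempts):
        ok = probe(url)
        results.append(ok)
        print(time.strftime("%H:%M:%S"), "up" if ok else "DOWN")
        time.sleep(interval)
    return results
```

Running `watch("http://localhost:8083/")` in a second terminal during the upload, and then passing the result to `availability`, gives a rough number for how often the instance was reachable.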

Expected Behavior

Reposilite availability should be reliable and consistent.

Actual Behavior

Reposilite availability during uploads is unpredictable.

Additional Information

The uploaded file is about 50-100 MB in size. Initially I thought this was caused by an outdated or invalid token (because it behaves the same way), but after double-checking, it does not seem to be related to tokens.

When no file is being uploaded, Reposilite seems to work just fine. The Gradle task also never finishes when the upload fails.

Reposilite version

3.x

Relevant log output

Nothing was logged on the Reposilite side (or I couldn't find any logs).


port-forwarding error during upload (repeating):

Handling connection for 8083
E0527 11:41:12.692728   27436 portforward.go:347] error creating error stream for port 8083 -> 8080: Timeout occurred
Handling connection for 8083
E0527 11:40:10.649155   27436 portforward.go:347] error creating error stream for port 8083 -> 8080: Timeout occurred

dzikoysk commented 1 month ago

I don't think this issue is related to Reposilite itself; it has to be something with your setup (or the default Helm chart values). Because of that, I'd keep this conversation here: https://github.com/reposilite-playground/reposilite-helm

I also don't really know your setup - WSL2 + Docker Desktop + K3D (which is some sort of another wrapper for k3s, which is not even a full k8s) sounds like a virtualization nightmare and I simply lack knowledge in this area :sweat: Are you able to reproduce this on a pure Docker container?

malliaridis commented 1 month ago

I've been troubleshooting it for the last couple of days to figure out the root cause. Of all the systems I've been working with, only Reposilite on the cluster showed this kind of behavior, which is the reason I opened an issue. So far I have managed to keep the service available during uploads without interruptions, but I haven't figured out the exact conditions for this, and I still have cases where it acts weird again.

I have also played around with the chart values and changed things like memory, but without success. I suspect that there is something different with the way the service / pod is configured (or some limits that may affect fast uploads).

I'd keep this conversation here: https://github.com/reposilite-playground/reposilite-helm

That explains why I couldn't find any helm-related topics and code in this repo. Are you the owner of the other repo to transfer the issue via GitHub? Or should I close and reopen the issue there?

sounds like a virtualization nightmare

You are spot on. Given my current requirements and constraints, there are not many alternatives that would let me avoid all that virtualization.

Are you able to reproduce this on a pure Docker container?

If my suspicions are correct, this may occur only on a Kubernetes setup due to the way the service is accessed / made available on the host machine. I'll share my findings in the issue if I end up figuring things out.

dzikoysk commented 1 month ago

I suspect that there is something different with the way the service / pod is configured (or some limits that may affect fast uploads)

It's hard to tell. As far as I know, our Helm chart is quite close to the default values. It's not really my thing, so all these changes are purely community-driven and I'm only triggering releases.

Are you the owner of the other repo to transfer the issue via GitHub? Or should I close and reopen the issue there?

Yes, but unfortunately I can't transfer this issue from a personal account to an organization, so we still need a new ticket.

If my suspicions are correct, this may occur only on a Kubernetes setup due to the way the service is accessed / made available on the host machine.

Besides the network/resources, it might also be a good idea to check whether it's related to the file system. In the default mode, Reposilite stores data on the local fs, which is mounted as a Docker volume. Maybe there's some weird synchronization mechanism for multi-instance setups that badly underperforms.
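
One rough way to check the file-system angle is to time a large write with `fsync` inside the mounted data directory and compare it with the container's own temp directory. This is only a sketch under assumptions: `/app/data` is assumed to be where the volume is mounted in the container, and `write_throughput_mb_s` is a hypothetical helper, not something Reposilite provides. The default size of 64 MB is chosen to be in the range of the artifact size mentioned in the report.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, size_mb: int = 64, chunk_mb: int = 1) -> float:
    """Write about size_mb of random data to a temp file in `directory`,
    fsync it, delete it, and return the measured throughput in MB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory, suffix=".bench")
    written = 0
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            while written < size_mb:
                f.write(chunk)
                written += chunk_mb
            f.flush()
            # Force the data down to the underlying volume, not just the page cache.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Do not leave the benchmark file behind on the volume.
        os.remove(path)
    return written / max(elapsed, 1e-9)
```

Comparing `write_throughput_mb_s("/app/data")` with `write_throughput_mb_s(tempfile.gettempdir())` inside the pod would show whether the volume is noticeably slower than the container's own storage.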

malliaridis commented 1 month ago

Let's continue the conversation in https://github.com/reposilite-playground/reposilite-helm/issues/17.

I'll report my findings there. The file system was the next thing I wanted to test.