dzikoysk / reposilite

Lightweight and easy-to-use repository management software dedicated for the Maven based artifacts in the JVM ecosystem 📦
https://reposilite.com
Apache License 2.0

Add performance reports & comparisons to other major maven hosts #1137

Open solonovamax opened 2 years ago

solonovamax commented 2 years ago

Request Details

I found Reposilite and am very interested in this project. Although I don't doubt the claims Reposilite makes, I'd be interested to see how it compares in terms of performance to the other major players.

I believe the following measurements would be beneficial to record:

The following statistics should also be recorded for each test execution, so that the bottlenecks of each server can be observed:

The tests should also be run in the following environments:

(If you can't/don't want to run it in all those different environments because of money, it can always just be run with jvm args to limit mem usage, to simulate lower spec systems.)
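As a rough sketch of the "limit memory with JVM args" idea, the snippet below builds `java` command lines with a capped heap via the standard `-Xmx` flag. The jar name `reposilite.jar` and the memory tiers are placeholders, not values from this thread. Note that `-Xmx` only caps the Java heap, so it approximates a low-spec system rather than reproducing one.

```python
import shlex


def jvm_command(jar_path, max_heap_mb, extra_args=()):
    """Build a java command line whose heap is capped at max_heap_mb MiB."""
    if max_heap_mb <= 0:
        raise ValueError("max_heap_mb must be positive")
    # -Xmx is the standard JVM flag for the maximum heap size.
    return ["java", f"-Xmx{max_heap_mb}m", "-jar", jar_path, *extra_args]


# Hypothetical memory tiers in MiB, chosen only for illustration.
TIERS_MB = (32, 128, 512)

if __name__ == "__main__":
    for heap_mb in TIERS_MB:
        # "reposilite.jar" is an assumed file name for the server under test.
        print(shlex.join(jvm_command("reposilite.jar", heap_mb)))
```

Each printed line could then be launched on the server machine before a test run.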

The test would also be run with different amounts of artifacts:

To get such a large quantity of artifacts, a random selection of artifacts can be downloaded from maven central and be rehosted for the purpose of testing.

I would be more than happy to help with writing the programs to perform these tests, or any other ways I'd be able to contribute.

dzikoysk commented 2 years ago

I think that once we have a stable 3.x we could invest some time in writing benchmarks. A few notes on the proposed scenarios:

Also, you need 2 machines to perform reliable tests - for client & server.
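The client side of such a test could look roughly like the sketch below. It is an illustration, not an agreed design: it downloads a list of URLs concurrently, times each request, and summarizes the latencies. The function names, the worker count, and the choice of metrics are all assumptions.

```python
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_fetch(url):
    """Download the URL and return the number of bytes received."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return len(response.read())


def timed(fetch, url):
    """Return (elapsed_seconds, bytes_received) for one request."""
    start = time.perf_counter()
    size = fetch(url)
    return time.perf_counter() - start, size


def run_benchmark(urls, workers=8, fetch=http_fetch):
    """Fetch all URLs using a thread pool and collect per-request timings."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda url: timed(fetch, url), urls))


def summarize(results):
    """Reduce per-request timings to a few headline numbers."""
    latencies = sorted(latency for latency, _ in results)
    return {
        "requests": len(latencies),
        "total_bytes": sum(size for _, size in results),
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        "max_s": latencies[-1],
    }
```

Run from the client machine, `summarize(run_benchmark(urls))` would be called with the artifact URLs of the server under test. The `fetch` parameter is injectable so the timing logic can be checked without a network.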

solonovamax commented 2 years ago

> Reposilite does not use a cache, so there is no point in taking it into account tbh

Still, that is a reason to take it into account. Performance benchmarks should be used to show users whether or not this tool is appropriate for them. So if the other repository servers perform better with large quantities of RAM and cache, that should be shown transparently. (Not saying it already isn't, but this would give users an idea of the point at which they should choose another repo server.)

> Imo there is no point in using anything above mid tier. Reposilite 3.x is designed to run as a microservice, so it's more about scaling through small instances in independent environments. Why? Such scaling is more effective and makes it possible to avoid bottlenecks caused by the limitations of the current hardware (Reposilite will more likely block on read IO when the disk can't serve more data in a given period of time).

Yeah, ofc. I'm assuming it'd be done on a VPS with decent I/O speeds.

> There is no point in downloading real artifacts. A file is a file, so it could even be a random bulk file with a fixed size.

True, I just thought it would be good to use something that accurately models the real world.
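For reference, the fixed-size random file approach could be sketched like this. It writes random bytes into the standard Maven repository layout (`group/artifact/version/artifact-version.jar`). The group id `com.example.bench`, the artifact names, and the version are made up for illustration.

```python
import os
from pathlib import Path


def artifact_path(root, group_id, artifact_id, version, extension="jar"):
    """Return the Maven-layout path for one artifact file under root."""
    group_dir = Path(*group_id.split("."))
    file_name = f"{artifact_id}-{version}.{extension}"
    return Path(root) / group_dir / artifact_id / version / file_name


def generate_artifacts(root, count, size_bytes, group_id="com.example.bench"):
    """Create `count` files of `size_bytes` random bytes each."""
    paths = []
    for index in range(count):
        # Artifact names and the version are hypothetical placeholders.
        path = artifact_path(root, group_id, f"artifact-{index}", "1.0.0")
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(os.urandom(size_bytes))
        paths.append(path)
    return paths
```

Random bytes do not compress, so this models the transfer cost of already-compressed jars reasonably well, but it does not include POMs, checksums, or metadata files that a real repository would also serve.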

> Also, you need 2 machines to perform reliable tests - for client & server.

It could also be 2 virtual machines containerized using something like KVM.

Also, the goal of testing with different quantities of artifacts on the server is to see how the other Maven repo servers compare when there are more or fewer artifacts.

dzikoysk commented 2 years ago

> Performance benchmarks should be used to show users whether or not this tool is appropriate for them

I mean, the result might be different, because there is e.g. disk cache, but it'd be unrelated to Reposilite internals. Also, I've added it more like a note, because I assume most people don't know how Reposilite works under the hood, so it's good to mention it anyway.

Speaking of preparing such a benchmark, it should be:

  1. Transparent - represented by a Git project on GitHub, probably here: https://github.com/reposilite-playground
  2. Simple - easy to clone & launch by any user
  3. Maintainable - clean, so it's relatively easy to develop

I'd keep it simple, so we may start with only one mainstream manager:

And later we can extend it :)

solonovamax commented 2 years ago

> Speaking of preparing such a benchmark, it should be:
>
>   1. Transparent - represented by a Git project on GitHub, probably here: https://github.com/reposilite-playground
>   2. Simple - easy to clone & launch by any user
>   3. Maintainable - clean, so it's relatively easy to develop

Of course. That was entirely implied in all of this, and such a benchmark would not mean much if it wasn't transparent, simple, and reproducible.

dzikoysk commented 2 years ago

Results could be summarized in a guide: