free / prometheus

The Prometheus monitoring system and time series database.
https://prometheus.io/
Apache License 2.0

Longer-term plans for the Prometheus x-fork? #2

Open geekdave opened 6 years ago

geekdave commented 6 years ago

Hey Alin,

I've been following your posts on the Prometheus Google Group, and comments in the Grafana issue tracker for some time now.

I wanted to start off by offering my gratitude for the time you've taken to carefully and clearly outline the limitations of the Prometheus rate, increase, and delta functions, and for rolling up your sleeves and creating the code to solve these issues.

I was one of the many confused Prometheus/Grafana users who saw my graphs jumping around between refreshes. I'm tracking many slow-moving counters where the exact values of occasional (but rare) spikes are significant and meaningful. Yet, I haven't been able to use them reliably due to these limitations.

The experiments you published really made things click for me. Only when I saw your screenshots of the rate of a slowly incrementing counter, visualized inaccurately with the stock rate and precisely with your modified xrate function, did I realize that this was exactly the problem I was having, and that the way your function represented the data matched the behavior I had been seeking for months.
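To make the difference concrete, here is a minimal Python sketch of the effect described above. It is a simplified model, not the code of Prometheus or of the fork: `stock_increase` imitates the window extrapolation that the stock `increase` applies (leaving out counter-reset handling and the clamp toward zero), and `x_increase` assumes the modified function takes the last sample before the window as its baseline. The function names and the sample data are hypothetical.

```python
# Samples are (timestamp_seconds, counter_value), scraped every 15s.
# The counter increments exactly once, between t=15 and t=30.
SAMPLES = [(0, 10), (15, 10), (30, 11), (45, 11), (60, 11)]


def stock_increase(samples, start, end):
    """Simplified model of the stock increase(): the raw delta between the
    first and last sample inside the window, scaled up to the window length.
    Assumption: counter resets and the zero-point clamp are ignored."""
    window = [(t, v) for t, v in samples if start <= t <= end]
    if len(window) < 2:
        return None
    (t0, v0), (t1, v1) = window[0], window[-1]
    sampled = t1 - t0
    avg_gap = sampled / (len(window) - 1)
    threshold = avg_gap * 1.1
    to_start = t0 - start
    to_end = end - t1
    # Extend the sampled interval toward each window edge, or by half an
    # average gap when the edge is too far from the nearest sample.
    extrapolated = sampled
    extrapolated += to_start if to_start < threshold else avg_gap / 2
    extrapolated += to_end if to_end < threshold else avg_gap / 2
    return (v1 - v0) * extrapolated / sampled


def x_increase(samples, start, end):
    """Assumed behaviour of the modified function: last sample inside the
    window minus the last sample at or before the window start."""
    before = [(t, v) for t, v in samples if t <= start]
    inside = [(t, v) for t, v in samples if start < t <= end]
    if not inside:
        return None
    baseline = before[-1] if before else inside[0]
    return inside[-1][1] - baseline[1]


# Two overlapping 30s windows that both cover the single increment:
for start, end in [(10, 40), (25, 55)]:
    print((start, end), stock_increase(SAMPLES, start, end),
          x_increase(SAMPLES, start, end))
```

In this model the window [10, 40] reports an increase of 2.0 with the stock calculation and the window [25, 55] reports 0.0, although the counter went up by exactly 1 in both. The baseline-sample variant reports 1 in both windows, which is the kind of stable result a graph needs between refreshes.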

Having read the debate you had with the Prometheus maintainers, it seems that a philosophical impasse has been reached, and that these code changes are unlikely to be merged into a release anytime soon. My intention here is not to take sides, and I do believe that everyone involved is acting with only the best intentions in mind. The stance of the maintainers seems to be that Prometheus is only designed to produce approximations of data, and that if exact results are desired, then a logs-based monitoring system should be used. That's certainly a fair line for the maintainers to draw, but I count myself as one of the (growing?) minority who believes that we're so close to having our cake and eating it too, by implementing a monitoring system that can be both precise and cheap to operate.

I wanted to start a discussion here to see how you'd feel about maintaining a long-running fork. Ideally to me, this fork would be:

  1. Maintained by a community of users who share your views, and share in the work of maintaining it
  2. Documented to justify its existence, both in a TL;DR kind of way that's easy for newcomers to grasp, and more thoroughly for those seeking to understand the nitty-gritty. Perhaps a list of "real-world" use cases contributed by the community would be helpful, where the functionality of the fork allowed meaningful results where they were not possible before. I'd be happy to help with this.
  3. Automatically kept in sync with the upstream project, such that new releases there trigger new builds & releases here.
  4. Open to reconciliation with the upstream project. Brian had indicated in a comment that perhaps these changes could be possible in Prometheus 3.0. I'm thinking back to the Node.js/io.js fork and eventual reconciliation. I'm hoping that, with enough friendly diplomacy and community education, the fork's features could be deemed too valuable not to merge upstream, and that the upstream maintainers' concerns could be addressed in a way that is satisfactory to them.
  5. Named in a way that would be distinct from the upstream project, but descriptive enough to be recognizable. Perhaps promx? 😀

If there's a better place to have this discussion, please let me know. Perhaps a separate mailing list could be justified?

Thanks again! Dave

free commented 6 years ago

Hey Dave,

Your proposal sounds very enticing indeed. If you want to help maintain a fork then it sounds like a plan. I have no idea how many people will actually be interested in (even using) a fork, but I've been known to work on things with little apparent external interest. ;o)

I'm happy to start writing up the reasoning and implementation details. If you have any idea how to handle keeping releases in sync with upstream, feel free to go ahead and do it. Do you think we should maintain this fork as is (i.e. under free/prometheus) or do you think it makes more sense to start a separate "organization"? (I have no experience with how that works.) In the meantime, I've added you as collaborator on this repo.

As for the naming, I guess Prometheus X makes sense (that's what I've been renaming my locally built tar files to, so I could keep track of them). Not sure whether there will be any naming issues (i.e. whether the Prometheus guys will like us stealing their name), so maybe we should ask on the devel IRC/mailing list first.

And finally, since we're considering maintaining a fork, there is another PR I've done for TSDB, back when I was testing Prometheus on a sandbox VM shared with other disk-hungry VMs, which caused Prometheus to hang when pushing data to disk on every collection/eval: prometheus/tsdb#149. It's not a fix, merely a workaround, and we don't have to include it, but it might be worth considering.

FYI I'm on vacation until May 7th, with no plans, so depending on how it pans out I might not have a lot of time to spend until then, maybe an hour or two in the evening.

Cheers, Alin.

geekdave commented 6 years ago

Sounds great Alin. I've posted about this on the Prometheus users group to get clarification about the project naming: https://groups.google.com/forum/#!topic/prometheus-users/x_-00hMPaXk

free commented 6 years ago

Looks like nothing with Prometheus/Prom in the name is going to fly (understandably so). I have been racking my brains, trying to come up with a reasonable name (some mythological character, "PX", "X") and then it dawned on me that we could simply call it "xrate" and leave it at that. I'll change the repository name to xrate, if you agree.

I'm on vacation until next week, so I'll probably not get much done in the meantime.

geekdave commented 6 years ago

xrate sounds good to me!

> If you have any idea how to handle keeping releases in sync with upstream, feel free to go ahead and do it.

Looks like Docker Hub supports building a new image when an upstream image changes using the Repository Links feature. However, Prometheus images are deployed to quay.io, and I haven't found a similar feature there. It may be possible to implement this using a less elegant polling system like TravisCI's Cron Jobs. I'll keep investigating...
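As a rough illustration of the polling idea, the core of such a job could be a comparison between the upstream release tags and the tags already mirrored in the fork. The Python sketch below covers only that comparison. How the two tag lists are fetched is left out, and the function names and sample tags are hypothetical.

```python
import re

# Only final releases such as "v2.5.0" are considered; release candidates
# and any other tag shapes are skipped.
RELEASE_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_release(tag):
    """Return (major, minor, patch) for a final release tag, else None."""
    match = RELEASE_TAG.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def missing_releases(upstream_tags, fork_tags):
    """List upstream releases that have no matching tag in the fork yet,
    oldest first, so that a scheduled job can build them in order."""
    mirrored = {parse_release(tag) for tag in fork_tags} - {None}
    upstream = {parse_release(tag) for tag in upstream_tags} - {None}
    return ["v%d.%d.%d" % version for version in sorted(upstream - mirrored)]


if __name__ == "__main__":
    upstream = ["v2.4.0", "v2.4.1", "v2.4.2", "v2.5.0-rc.0", "v2.5.0"]
    fork = ["v2.4.0", "v2.5.0"]
    print(missing_releases(upstream, fork))
```

A scheduled job would run this on each tick and trigger a build for every tag it returns. Versions are compared as integer tuples rather than strings so that, for example, v2.10.0 sorts after v2.9.0.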

sylr commented 6 years ago

Yeaaaaah! I can't tell you how relieved I am to find people who are as disappointed as I am by the choices that can be made by some members of the Prometheus core team.

If you fork this and implement your xrate and xdelta I will be one of the first to use it.

sylr commented 6 years ago

Are you considering a hard fork or just a branch with unaccepted patches that you would rebase onto Prometheus release branches and make v2.x.z+free.z tags?

I think the latter offers more chances of reconciliation, and you wouldn't have to rename all the imports in the Go sources.
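One way to read the proposed tag scheme is as an upstream version plus a fork revision counter carried in the build-metadata suffix. The Python sketch below picks the next tag under that reading. Treating the trailing `z` as a counter that starts at 0 is an assumption, and the helper name is hypothetical.

```python
import re

# Matches fork tags such as "v2.5.0+free.1": the upstream release tag,
# then "+free." and a fork revision number (assumed meaning of the suffix).
FORK_TAG = re.compile(r"^(v\d+\.\d+\.\d+)\+free\.(\d+)$")


def next_fork_tag(upstream_tag, existing_fork_tags):
    """Return the next "+free.N" tag for a rebase onto upstream_tag."""
    revisions = []
    for tag in existing_fork_tags:
        match = FORK_TAG.match(tag)
        if match and match.group(1) == upstream_tag:
            revisions.append(int(match.group(2)))
    # Start at 0 when the fork has no tag for this upstream release yet.
    return f"{upstream_tag}+free.{max(revisions, default=-1) + 1}"


if __name__ == "__main__":
    tags = ["v2.4.0+free.0", "v2.5.0+free.0", "v2.5.0+free.1"]
    print(next_fork_tag("v2.5.0", tags))
    print(next_fork_tag("v2.6.0", tags))
```

One caveat with this scheme: under Semantic Versioning 2.0.0, build metadata after `+` is ignored when ordering versions, so tooling that sorts by SemVer would not distinguish such a tag from the upstream release it is based on.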

free commented 6 years ago

I have been more or less mirroring Prometheus releases in this repository. I've missed the 2.4.x releases after 2.4.0 and I was a few days late with 2.5.0, but they're more or less all there for whoever wants to use them.

franck102 commented 2 years ago

Hi Alin,

While this is in progress do you have any plans to update this fork with the recent 2.3x Prometheus releases? We sorely need xrate and the last tag I see is for 2.19 which is already one year old.

Thanks for your great work btw! Franck

ivan commented 2 years ago

fwiw, VictoriaMetrics does the math correctly and also uses ~3x less storage space.

https://github.com/VictoriaMetrics/VictoriaMetrics/blob/e1a715b0f5cee93e5238dd1bc18990f444cb96aa/app/vmctl/README.md#migrating-data-from-prometheus

GiedriusS commented 2 years ago

I have continued the fork here: https://github.com/vinted/prometheus/tree/xrate_2.33.1. The tests pass so it should be OK.