buraksezer / olric

Distributed, in-memory key/value store and cache. It can be used as an embedded Go library and a language-independent service.

Single member stats for existing metrics collector #82

Closed JohnStarich closed 1 month ago

JohnStarich commented 3 years ago

Is there a way to programmatically identify the current embedded server's Stats? Or more simply, collect only the current member's stats?

I'm attempting to connect Prometheus metrics to Olric so it only reports on the local instance – but not all members. Since Prometheus is typically set up to scrape all of my Kubernetes pods, I don't want the extra noise of each pod reporting the state of the world.

Perhaps we can expose only the current member's stats in a new method on the server instance? Alternatively, could the current member's auto-generated ID or name be exposed somewhere to filter the results of Stats()?

Thanks for such a great project, by the way! My team and I look forward to using this in the coming days. 😄

Related https://github.com/buraksezer/olric/issues/32, https://github.com/buraksezer/olric/issues/57

buraksezer commented 3 years ago

Hi @JohnStarich

I think the current version of the Stats method is somewhat broken. In the early days of Olric, I decided to collect and return everything the node knows about the cluster. Under normal conditions, a node knows nothing about the statistics of a partition it doesn't own. So the output of Stats is bloated and useless.

It's a good idea to collect only the current member's statistics. I have been planning to implement this in v0.4.0 and release it in May.

Alternatively, could the current member's auto-generated ID or name be exposed somewhere to filter the results of Stats()?

That's quite easy. But I'm willing to backport the above solution to v0.3.x, because the behavior I described in the first paragraph sounds really buggy to me. What do you think?

Thanks for such a great project, by the way! My team and I look forward to using this in the coming days. 😄

Thank you! I'm really happy to hear that. I would like to hear more about your use case, if it's possible. 😄

JohnStarich commented 3 years ago

Thanks for your quick reply! My bad for the late response, last week was pretty busy for me 😅

Under normal conditions, a node knows nothing about the statistics of a partition it doesn't own. So the output of Stats is bloated and useless.

Aha, good to know, thank you. Glad to hear about the new stats work. If you need a hand, let me know!

That's quite easy. But I'm willing to backport the above solution to v0.3.x, because the behavior I described in the first paragraph sounds really buggy to me. What do you think?

I agree, that behavior does sound pretty buggy – probably best left in the past 😄 Another thing I noticed is that the current Stats() call exposes structs imported from /internal/ packages, which is quite difficult to handle in client code and unit tests. Maybe we can figure out a way to copy the needed fields into exported structs?

I would like to hear more about your use case, if it's possible.

Of course! We're dark-launching with Olric soon to help us provide better failure modes for multi-region aggregation. Essentially we need to resolve certain data without all regions being healthy, so using a distributed cache seems like the right approach. We're going to use 1 DMap cluster per region, with read-through caching when querying a region. It might be interesting to use a multi-region DMap, but I imagine the latency could be too high for our purposes.

Since it's a global and highly available service, it's a bit difficult to keep all data in sync across every instance of the service with ordinary memcaches, but we don't want the operational overhead of a separate database. I really liked Olric's embedded & distributed approach, similar to Hazelcast's, which is where I first heard about the pattern.

Next, we'll be trying out Olric distributed caches for places where one might normally use a Redis cluster. In this case, caching read-only data just above the database layer. We're excited to see how it performs!
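
To make the read-through caching mentioned above concrete, here's roughly the shape of it: a minimal sketch assuming the v0.3 embedded DMap API (Get/Put/ErrKeyNotFound) and a hypothetical fetchFromRegion loader, not our actual service code:

// Read-through lookup: serve from the regional DMap when possible, otherwise
// fall back to the region's authoritative source and populate the cache.
func lookup(dm *olric.DMap, key string) (interface{}, error) {
    val, err := dm.Get(key)
    if err == nil {
        return val, nil // cache hit
    }
    if err != olric.ErrKeyNotFound {
        return nil, err // real failure, not just a miss
    }
    val, err = fetchFromRegion(key) // hypothetical regional query
    if err != nil {
        return nil, err
    }
    // Best-effort cache fill; a failed Put shouldn't fail the read.
    _ = dm.Put(key, val)
    return val, nil
}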

buraksezer commented 3 years ago

Hi @JohnStarich

I have just released v0.3.7. It includes the new stats work. I have also fixed the dependency problems of the stats package. Thank you for reporting this.

Of course! We're dark-launching with Olric soon to help us...

It's great to hear that. I'm always here for questions, bug reports, and anything else about Olric. Keep in touch!

JohnStarich commented 3 years ago

Thanks @buraksezer! The exported structs are much easier to use, and the new stats.Stats.Member field for the current member (presumably?) is fantastic for filtering the data we need. 💯

One quick question, are the Partitions assumed to be entirely member-local now? So we could aggregate the SlabInfo across all partitions returned from Stats to get an idea of how much memory this instance is using?

JohnStarich commented 3 years ago

Thinking about how we'd index into the stats maps with uint values, it's not entirely clear whether the Backups are indexed by member ID and the Partitions by their partition ID.

Maybe it'd be useful to define a type alias for member IDs and use it in the map types? I could open a PR for this if you like.

// MemberID <description>
type MemberID = uint64

// Stats includes some metadata information about the cluster. Each node adds everything it knows about the cluster.
type Stats struct {
    Cmdline            []string
    ReleaseVersion     string
    Runtime            Runtime
    ClusterCoordinator Member
    Member             Member // The current member
    Partitions         map[uint64]Partition // Includes metadata for this member's partitions.
    Backups            map[MemberID]Partition
    ClusterMembers     map[MemberID]Member
}

buraksezer commented 3 years ago

are the Partitions assumed to be entirely member-local now? So we could aggregate the SlabInfo across all partitions returned from Stats to get an idea of how much memory this instance is using?

Yes, it is. Partitions is entirely member-local now. SlabInfo exposes how much memory is allocated to store a DMap on a particular partition. Different components of Olric also allocate memory, so Stats.Runtime may be more useful to get an idea of how much memory is used by the instance.
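
For reference, that aggregation could look roughly like the sketch below. It assumes the stats package exposes Partition.DMaps and SlabInfo.Allocated under those names (they may differ between releases); as noted above, Stats.Runtime is the better signal for total process memory:

// approxDMapBytes returns a rough estimate of the memory allocated for DMap
// storage on this member, summed over the member-local partitions returned
// by Stats(). It only covers DMap slabs, not everything the process allocates.
func approxDMapBytes(db *olric.Olric) (int, error) {
    s, err := db.Stats()
    if err != nil {
        return 0, err
    }
    allocated := 0
    for _, part := range s.Partitions {
        for _, dm := range part.DMaps {
            allocated += dm.SlabInfo.Allocated
        }
    }
    return allocated, nil
}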

Olric calls runtime.ReadMemStats every time you call Stats. ReadMemStats calls an internal function called stopTheWorld. The documentation says:

// stopTheWorld stops all P's from executing goroutines, interrupting
// all goroutines at GC safe points and records reason as the reason
// for the stop. On return, only the current goroutine's P is running.
// stopTheWorld must not be called from a system stack and the caller
// must not hold worldsema. The caller must call startTheWorld when
// other P's should resume execution.
//
// stopTheWorld is safe for multiple goroutines to call at the
// same time. Each will execute its own stop, and the stops will
// be serialized.
//
// This is also used by routines that do stack dumps. If the system is
// in panic or being exited, this may not reliably stop all
// goroutines.

In your case, if you call Stats very frequently, this may affect performance badly. So we may add an extra option to skip reading the runtime's memory stats. What do you think?

Thinking about how we'd index into the stats maps with uint values, it's not entirely clear whether the Backups are indexed by member ID and the Partitions by their partition ID.

You are absolutely right, it's very confusing. We should use type aliases. Feel free to open a PR for fixes and improvements. For v0.3.x, your branch should be based on release/v0.3.0. The upcoming release is v0.4.0 and there are too many differences between those two versions; I'm not going to merge these trees at any point. So it's a good idea to send two different PRs, one for each version, to prevent conflicts.

v0.4.0 lives under feature/issue-70. There are a few differences in the Stats method between these two branches, but it should look very similar: https://github.com/buraksezer/olric/blob/feature/issue-70/stats.go

In addition to that, Backups in Stats is indexed by partition ID, so it denotes the backup partitions owned by the current member.


// MemberID <description>
type MemberID = uint64
...
Partitions:     make(map[PartitionID]stats.Partition),
Backups:        make(map[PartitionID]stats.Partition),
ClusterMembers: make(map[MemberID]stats.Member),
...

Thank you!

JohnStarich commented 3 years ago

In your case, if you call Stats very frequently, this may affect performance badly. So we may add an extra option to skip reading the runtime's memory stats. What do you think?

Oh my! I agree, there should be an option to disable "stopping the world". We also have separate monitoring of memory usage, so skipping it in an embedded Olric instance is a good idea.

We could probably make a change like this backward-compatible with either a new function that takes a struct of options, or a variadic options pattern. I usually prefer the struct pattern since it's simpler, and it's easier to find documentation of all options. On the other hand, variadic is good for keeping it all in the same function signature.

For example:

// add struct options
func (db *Olric) Stats() (stats.Stats, error)

type StatOptions struct {
    SkipRuntime bool
}
func (db *Olric) StatsOptions(opts StatOptions) (stats.Stats, error)

/////////////////////////

// OR use variadic options
type statsConfig struct {
    collectRuntime bool
}
type statsOption func(*statsConfig)

func SkipRuntime(cfg *statsConfig) {
    cfg.collectRuntime = false
}

func (db *Olric) Stats(options ...statsOption) (stats.Stats, error) {
    cfg := statsConfig{collectRuntime: true}
    for _, opt := range options {
        opt(&cfg)
    }
    // ...
}
// called like this
olr.Stats(olric.SkipRuntime)

So it's a good idea to send two different PRs, one for each version, to prevent conflicts.

You've got it. I'll see if I can get to this soon. 👍

buraksezer commented 3 years ago

I'm totally fine with the variadic version of Stats, but it also requires adding the same option to the Golang client and the protocol. This way, programs outside the cluster are able to use this option.

The protocol is derived from the memcached binary protocol. It's tiny and easy to modify, I think. See:

req := protocol.NewSystemMessage(protocol.OpStats)
req.SetExtra(protocol.StatsExtra{
    SkipRuntime: true,
})

We need to define a protocol.StatsExtra struct to send options to the node. In the handler, we can read the extras:

func (db *Olric) statsOperation(w, r protocol.EncodeDecoder) {
    extra, ok := r.Extra().(protocol.StatsExtra)
    if !ok {
        // Impossible
    }
    if extra.SkipRuntime {
        // Skip calling runtime.ReadMemStats
    }
    s := db.stats()
    value, err := msgpack.Marshal(s)
    if err != nil {
        neterrors.ErrorResponse(w, err)
        return
    }
    w.SetStatus(protocol.StatusOK)
    w.SetValue(value)
}

It's relatively easy, but I can handle this case if you don't have time to deal with it.

You've got it. I'll see if I can get to this soon. 👍

Thanks in advance. I have just merged feature/issue-70 into master. So you can send your PR to master directly.

JohnStarich commented 3 years ago

Thanks @buraksezer! Sorry for my delay, lots of things came up this month and I lost track of this one 😅

For v0.4, still in beta, should we skip runtime stats by default and allow opt-in? Stop-the-world behavior seems like one of those things that should be skipped by default to avoid performance issues.

I'll start on the work for v0.4 assuming we'd skip by default at first, but can course-correct if needed.

JohnStarich commented 3 years ago

After digging in a bit, I have a couple questions for you when you have a moment:

  1. What's the backward compatibility policy on the binary protocol? Stats did not have an extras piece before, so not sending one could be the result of an old client expecting different behavior.
  2. Do we want clients to have the power to "stop the world"? Perhaps it'd be best for the server to control how often mem stats are read?
  3. For v0.4, since it's technically permitted to contain API-breaking changes from v0.3 (because it's not at v1.0 yet), should we just add a stats.Options struct with a CollectRuntime bool field to the method signatures? (rough sketch after this list)
    • This would use the struct-based option passing proposal from https://github.com/buraksezer/olric/issues/82#issuecomment-829632875
    • Turns out the client & server both need to implement these options, but that would necessitate a shared config struct somewhere. It'd be much less complex if we inserted a new struct into their method signatures.
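
For illustration, option 3 could look roughly like the sketch below; stats.Options and CollectRuntime are hypothetical names from this proposal, not an existing API:

// In the stats package (hypothetical):
// Options controls what a node collects when Stats is called.
type Options struct {
    // CollectRuntime controls whether runtime.ReadMemStats is called,
    // which briefly stops the world.
    CollectRuntime bool
}

// On the embedded server, a hypothetical v0.4 signature change:
func (db *Olric) Stats(opts stats.Options) (stats.Stats, error) {
    // ...
}
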
JohnStarich commented 3 years ago

@buraksezer Guessing these were the answers, based on the merge 😉

  1. No backward compatibility necessary on the binary protocol
  2. We've implicitly accepted the client can have that power
  3. We went with the backward compatible approach this time

Update on the v0.3 backport: I think we can probably wait for v0.4. We've mitigated for now by significantly increasing the metrics collection interval. I didn't observe any negative effects at the lower interval, but it was increased just to be safe.

Maybe we can keep this issue open until v0.4 leaves beta?

derekperkins commented 1 year ago

@JohnStarich are you still using Olric, and if so, did you end up finding a good solution for exposing Prometheus metrics?

JohnStarich commented 1 year ago

Hey @derekperkins, we are indeed. Our solution thus far has been to poll the Stats() method in Go and mine it for as much information as we can, emitting it via Prometheus.

We later took that a step further and now emit metrics on every key-value Get and Put operation for more fine-grained measurements like op duration, byte size, and errors.
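
For anyone curious, the per-operation instrumentation is roughly shaped like the sketch below. It assumes the v0.3-era embedded DMap API and the standard prometheus/client_golang types, and it is not our production code:

import (
    "time"

    "github.com/buraksezer/olric"
    "github.com/prometheus/client_golang/prometheus"
)

// opDuration records the latency of DMap operations, labeled by op and status.
var opDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
    Name: "olric_op_duration_seconds",
    Help: "Duration of Olric DMap operations.",
}, []string{"op", "status"})

func init() {
    prometheus.MustRegister(opDuration)
}

// instrumentedGet wraps DMap.Get and records duration and error status.
func instrumentedGet(dm *olric.DMap, key string) (interface{}, error) {
    start := time.Now()
    val, err := dm.Get(key)
    status := "ok"
    if err != nil {
        status = "error"
    }
    opDuration.WithLabelValues("get", status).Observe(time.Since(start).Seconds())
    return val, err
}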

buraksezer commented 1 month ago

Closing this issue due to inactivity.