golang / groupcache

groupcache is a caching and cache-filling library, intended as a replacement for memcached in many cases.
Apache License 2.0

Spread peers updates #45

Closed olegdunkan closed 9 years ago

olegdunkan commented 9 years ago

Hello! First of all, thanks a lot for this project.

My question is: is it possible to automatically propagate an update of the peer list made on one node to all the others? You have this implementation:

func (p *HTTPPool) Set(peers ...string) {
    p.mu.Lock()
    defer p.mu.Unlock()
    p.peers = consistenthash.New(defaultReplicas, nil)
    p.peers.Add(peers...)
    p.httpGetters = make(map[string]*httpGetter, len(peers))
    for _, peer := range peers {
        p.httpGetters[peer] = &httpGetter{transport: p.Transport, baseURL: peer + p.basePath}
    }
}

I don't see anything in the code above that makes this happen. Or is it inconsistent with the goals of the groupcache project? If so, why? Thanks!

dvirsky commented 9 years ago

Not speaking for the groupcache authors, but not having it deal with consistently updating the peer list keeps it simple, and lets you choose the method that suits you best.

Personally I use it with Zookeeper as the endpoint management layer, and just update the groupcache peer list when ZK notifies me of a node added/removed.

olegdunkan commented 9 years ago

I understand that updating the peer list is my responsibility, but it seems to me it should be enough to update the list on only one node; all the other synchronization between nodes should be handled by groupcache itself. But of course that is just my assumption.
Thanks!

dvirsky commented 9 years ago

Synchronizing all nodes reliably requires a different kind of algorithm than the simple consistent hashing groupcache uses (i.e. Paxos/Raft/etc.). groupcache tries to be as stateless as it can; it also doesn't support expiration or deletion of keys, precisely to avoid these hard synchronization problems. So to me it makes perfect sense that node management was left out of scope.

olegdunkan commented 9 years ago

Do you think a hack like the following would work for my purposes (a small cluster, assuming no failures)?

First we bring up a new node and get its address. Then we generate keys that, between them, hash to every node, and issue requests for all of those keys. Each key follows a special pattern that encodes the address of the new node, so when a node receives such a key it knows it needs to update its peer list:

getter := groupcache.GetterFunc(func(ctx groupcache.Context, key string, dest groupcache.Sink) error {
        if strings.HasPrefix(key, peerUpdatePrefix) { // key matches the special pattern
            // Parse the key to extract the new node's network address,
            // then rebuild the peer list and call pool.Set(peers...).
        } else {
            // Usual key treatment: load the value and write it to dest.
        }
        return nil
    })

I understand that for large and very busy clusters like Google's this is not suitable, because it would introduce inconsistent state (redundancy, extra loads, duplicate keys), and there we would have to use the methods you advised. Thanks!

dvirsky commented 9 years ago

@olegdunkan It will work, but even if you want your solution to be self-contained and not rely on external components, you can do things like adding a Raft protocol between the nodes (based on go-raft), allowing a node to notify the leader and have it notify the rest of the nodes. Or, more simply, if the new node knows the addresses of all the other nodes, it can just call them via HTTP or whatever and notify them that it's up.

olegdunkan commented 9 years ago

Yes, I have started learning about consensus algorithms; thank you for introducing them to me!