etcd-io / zetcd

Serve the Apache Zookeeper API but back it with an etcd cluster
Apache License 2.0

jepsen #12

Closed glycerine closed 7 years ago

glycerine commented 7 years ago

Kyle Kingsbury (Aphyr) did Jepsen testing of ZooKeeper. That could be a good baseline for testing zetcd's functionality.

https://aphyr.com/posts/291-jepsen-zookeeper

https://github.com/jepsen-io/jepsen/blob/master/zookeeper/src/jepsen/zookeeper.clj

xiang90 commented 7 years ago

@glycerine Ah, interesting. But I think Jepsen will try to kill the zk process. We would need to modify it to kill both etcd and the zk proxy, right?

glycerine commented 7 years ago

I haven't played with jepsen myself, so I'm not really sure. It would make sense though. There is certainly network partitioning simulated.

But it does point out an assumption I was making that seems worth discussing.

I assumed that zetcd just loaded etcd as a library; hence my mistake when first testing. It's pretty inconvenient to run two processes when you only need one anyway. What's the thinking behind using a separate proxy process? It seems simpler to have zetcd embed etcd, or to provide a flag that makes etcd present a ZooKeeper port; either way, only one type of process would need to run in triplicate.

heyitsanthony commented 7 years ago

@glycerine there are several reasons for having a separate process:

Simpler is not necessarily better in this case. There may be support for an embedded version in the future (this will depend on a pass-through client; there is an open issue on the etcd tracker for that), but for the time being it's not a priority for the reasons outlined above.

Having written an entire dissertation on model checking (you may have noticed zetcd supports a rudimentary form of cross-checking, which is a very powerful validation technique), I am very critical of Jepsen's exaggerated claims and technology in general. If zetcd is going to have simulated network partition tests, they'll probably be done through the etcd functional-tester instead of going down the Clojure rabbit hole.

glycerine commented 7 years ago

The fault isolation, e.g. due to the difference in maturity of the two projects, makes sense to me.

I think you capture the spirit of the idea (leverage existing zookeeper tests/clients) in #14, so I'll close this out.