Cluster access for etcd

xiang90 commented 8 years ago

If you are interested in filing a request for access to the CNCF Community Cluster, please fill out the details below.

If you are just filing an issue, ignore/delete those fields and file your issue.

First Name

Xiang

Last Name

Li

Email

xiang.li@coreos.com

Company/Organization

CoreOS Inc.

Job Title

Project Title

etcd

What existing problem or community challenge does this work address? ( Please include any past experience or lessons learned )

Distributed reliable key-value store for the most critical data of a distributed system

Briefly describe the project

Distributed reliable key-value store for the most critical data of a distributed system

Do you intend to measure specific metrics during the work? Please describe briefly

Improve reliability, scalability of etcd

Which members of the CNCF community and/or end-users would benefit from your work?

Kubernetes and other distributed systems in general.

Is the code that you’re going to be running 100% open source? If so, what is the URL or URLs where it is located?

Yes. https://github.com/coreos/etcd

Do you commit to publishing your results and upstreaming the open source code resulting from your work? Do you agree to this within 2 months of cluster use?

Yes.

Will your testing involve containers? If not, could it? What would be entailed in changing your processes to containerize your workload?

Yes.

Are there identified risks which would prevent you from achieving significant results in the project?

No.

Have you requested CNCF cluster resources or access in the past? If ‘no’, please skip the next three questions.

No

Please list project titles associated with prior CNCF cluster usage.

Please list contributions to open source initiatives for projects listed in the last question. If you did not upstream the results of the open source initiative in any of the projects, please explain why.

Have you ever been denied usage of the cluster in the past? If so, please explain why.

Please state your contributions to the open source community and any other relevant initiatives

Maintainer of etcd, Kubernetes.

Number of nodes requested (minimum 20 nodes, maximum 500 nodes). In Q3, maximum increases to 1000 nodes.

80

Duration of request (minimum 24 hours)

open ended

With or Without an operating system (Restricted to CNCF pre-defined OS and versions)?

How will this testing advance cloud native computing (specifically containerization, orchestration, microservices or some combination).

etcd is a critical part of distributed systems, including orchestration and microservices.

Making etcd more reliable and scalable improves other applications that rely on it.

Any other relevant details we should know about while preparing the infrastructure?

cncfclusterteam commented 8 years ago

@xiang90 We have a "2 week maximum" use requirement per cluster request. We've updated the online form to highlight the min/max, so apologies that it wasn't visible when you submitted your request. I see in your request that it states "open ended". We can allocate the 80 nodes you requested, but as I mentioned above, the allocation would only last 2 weeks. Will that be sufficient for you to complete your work? /cc @cncf/intel-cluster-team

xiang90 commented 8 years ago

@cncfclusterteam

Thanks for the reply. Let me provide more background on what we hope to do with the CNCF cluster.

The etcd project currently runs around 15-20 machines on GCE for 24/7 reliability testing (http://dash.etcd.io/). We continuously inject different kinds of software-defined failures into these nodes to exercise etcd. This kind of testing has been extremely valuable for making etcd reliable. We hope to get access to some physical machines and improve our testing infrastructure.

Similar projects, such as Chubby at Google, spent hundreds of machine-years ensuring that the system works well under failures. They clearly stated that this is probably the best way to test one of the most critical pieces of infrastructure.

For etcd, we have the same goal: to make it a highly reliable piece of software for the community and for the hundreds of open source applications that depend on it. So we want a publicly accessible and visible way to test it continuously. We believe this would greatly benefit the open source community. It could also encourage other people to test their software extensively.

That is why we hope to use the machines in an open-ended way.

If the 2 weeks is a hard limit, we might consider using the machines for other kinds of testing. But we really hope we can do failure-injection testing on the CNCF cluster.

cncfclusterteam commented 8 years ago

@xiang90 The 2 week maximum is a hard limitation. We expect the number of requests to increase substantially and need to ensure that the full capacity is available to tenants during that cycle.

zsmithnyc commented 8 years ago

@xiang90 we're happy to supply this capacity to you guys on our Packet platform.

dankohn commented 8 years ago

@xiang90 @philips I have been speaking with @zsmith928 at Packet about them generously making available additional resources that could be used in different ways than the existing community cluster. Could you guys please follow up offline, and we'd love to talk publicly about whatever you come up with.

xiang90 commented 8 years ago

@zsmith928 @dankohn Great news! It would be really helpful for us. Can you drop me an email at xiang.li@coreos to start the discussion? Thanks a lot!

philips commented 8 years ago

@cncfclusterteam Is the long-term plan for the cluster that it be used for large-scale tests rather than sustained correctness or performance testing? This should be called out in the README.

Thanks @zsmith928

zsmithnyc commented 8 years ago

@xiang90 will ping you offline to coordinate. thanks @dankohn!

serathius commented 2 years ago

Following up on the status of this issue: please provide a link to any follow-ups that happened elsewhere.

jeefy commented 2 years ago

I can't find any existing project within Equinix for etcd. As this was ~5 years ago, could you submit a new issue outlining the request?

Once it's +1'd I can grant access pretty quickly.

Thanks!
