@xiang90 We have a 2-week maximum usage limit per cluster request. We've updated the online form to highlight the minimum and maximum, so apologies that it wasn't visible when you submitted your request. I see that your request states "open ended". We can allocate the 80 nodes you requested, but as I mentioned above, the allocation would only last 2 weeks. Will that be sufficient for you to complete your work? /cc @cncf/intel-cluster-team
@cncfclusterteam
Thanks for the reply. Let me provide more background on what we hope to do with the CNCF cluster.
The etcd project currently runs around 15-20 machines on GCE for 24/7 reliability testing (http://dash.etcd.io/). We continuously inject different kinds of software-defined failures into these nodes to exercise etcd. This kind of testing has been extremely valuable for making etcd reliable. We hope to get access to some physical machines to improve our testing infrastructure.
For a similar project, Chubby at Google, the team spent hundreds of machine-years ensuring that Chubby works well under failures, and they clearly stated that this is probably the best way to test one of the most critical pieces of infrastructure.
For etcd, we have the same goal: to make it a highly reliable piece of software for the community and for the hundreds of open source applications that depend on it. So we want a publicly accessible and publicly visible way to test it continuously. We believe this would greatly benefit the open source community. It could also encourage others to test their software extensively.
This is why we hope to use the machines on an open-ended basis.
If the 2 weeks is a hard limit, we might consider using the machines for other kinds of testing. But we really hope we can do failure-injection testing on CNCF clusters.
@xiang90 The 2 week maximum is a hard limitation. We expect the number of requests to increase substantially and need to ensure that the full capacity is available to tenants during that cycle.
@xiang90 we're happy to supply this capacity to you guys on our Packet platform.
@xiang90 @philips I have been speaking with @zsmith928 at Packet about them generously making available additional resources that could be used in different ways than the existing community cluster. Could you guys please follow up offline, and we'd love to talk publicly about whatever you come up with.
@zsmith928 @dankohn Great news! It would be really helpful for us. Can you drop me an email at xiang.li@coreos to start the discussion? Thanks a lot!
@cncfclusterteam Is the long-term plan for the cluster that it be used for large-scale tests rather than sustained correctness or performance testing? This should be called out in the README.
Thanks @zsmith928
@xiang90 will ping you offline to coordinate. thanks @dankohn!
Following up on the status of this issue: please provide a link to any follow-ups that happened elsewhere.
I can't find any existing project within Equinix for etcd. As this was ~5 years ago, could you submit a new issue outlining the request?
Once it's +1'd I can grant access pretty quickly.
Thanks!
If you are interested in filing a request for access to the CNCF Community Cluster, please fill out the details below.
If you are just filing an issue, ignore/delete those fields and file your issue.
First Name
Xiang
Last Name
Li
Email
xiang.li@coreos.com
Company/Organization
CoreOS Inc.
Job Title
Project Title
etcd
What existing problem or community challenge does this work address? (Please include any past experience or lessons learned)
Distributed reliable key-value store for the most critical data of a distributed system
Briefly describe the project
Distributed reliable key-value store for the most critical data of a distributed system
Do you intend to measure specific metrics during the work? Please describe briefly
Improve reliability, scalability of etcd
Which members of the CNCF community and/or end-users would benefit from your work?
Kubernetes and other distributed systems in general.
Is the code that you’re going to be running 100% open source? If so, what is the URL or URLs where it is located?
Yes. https://github.com/coreos/etcd
Do you commit to publishing your results and upstreaming the open source code resulting from your work? Do you agree to this within 2 months of cluster use?
Yes.
Will your testing involve containers? If not, could it? What would be entailed in changing your processes to containerize your workload?
Yes.
Are there identified risks which would prevent you from achieving significant results in the project?
No.
Have you requested CNCF cluster resources or access in the past? If ‘no’, please skip the next three questions.
No
Please list project titles associated with prior CNCF cluster usage.
Please list contributions to open source initiatives for projects listed in the last question. If you did not upstream the results of the open source initiative in any of the projects, please explain why.
Have you ever been denied usage of the cluster in the past? If so, please explain why.
Please state your contributions to the open source community and any other relevant initiatives
Maintainer of etcd and Kubernetes.
Number of nodes requested (minimum 20 nodes, maximum 500 nodes). In Q3, maximum increases to 1000 nodes.
80
Duration of request (minimum 24 hours)
open ended
With or Without an operating system (Restricted to CNCF pre-defined OS and versions)?
How will this testing advance cloud native computing (specifically containerization, orchestration, microservices or some combination).
etcd is a critical part of distributed systems, including orchestration and microservices.
Making etcd more reliable and scalable improves other applications that rely on it.
Any other relevant details we should know about while preparing the infrastructure?