mameshini opened 5 years ago
Not to muddy the waters, but I think we need a better strategy for (default) storage.
I have tested Elasticsearch/Kibana and Prometheus/Grafana on hybrid environments with the default storage, which is local, ephemeral storage. That said, we probably don't want our customers running any of the above on ephemeral storage, except maybe Istio and maybe Jenkins.
Because there might be no viable "default" storage on an on-prem cluster, and there could be several choices, I think we need to support a switch.
In the short term, I would say that this component is "supported" for on-prem when we can provide an optional storage-class parameter and it works correctly against it.
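A minimal sketch of what such a switch could look like, assuming a Helm-style chart (the `storageClass` value name and sizes here are hypothetical, not an agreed interface):

```yaml
# values.yaml (hypothetical): empty string means "use the cluster default"
storageClass: ""
```

```yaml
# In the component's StatefulSet template: only set storageClassName
# when the user overrides it, otherwise fall back to the cluster's
# default StorageClass.
volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      {{- if .Values.storageClass }}
      storageClassName: {{ .Values.storageClass }}
      {{- end }}
      resources:
        requests:
          storage: 10Gi
```

This keeps the component agnostic: the default works out of the box, and a customer-provided class plugs in with one value.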
Ceph is at the top of the list because we can use it to implement a default/reference implementation of block storage for a cluster. If a customer has a better implementation of block storage, that's great; we need to support a switch from our default storage to block storage provided by a custom component. Without block storage it's hard to implement many other components. Can we run Elastic on Ceph block storage? How about Postgres?
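For reference, a Ceph-backed default could look roughly like this, assuming the Rook operator is how we deploy Ceph (the `rook-ceph` cluster ID and `replicapool` pool name are assumptions from a typical Rook install, not from our current setup):

```yaml
# Sketch of a StorageClass backed by Ceph RBD via the Rook CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph          # namespace where the Rook cluster runs (assumed)
  pool: replicapool             # RBD pool backing the volumes (assumed)
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
```

Anything that consumes `ReadWriteOnce` PVCs (Elastic, Postgres, Prometheus) should work against a class like this; the open question is performance, not compatibility.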
We can run all of these things on locally mounted disks, e.g. with Local Persistent Volumes, binding a PV to a physical disk or directory: https://kubernetes.io/blog/2019/04/04/kubernetes-1.14-local-persistent-volumes-ga/
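Per that blog post, a local PV setup is roughly the following (the path, node name, and capacity are placeholders); note the `WaitForFirstConsumer` binding mode, which delays volume binding until the pod is scheduled so the scheduler can account for the PV's node affinity:

```yaml
# StorageClass for statically provisioned local volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
# A PV pinned to one node's local disk (path and hostname are placeholders).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-1
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1
```

The trade-off: the data lives and dies with that node, so this only suits workloads that replicate at the application layer.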
I think we should take a solid look at OpenEBS and/or Portworx, so we get the EBS-style behavior that everyone expects. Ceph block devices give you the EBS behavior as well, but at higher complexity and with varying performance characteristics.
tl;dr: I think we should support Ceph, but I don't think we should make all other components rely on Ceph for storage.
Our goal is to create a reference implementation of block storage with Ceph for initial cluster deployment. Local PVs should only be used for workloads that handle data replication and backup at the application layer. We already have Ceph working on AWS; if we run into problems with Ceph, we can consider OpenEBS or Portworx. However, a customer may also have their own implementation of block storage that we can use. We need to discover the scalability and performance limits of on-prem Ceph so we know what to recommend.
I agree that we need Ceph, and it should be part of our Metal reference deployment.
For ease of use, I guess we can point EFK at a Ceph Block Device, but I wouldn't call that a "reference implementation".
For production, EFK should use a local volume (or several local volumes), so we need basic support for local volumes in any component that has high random-I/O requirements.
(really, it would be a good way to stress-test Ceph's block device :) )
TLS support for on-prem is being tracked in agilestacks/metal-manager#29 and implemented as an on-prem component in https://github.com/agilestacks/tls-host-controller
@rrichardson @rstreics @oginskis care to update the list to mark complete implementations?
Updated
@rrichardson please update or close this issue
On-prem support for stack components: