Bobgy opened this issue 4 years ago
Issue-Label Bot is automatically applying the labels:
Label | Probability |
---|---|
area/kfctl | 0.52 |
kind/question | 0.74 |
platform/gcp | 0.89 |
/cc @jlewi @8bitmp3 What do you think?
Overview or FAQ?
💯👍!
@Bobgy
What is Anthos Service Mesh?
I can share some of my writing (from May 2020) that we can shorten and include in @Bobgy's and @jlewi's proposed KF 1.1 on GCP Overview/FAQ. I put this together earlier this year to figure out how these GCP parts relate to each other:
"Anthos Service Mesh provides operational control and insights over a service mesh—a network of microservices that make up the applications and the interactions between them. It gives users uniform observability into their workloads, so that they can make informed decisions on routing traffic, security and encryption policy enforcement, and other rule configurations."
"Anthos Service Mesh is a network for services that manages interactions across all services. It uses a distribution of Istio—an open-source implementation of the service mesh infrastructure layer.
... "Anthos is a broad product suite that helps bridge the worlds of on-prem and cloud-based infrastructure."
"Anthos helps you move applications to the cloud-native world with improved workflow management and reduced operational complexity. You can move workloads from on-prem to the cloud and manage the infrastructure with a consistent set of policies and tools."
@Bobgy
Can we replace it with istio? How much does it cost? (I couldn't find related documentation)
Some of this should also help—especially the distinction between Istio on GKE and Anthos Service Mesh:
"For each of Google Cloud's fully-managed solutions, such as Google Kubernetes Engine (GKE) and Istio on GKE, Anthos has equivalent platforms for the on-prem, multi-cloud, and hybrid cloud world:
Fully-managed by Google Cloud | For on-prem, multi-cloud, hybrid cloud
---|---
Kubernetes Engine (GKE) | Anthos GKE (On-Prem, for AWS/Azure)
Istio on GKE | Anthos Service Mesh (on GKE)

"Anthos GKE and Anthos Service Mesh are built on open-source Kubernetes and Istio."
"These technologies provide solutions for modern infrastructure and application development challenges by:
- Decoupling applications for modularity, with Kubernetes and Istio.
- Providing scalable configuration management, with Istio."
This may be helpful in the Istio docs on KF:
"Starting from v1.5 (as of Q1 2020), the Istio control-plane components Pilot, Galley, and Citadel were consolidated into `istiod`."
I think it's a very valid question why these GCP blueprints seem to slowly tie GCP deployments to Anthos. It would make sense to offer Istio as an alternative if possible: I understand the Anthos route comes with costs, and many users may not want to pay for this.
@jlewi Do you have context why we chose Anthos Service Mesh as builtin support for Kubeflow?
My understanding is that open-source Istio 1.4 should function the same, if users prefer Istio.
For Cloud Config Connector and the management cluster, my answer is: that's Google Cloud's opinionated way of managing cloud resources declaratively. You can define GCP resources in YAML files that can be processed by kustomize.
Things like providing built-in workload identity bindings as manifests are powered by this new ability.
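As an illustration, such a workload identity binding can be expressed as a Config Connector manifest. This is only a sketch: the project, namespace, and service account names below are made up for the example, not taken from the Kubeflow manifests.

```yaml
# Sketch: a workload identity binding as a Config Connector resource.
# Project, namespace, and account names are illustrative only.
apiVersion: iam.cnrm.cloud.google.com/v1beta1
kind: IAMPolicyMember
metadata:
  name: kubeflow-user-workload-identity
spec:
  # Allow the Kubernetes service account kubeflow/default-editor to
  # impersonate the referenced Google service account.
  member: serviceAccount:my-project.svc.id.goog[kubeflow/default-editor]
  role: roles/iam.workloadIdentityUser
  resourceRef:
    apiVersion: iam.cnrm.cloud.google.com/v1beta1
    kind: IAMServiceAccount
    name: kubeflow-user-gsa
```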
EDIT: also, providing YAMLs that can be processed by kustomize means we better support day 2 operations. For example, you might want to customize a GCP resource created by Kubeflow; now you can create a kustomize overlay in your instance folder to adjust that resource. The next day, you may want to upgrade to KF 1.1.1; that will be as simple as pulling in the upstream manifest for 1.1.1, and your customizations stay in your instance folder.
I too don't understand the requirement for adding in the Anthos Service Mesh as an abstraction layer over istio -- when the user is looking to deploy Kubeflow solely on GCP (not multi-cloud or on-prem).
@8bitmp3
I can share some of my writing ...
Yes, that explains what the Anthos Service Mesh is but it doesn't explain:
Particularly # 2, as that decision can have real cost implications for users.
Some of this should also help...
Again, sure, there are many product offerings that can replace open-source solutions. For example, Argo can be replaced by another vendor's workflow engine.
Another concern would be incurred cost by running the Anthos Service Mesh. As I understand it, it's not inexpensive in comparison to some other GCP services.
I agree with @connorlwilkes.
@Bobgy
my answer is, that's the Google Cloud opinionated way of declarative cloud resources management. You can use yaml files that can be processed by kustomize to define GCP resources.
I'd say Kustomize is increasing in popularity across Kubernetes deployments (vs Helm).
I guess some concerns here are:
Keep in mind Anthos isn't the cheapest. Not all users may want to include Anthos in their deployment, and some may be put off by the effort required to reintegrate open-source Istio.
For clarification, Anthos Service Mesh is literally managed Istio. It's not an abstraction on top, and you are not tied to using ASM.
Therefore, you can replace it with Istio. If you get that working, we'd welcome a contribution to provide Istio as an option (or default?).
And this is currently an early phase of the release; many docs are catching up. The purpose of this issue is to figure out answers to these questions and make them clear in the documentation too.
On GCP we use ASM because it is the recommended way of running Istio on GCP. If you want to deploy and run OSS Istio instead, you are welcome to do so.
Hi all! I've updated with quick answers I drafted in the first comment: https://github.com/kubeflow/gcp-blueprints/issues/123#issue-678249292.
Feel free to let me know if there are further questions.
I would put OSS Istio into the bring-your-own-infrastructure bucket.
/priority p1
I've been getting quite a few questions about this from different channels.
I think the current documentation only explains how to deploy Kubeflow 1.1, but it doesn't touch the topics below:
These are pretty much all I have in mind right now; there's probably more.
UPDATE 8.24
I edited this and added quick answers below.
Where is kfctl?
kfctl is deprecated for Google Cloud. This decision is specific to Google Cloud; other platforms may continue to use kfctl.
Why did we stop using kfctl?
There were multiple reasons:
Some of kfctl's responsibilities, like configuration setters and package management, are now covered by generic tools: `kpt cfg` and `kpt pkg`, respectively. Therefore, we are removing the extra layer of abstraction in kfctl and providing a simple Makefile (that is supposed to be easier to understand and customize) which leverages generic tools (kustomize, kpt and Cloud Config Connector) to deploy Kubeflow.
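To make that concrete, here is a rough sketch of the generic-tools workflow the Makefile drives. The package URL, version, and setter names are assumptions for illustration, not the actual blueprint values.

```shell
# Fetch the deployment package with kpt (URL and version are illustrative).
kpt pkg get https://github.com/kubeflow/gcp-blueprints.git/kubeflow@v1.1.0 ./kubeflow
cd kubeflow

# Fill in deployment-specific values with kpt setters (setter names assumed).
kpt cfg set ./instance name my-kubeflow

# Render the manifests with kustomize and apply them with kubectl.
kustomize build ./instance/gcp_config | kubectl apply -f -
```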
What is Anthos Service Mesh? Can we replace it with istio? How much does it cost?
Anthos Service Mesh is managed Istio on Anthos. It doesn't add extra abstractions: you can still use the CRDs from open-source Istio with Anthos Service Mesh, and more observability and other features are built in with Google Cloud. Therefore, you should be able to swap it for Istio 1.4 if you prefer to avoid it (maybe because of the extra cost). I don't have an answer on how much it costs yet; it might require an Anthos subscription, so I recommend asking Google Cloud sales about it. Contributions are welcome if anyone gets it working with OSS Istio 1.4.
What is cloud config connector/management cluster? Why do we use it?
Cloud Config Connector is introduced at https://cloud.google.com/config-connector/docs/overview.
So, basically, Config Connector makes it possible to manage Google Cloud resources as Kubernetes CRDs defined in YAML files. The Kubeflow 1.1 default setup installs Config Connector into a lightweight management cluster (which only contains a single node with 4 CPUs and 15GB memory). You can choose to delete the management cluster, or scale it down to save costs after Kubeflow is deployed.
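For example, a GCP resource such as a Cloud Storage bucket can be declared as a Kubernetes object and applied to the management cluster; Config Connector then creates the actual bucket. The name below is illustrative, not part of the Kubeflow manifests.

```yaml
# Sketch: a Cloud Storage bucket declared as a Config Connector CRD.
# Applying this with kubectl makes Config Connector create the bucket.
apiVersion: storage.cnrm.cloud.google.com/v1beta1
kind: StorageBucket
metadata:
  name: my-kubeflow-artifacts   # illustrative; bucket names are global
spec:
  location: US
```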
Before KF 1.1, GCP was using https://cloud.google.com/deployment-manager (DM) for Google Cloud resources, but Config Connector solves some of DM's problems:
In summary, our vision for switching to Cloud Config Connector is that it empowers a unified workflow using kustomize and kpt for both Google Cloud resources and Kubernetes resources that Kubeflow relies on. The workflow now supports day 2 operations (customize + upgrade at the same time).
How to troubleshoot Cloud Config Connector?
You can use kubectl to query resource status; the resources carry detailed error messages, e.g.
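A sketch of what such queries can look like; the resource kind, names, and namespace here are illustrative, not taken from an actual deployment.

```shell
# List Config Connector resources of a given kind in the management cluster.
kubectl get containerclusters -n kubeflow

# Inspect one resource; errors show up in its status conditions.
kubectl describe containercluster kubeflow-cluster -n kubeflow
kubectl get containercluster kubeflow-cluster -n kubeflow \
  -o jsonpath='{.status.conditions[0].message}'
```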
How to customize Google Cloud resources?
You can use kustomize to add customizations in your `./kubeflow/instance/gcp_config` folder. The `kustomization.yaml` in that folder includes resources defined in files in a relative folder, so you can go to `./kubeflow/upstream/manifests/gcp/v2/cnrm` to take a look at what the base templates look like. For example, you may add patches using `patchesStrategicMerge` and write partial YAML files that only contain the fields you want to change.
kustomize documentation: https://kustomize.io/
Cloud Config Connector resource spec documentation: https://cloud.google.com/config-connector/docs/how-to/creating-resource-references
You can find all specs at https://cloud.google.com/config-connector/docs/reference/resources
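As a sketch of what such a patch can look like (the patch file name, resource name, and field are made up for illustration, not the actual blueprint contents):

```yaml
# ./kubeflow/instance/gcp_config/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../upstream/manifests/gcp/v2/cnrm
patchesStrategicMerge:
- patch-cluster.yaml
---
# patch-cluster.yaml: a partial manifest containing only fields to change.
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerCluster
metadata:
  name: kubeflow-cluster   # must match the name used in the base manifest
spec:
  initialNodeCount: 3
```

kustomize merges the partial manifest over the matching base resource by `kind` and `metadata.name`, so only the listed fields change.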