GoogleCloudPlatform / cloud-code-vscode

Cloud Code for Visual Studio Code: Issues, Documentation and more

working kubeconfig doesn't load at all #812

Status: Open. jmhodges opened this issue 1 year ago.

jmhodges commented 1 year ago

Type: Bug

Clicking on the Kubernetes cluster config (which works in the Kubernetes extension next door and from my normal shells) just produces the error:

The cluster request failed for an unknown reason: unable to verify the first certificate

Turning on "cloudcode.verboseLogging" and setting "cloudcode.cloudSdkVerbosityLevel" to "debug" adds nothing to the Cloud Code logs.

The last lines all look like this:

[6/23/2023, 2:52:30 AM] 'gcloud auth list --format value(account) --filter status=ACTIVE --quiet --verbosity debug' exited with code 0 took 363ms (unmanaged)

Running the command from that log line in a normal shell (with quotation marks around value(account), because zsh) correctly returns what's in the logs: my GCP account's email address.

Extension version: 1.21.7
VS Code version: Code 1.79.2 (Universal) (695af097c7bd098fbf017ce3ac85e09bbc5dda06, 2023-06-14T08:58:52.392Z)
OS version: Darwin arm64 22.5.0
Modes:

System Info

|Item|Value|
|---|---|
|CPUs|Apple M2 Pro (12 x 24)|
|GPU Status|2d_canvas: enabled, canvas_oop_rasterization: disabled_off, direct_rendering_display_compositor: disabled_off_ok, gpu_compositing: enabled, metal: disabled_off, multiple_raster_threads: enabled_on, opengl: enabled_on, rasterization: enabled, raw_draw: disabled_off_ok, video_decode: enabled, video_encode: enabled, vulkan: disabled_off, webgl: enabled, webgl2: enabled, webgpu: enabled|
|Load (avg)|4, 5, 4|
|Memory (System)|32.00GB (1.38GB free)|
|Process Argv|--crash-reporter-id 3bd440e4-2b9d-41e6-bd28-a5b7aa27a910|
|Screen Reader|no|
|VM|0%|
A/B Experiments

```
vsliv368cf:30146710 vsreu685:30147344 python383cf:30185419 vspor879:30202332 vspor708:30202333 vspor363:30204092 vswsl492cf:30256860 vslsvsres303:30308271 vserr242cf:30382550 pythontb:30283811 vsjup518:30340749 pythonptprofiler:30281270 vshan820:30294714 vstes263:30335439 vscod805cf:30301675 binariesv615:30325510 bridge0708:30335490 bridge0723:30353136 vsaa593cf:30376535 pythonvs932:30410667 py29gd2263cf:30773604 vscaat:30438848 vsclangdc:30486549 c4g48928:30535728 dsvsc012cf:30540253 pynewext54:30695312 azure-dev_surveyone:30548225 vsccc:30610678 2e4cg342:30602488 pyind779:30671433 89544117:30613380 pythonsymbol12:30671437 2i9eh265:30646982 showlangstatbar:30737416 vsctsb:30748421 pythonms35:30701012 03d35959:30757346 pythonfmttext:30731395 pythoncmv:30756943 fixshowwlkth:30771522 pythongtdpath:30769146 dh2dc718:30770000 pythonidxptcf:30772540 pythondjangotscf:30772537
```
jmhodges commented 1 year ago

(I've also tried this with Cloud SDK version 436.0.0)

SKrupa commented 1 year ago

Could you try some of the potential mitigations listed in this issue? https://github.com/GoogleCloudPlatform/cloud-code-vscode/issues/791

If that doesn't work, unfortunately it seems there is not much we can do from the extension at this time, and you may need to use the CLI for your use case.

jmhodges commented 1 year ago

Which ones? There's a lot of back and forth in there and I'm not sure which you're talking about. I don't have NODE_EXTRA_CA_CERTS set in any of my environments (neither normal shells nor vscode).

SKrupa commented 1 year ago

The main thing to try would be the steps outlined in https://github.com/GoogleCloudPlatform/cloud-code-vscode/issues/791#issuecomment-1584798755

A couple of other things you can try if that doesn't work:
- Toggle "cloudcode.autoDependencies" between "on" and "off".
- Ensure $KUBECONFIG points to the correct config.
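For reference, kubectl's lookup convention can be sketched in TypeScript as below: $KUBECONFIG is a list of files separated by the platform's path-list separator, and ~/.kube/config is used when it is unset or empty. This is a minimal sketch of that convention; `kubeconfigPaths` is a hypothetical helper, and it is an assumption that Cloud Code resolves the variable exactly the same way.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: lists the kubeconfig files a client following kubectl's
// convention would read. $KUBECONFIG is split on path.delimiter (":" on
// macOS/Linux, ";" on Windows); if it is unset or empty, ~/.kube/config is used.
function kubeconfigPaths(
  env: Record<string, string | undefined> = process.env,
  home: string = os.homedir(),
): string[] {
  const raw = env.KUBECONFIG ?? "";
  const entries = raw.split(path.delimiter).filter((p) => p !== "");
  return entries.length > 0 ? entries : [path.join(home, ".kube", "config")];
}
```

When $KUBECONFIG lists several files, kubectl merges them rather than picking one, so a rename in any of them should show up.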

jmhodges commented 1 year ago

Toggling autoDependencies on and off didn't change anything (even after a full wipe of /Users/jmhodges/Library/Application\ Support/cloud-code/ and restarting VS Code).

$KUBECONFIG points to the right place (I changed the names in ~/.kube/config, reloaded the window, and saw the new names in the menu).

Turning off TLS verification for all of my tools by editing the kubeconfig, instead of just for this one, feels very bad.

It also doesn't seem like turning off TLS verification should be required for this extension, given that the Kubernetes extension (ms-kubernetes-tools.vscode-kubernetes-tools) seems to handle the same config just fine. I'm a little confused about what's different between the two.

(Their README, for instance, says to do the same cluster-wide thing only if needed, but the Kubernetes extension works fine with the same kubeconfig Cloud Code uses. And I know the Kubernetes extension is using the same kubeconfig, because it picked up the same renames I made above to test which kubeconfig Cloud Code was reading.)

jmhodges commented 1 year ago

So the deal seems to be that the k8s extension runs kubectl from the command line while Cloud Code, I suspect, uses Node's own HTTP client? (I don't believe the code is open source, though.)

Could there be a compromise here? The Node https library allows setting a custom CA. Could Cloud Code use kubectl to pick up the CA certificate it needs and use that?

(I want to reiterate here I'm not using any fancy stuff in GKE. It's a plain cluster I've had for years.)
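A minimal TypeScript sketch of the custom-CA idea, using Node's built-in `https` module. `clusterRequestOptions` is a hypothetical helper; it assumes the CA has already been decoded to PEM text and leaves out authentication (bearer tokens or client certificates).

```typescript
import * as https from "node:https";

// Hypothetical helper: builds request options for the Kubernetes API server
// that trust the cluster's own CA. `caPem` is the PEM text obtained by
// base64-decoding the kubeconfig's certificate-authority-data.
function clusterRequestOptions(
  server: string,
  apiPath: string,
  caPem: string,
): https.RequestOptions {
  const url = new URL(server);
  return {
    hostname: url.hostname,
    // URL omits the port when it is the https default.
    port: url.port === "" ? 443 : Number(url.port),
    path: apiPath,
    method: "GET",
    // An explicit `ca` replaces Node's bundled root certificates for this
    // request, so verification is done against the cluster CA instead.
    ca: caPem,
  };
}
```

The result would be passed straight to `https.request(options, callback)`. Because an explicit `ca` replaces the default list rather than adding to it, only server certificates that chain to the cluster CA are accepted.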

jmhodges commented 1 year ago

Here's a kubectl command that prints, one per line, each cluster's name and its base64-encoded CA certificate:

kubectl config view --raw -o go-template='{{ range $i, $rawCluster := .clusters }}{{ with $cluster := index $rawCluster "cluster" }}{{- index $rawCluster "name" }} {{ index $cluster "certificate-authority-data" | printf "%s\n"}}{{ end }}{{ end }}' 

I believe it would be a matter of parsing this stdout, mapping each cluster name to its value (base64-decoding it in Node), and then either listing all of them in each https.request's ca option or remembering which cluster we're talking to in order to set the ca option on https.request (docs), which supports the tls.connect options. (I think https.Agent has similar options, too. The Node.js docs are a little funny: they point to the tls.connect docs instead of inlining the options, so it's hard to find the right doc to link to.)
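The parsing step described above might look like this in TypeScript. It is a sketch that assumes the output format of the kubectl command above (one `<name> <base64>` pair per line) and cluster names without newlines; `parseClusterCAs` is a hypothetical name.

```typescript
// Hypothetical helper: parses "<cluster-name> <base64 CA>" lines into a map of
// cluster name -> decoded PEM text.
function parseClusterCAs(stdout: string): Map<string, string> {
  const cas = new Map<string, string>();
  for (const rawLine of stdout.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue;
    // Base64 never contains spaces, so split on the last space.
    const sep = line.lastIndexOf(" ");
    if (sep <= 0) continue;
    const name = line.slice(0, sep);
    const encoded = line.slice(sep + 1);
    // A cluster without certificate-authority-data gets a non-base64
    // placeholder from the template instead of a value; skip it.
    if (!/^[A-Za-z0-9+/]+={0,2}$/.test(encoded)) continue;
    cas.set(name, Buffer.from(encoded, "base64").toString("utf8"));
  }
  return cas;
}
```

A caller would then look up the entry for the cluster being contacted and pass it as the `ca` option, rather than sending every cluster's CA on every request.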

jmhodges commented 1 year ago

Hm, I suppose y'all are already parsing the kubeconfig in Node, though! If so, it seems like the ca option on https.request gets you where you need to go?
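If the kubeconfig is already parsed in Node, the lookup is short. A sketch, assuming the kubeconfig YAML has already been parsed into a plain object (the YAML parser is not shown) and that the cluster uses inline `certificate-authority-data` rather than a `certificate-authority` file path; `KubeConfigDoc` and `currentClusterCA` are hypothetical names.

```typescript
// Hypothetical minimal shape of a parsed kubeconfig (only the fields used here).
interface KubeConfigDoc {
  "current-context"?: string;
  contexts?: { name: string; context: { cluster: string } }[];
  clusters?: {
    name: string;
    cluster: { server: string; "certificate-authority-data"?: string };
  }[];
}

// Hypothetical helper: finds the cluster behind the current context and returns
// its server URL plus the decoded CA PEM, ready to pass as the `ca` option.
function currentClusterCA(
  cfg: KubeConfigDoc,
): { server: string; caPem?: string } | undefined {
  const ctx = cfg.contexts?.find((c) => c.name === cfg["current-context"]);
  if (!ctx) return undefined;
  const entry = cfg.clusters?.find((c) => c.name === ctx.context.cluster);
  if (!entry) return undefined;
  const data = entry.cluster["certificate-authority-data"];
  return {
    server: entry.cluster.server,
    caPem: data === undefined ? undefined : Buffer.from(data, "base64").toString("utf8"),
  };
}
```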

SKrupa commented 1 year ago

Thanks for helping investigate this!

Yep, we use the Node package instead of kubectl for this, which causes the discrepancy between the extensions. From what I can tell, we are setting the ca option on the request from the kubeconfig value, but it might be getting lost somewhere along the way. We'll look at parsing the certificate via kubectl and setting it on the request to see if that improves anything.

jeremiah-snee-openx commented 1 year ago

@SKrupa I'm also having this same issue. Is there any update on a fix?

j-windsor commented 1 year ago

OK, looking more deeply into it, I have another idea of why this is happening. In @kubernetes/client-node (which we use to pull data from the cluster), the value of certificate-authority-data is set as the https agent's ca option. But VS Code overwrites the agent for all requests if the "Http: Proxy Support" setting has its default value, "override".
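A minimal illustration of that hypothesis with Node's `https.Agent`. The first function mirrors the described behavior of @kubernetes/client-node (CA on the agent); `overrideAgent` is a hypothetical stand-in for VS Code's proxy handling, assumed here to replace the agent without carrying over its TLS options. The real VS Code proxy agent is more involved.

```typescript
import * as https from "node:https";

// Sketch of the described arrangement: the cluster CA lives on the agent.
// `caPem` is the decoded certificate-authority-data.
function agentWithClusterCA(caPem: string): https.Agent {
  return new https.Agent({ ca: caPem });
}

// Hypothetical stand-in for a layer that swaps in its own agent on every
// request. Nothing copies the original agent's TLS options across, so the
// cluster CA is no longer part of the request.
function overrideAgent(options: https.RequestOptions): https.RequestOptions {
  return { ...options, agent: new https.Agent() };
}
```

Under this assumption, turning the override off leaves the original agent, and its CA, in place.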

Can you see if setting "Http: Proxy Support" to "off" gets things working for you?

jeremiah-snee-openx commented 1 year ago

@j-windsor Setting this to fallback worked for me.

ktarplee commented 8 months ago

The "Http: Proxy Support" setting no longer seems to exist in VS Code. Is there another setting we need to set?

ktarplee commented 8 months ago

I have noticed that when the certificate-authority-data refers to a self-signed (untrusted) certificate, it works fine (kind clusters do this). When it instead refers to a certificate issued by a different self-signed (untrusted) certificate, it fails (RKE2 does this). In other words, if the certificate chain is longer than one, it fails when the certificate at the top is self-signed and untrusted. This is obviously a bug, since no certificate chain validation should be necessary if the leaf certificate is trusted (which is what it means to be in the certificate-authority-data field).

sripasg commented 8 months ago

@ktarplee, can you please provide details about the platform you are using and the extension version?

Also, since you re-opened it, I will assume you are having the exact same issue: kubectl works with the cluster, but the extension alone fails to load it, with the exact same error "The cluster request failed for an unknown reason: unable to verify the first certificate".

If not, can you please provide more detail about the exact error in your specific case?

ktarplee commented 8 months ago

Yes, in my case kubectl works fine for all my clusters. Cloud Code seems to only work for KinD clusters. When I use Cloud Code on an RKE2 cluster (v1.29.2), it fails with the error

request to https://mycluster.com:6443/api/v1/nodes failed, reason: self-signed certificate in certificate chain.

RKE2 clusters use a length-two certificate chain for the k8s API server cert, while kind clusters use a length-one chain. Both use self-signed root certs that my system does not trust.

I have seen this same problem on Linux and Mac OS X with the latest version of the extension and VS Code.

maxrandolph commented 7 months ago

Thanks for the additional info @ktarplee.

Would you mind repro-ing this in your IDE and then in the same session filing a feedback report under the extension tab?