2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org
BSD 3-Clause "New" or "Revised" License

AzureFile NFS network settings may block terraform access once applied #890

Open sgibson91 opened 2 years ago

sgibson91 commented 2 years ago

Description

https://github.com/sgibson91/pilot-hubs/pull/94, together with https://github.com/2i2c-org/infrastructure/pull/887, is an effort to get NFS working on AzureFile storage. It involved making some network changes in terraform so that the NFS share could be accessed and mounted by the k8s nodes, roughly as sketched below.
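
For context, the network change has roughly the following shape (a minimal sketch against a recent azurerm provider, not the exact contents of the linked PRs; the resource names follow the ones that appear in the plan output later in this thread):

# Restrict the storage account to the k8s node subnet so the nodes can
# mount the NFS share. default_action = "Deny" is what ends up rejecting
# requests from everywhere else, including the machine running terraform.
resource "azurerm_storage_account_network_rules" "homes" {
  storage_account_id         = azurerm_storage_account.homes.id
  default_action             = "Deny"
  bypass                     = ["AzureServices"]
  virtual_network_subnet_ids = [azurerm_subnet.node_subnet.id]
}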

While working on the Carbon Plan Azure cluster, I applied this new terraform config and then ran another terraform plan command, mostly to confirm to myself that the infrastructure was up-to-date. However, I ran into this error message:

│ Error: shares.Client#GetProperties: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailure" Message="This request is not authorized to perform this operation.\nRequestId:8150832e-d01a-0012-63ad-edeedb000000\nTime:2021-12-10T10:02:58.0047296Z"
│ 
│   with azurerm_storage_share.homes,
│   on storage.tf line 21, in resource "azurerm_storage_share" "homes":
│   21: resource "azurerm_storage_share" "homes" {

I am now worried that by making the NFS accessible to k8s, we have locked ourselves out from managing the infrastructure via terraform.

Value / benefit

We need to retain access via terraform to sustainably manage infrastructure.

Implementation details

No response

Tasks to complete

No response

Updates

No response

yuvipanda commented 2 years ago

This is probably https://github.com/hashicorp/terraform-provider-azurerm/issues/2977. Looks like https://github.com/hashicorp/terraform-provider-azurerm/pull/14220 is supposed to fix it.

In the meantime, can we 'ignore' that particular change somehow in terraform so we can move forward with other changes?

sgibson91 commented 2 years ago

In the meantime, can we 'ignore' that particular change somehow in terraform so we can move forward with other changes?

This is probably my fault for excessively trimming the error message. This error crops up during the "refreshing state" phase of terraform plan; it hasn't even got to the point of calculating the change yet, because it can't check the current state of the file share. So there's nothing to 'ignore'.
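
For reference, 'ignoring' an attribute would normally look like the lifecycle sketch below (attribute values here are illustrative), but ignore_changes only affects how terraform computes the diff; it does nothing about the provider's GetProperties call that fails during refresh:

resource "azurerm_storage_share" "homes" {
  name                 = "homes"
  storage_account_name = azurerm_storage_account.homes.name
  quota                = 100

  lifecycle {
    # Only suppresses diffs on this attribute; the refresh-phase
    # API call to the file endpoint still happens and still 403s.
    ignore_changes = [quota]
  }
}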

sgibson91 commented 2 years ago

Full error message:

$ tf plan -var-file=projects/carbonplan.tfvars -out=carbonplan -refresh-only
azurerm_resource_group.jupyterhub: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster]
azurerm_virtual_network.jupyterhub: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster/providers/Microsoft.Network/virtualNetworks/k8s-network]
azurerm_container_registry.container_registry: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerRegistry/registries/2i2ccarbonplanhubregistry]
azurerm_subnet.node_subnet: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster/providers/Microsoft.Network/virtualNetworks/k8s-network/subnets/k8s-nodes-subnet]
azurerm_storage_account.homes: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster/providers/Microsoft.Storage/storageAccounts/2i2ccarbonplanhubstorage]
azurerm_kubernetes_cluster.jupyterhub: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster]
azurerm_storage_share.homes: Refreshing state... [id=https://2i2ccarbonplanhubstorage.file.core.windows.net/homes]
azurerm_kubernetes_cluster_node_pool.user_pool["small"]: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster/agentPools/nbsmall]
azurerm_kubernetes_cluster_node_pool.dask_pool["small"]: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster/agentPools/dasksmall]
azurerm_kubernetes_cluster_node_pool.dask_pool["huge"]: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster/agentPools/daskhuge]
azurerm_kubernetes_cluster_node_pool.user_pool["vhuge"]: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster/agentPools/nbvhuge]
azurerm_kubernetes_cluster_node_pool.dask_pool["vvhuge"]: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster/agentPools/daskvvhuge]
azurerm_kubernetes_cluster_node_pool.dask_pool["medium"]: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster/agentPools/daskmedium]
azurerm_kubernetes_cluster_node_pool.user_pool["large"]: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster/agentPools/nblarge]
azurerm_kubernetes_cluster_node_pool.dask_pool["large"]: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster/agentPools/dasklarge]
azurerm_kubernetes_cluster_node_pool.dask_pool["vhuge"]: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster/agentPools/daskvhuge]
azurerm_kubernetes_cluster_node_pool.user_pool["vvhuge"]: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster/agentPools/nbvvhuge]
azurerm_kubernetes_cluster_node_pool.user_pool["medium"]: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster/agentPools/nbmedium]
azurerm_kubernetes_cluster_node_pool.user_pool["huge"]: Refreshing state... [id=/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster/agentPools/nbhuge]
kubernetes_namespace.homes: Refreshing state... [id=azure-file]
kubernetes_secret.homes: Refreshing state... [id=azure-file/access-credentials]
╷
│ Error: shares.Client#GetProperties: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailure" Message="This request is not authorized to perform this operation.\nRequestId:a7c9006d-d01a-004f-5e8d-0be45f000000\nTime:2022-01-17T10:34:16.3962556Z"
│
│   with azurerm_storage_share.homes,
│   on storage.tf line 21, in resource "azurerm_storage_share" "homes":
│   21: resource "azurerm_storage_share" "homes" {

GeorgianaElena commented 2 years ago

I just ran terraform plan on the toronto cluster and I can confirm the same behavior :(

yuvipanda commented 2 years ago

Ah, while terraform doesn't support excluding certain resources from runs (https://github.com/hashicorp/terraform/issues/2253), you can pass -target to apply to only look at specific resources. As a temporary way to unblock us, we can use that to explicitly list the cluster-related resources and skip the AzureFile share. https://github.com/hashicorp/terraform-provider-azurerm/pull/14220 is the 'real' fix, but we needn't wait for that...

sgibson91 commented 2 years ago

I can confirm that the following command worked (at least to give me access again; I haven't attempted to make a change yet!):

$ tf plan -var-file=projects/carbonplan.tfvars -out=carbonplan -refresh-only -target=azurerm_kubernetes_cluster.jupyterhub -target=azurerm_kubernetes_cluster_node_pool.user_pool -target=azurerm_kubernetes_cluster_node_pool.dask_pool

yuvipanda commented 1 year ago

Note that this is still a problem, and the cause is https://github.com/hashicorp/terraform-provider-azurerm/issues/2977

consideRatio commented 10 months ago

I've not yet understood the details here, but I did a terraform plan and ran into a 403 permissions error while terraform was inspecting the infra. Googling my way around, I concluded that I could temporarily add my computer's public IP to the storage account firewall, via the UI seen below under the "Firewall" heading, to avoid the 403.

[Screenshot: Azure Portal firewall settings for the storage account, where a client IP address can be added]

I've now tested and concluded that both terraform plan and terraform apply worked after adding my own IP to the firewall.
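
For posterity, the same thing can be done from the az CLI (a sketch; the resource group and storage account names are taken from the carbonplan plan output earlier in this thread, and ifconfig.me is just one way to discover your public IP):

# Temporarily allow your current public IP through the storage firewall...
az storage account network-rule add \
  --resource-group 2i2c-carbonplan-cluster \
  --account-name 2i2ccarbonplanhubstorage \
  --ip-address "$(curl -s https://ifconfig.me)"

# ...and remove it again once you're done:
az storage account network-rule remove \
  --resource-group 2i2c-carbonplan-cluster \
  --account-name 2i2ccarbonplanhubstorage \
  --ip-address <the-ip-you-added>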

consideRatio commented 10 months ago

@yuvipanda do you think the proxycommand.py script you've created could be used to let terraform inspect things in the NFS for this as well?

Looking at an NFS mount command provided, it says...

sudo mkdir -p /mount/2i2cutorontohubstorage/homes
sudo mount -t nfs 2i2cutorontohubstorage.file.core.windows.net:/2i2cutorontohubstorage/homes /mount/2i2cutorontohubstorage/homes -o vers=4,minorversion=1,sec=sys,nconnect=4

Do you think we could, with a few commands, route traffic from our local computers to 2i2cutorontohubstorage.file.core.windows.net via a pod created by the proxycommand.py script?

yuvipanda commented 10 months ago

@consideRatio oh, yeah it could probably do that! It will need to be some sort of HTTP proxy (rather than an ssh one); may be a fun project to build. The current setup probably won't work because it's just for ssh, which is in some ways easier.
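
An untested sketch of how this could work with a raw TCP relay rather than a real HTTP proxy: run a relay pod inside the cluster (whose traffic the firewall already allows) and resolve the storage hostname to localhost while terraform runs, so its HTTPS calls reach the endpoint through the cluster. TLS still terminates at Azure, since the relay only forwards bytes:

# Relay TCP connections from inside the cluster to the storage endpoint
# (port 443 for terraform's management calls; alpine/socat is just one
# convenient socat image).
kubectl run tf-relay --image=alpine/socat --restart=Never -- \
  TCP-LISTEN:8443,fork,reuseaddr \
  TCP:2i2cutorontohubstorage.file.core.windows.net:443

# Forward a local port to the relay pod (binding 443 locally needs root)...
kubectl port-forward pod/tf-relay 443:8443

# ...and point the hostname at localhost for the duration, e.g. in /etc/hosts:
# 127.0.0.1 2i2cutorontohubstorage.file.core.windows.net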

I think very temporarily adding your own IP and then unadding it is easier for sure :D But must remember to unadd it though.

GeorgianaElena commented 3 months ago

I think very temporarily adding your own IP and then unadding it is easier for sure :D But must remember to unadd it though.

I'm not sure what we can do to enforce this and make sure we don't forget to remove the IP. I've just added my IP to the list as part of https://github.com/2i2c-org/infrastructure/issues/890 and deleted the old entry that was still in the list.
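
Until there's a proper guard, one low-tech option is to audit the allowlist now and then and delete anything stale (a sketch; the toronto storage account name is taken from the mount command earlier in this thread, and the resource group is a placeholder):

# List the IP rules currently on the storage account firewall so stale
# entries stand out; delete them with the matching 'remove' subcommand.
az storage account network-rule list \
  --resource-group <toronto-resource-group> \
  --account-name 2i2cutorontohubstorage \
  --query ipRules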