Dynamically authorize VSI to storage resources

kaufers commented 7 years ago

Storage resource types: ibmcloud_infra_file_storage and ibmcloud_infra_block_storage VSI resource type: ibmcloud_infra_virtual_guest

We want to use terraform to provision a volume, provision a VSI, and then have the VSI mount that the volume. In order for the VSI to use the volume, it needs to know the dynamic volume hostname and volume name; this requires that the volume is provisioned first.

Currently, this is accomplished using the following flow:

Provision the volume
Provision and VSI, including the volume hostname and volumenamein the VSI's user_metadata. We have a post-provisioning script that will read the volume info from the metdata and attempt to mount it. The mounting will fail until the VSI is authorized to the specific volume.
Issue a terraform local-exec that executes a binary that uses the SoftLayer golang API to authorize the VSI to the volume.

We want to avoid step 3. The feature being requested here would be to include some attribute on the ibmcloud_infra_virtual_guest resource type where we could specify the volume ID(s). Something like:

"storage_ids": [<some-id1>, <some-id2>]

If this storage_ids attribute is defined, then terraform would attempt to authorize the VSI being created to the specific volume(s).

Note that the implementation would require that the IDs being specified are added to the list of authorized VSIs (vs. having the volume authorized to only the specified VSIs). The https://github.com/softlayer/softlayer-go/blob/master/services/network.go API handles this.

ashishth09 commented 7 years ago

@kaufers , looks like the appropriate API is http://sldn.softlayer.com/reference/services/SoftLayer_Virtual_Guest/allowAccessToNetworkStorageList.

ashishth09 commented 7 years ago

A typical user flow would be -

resource "ibmcloud_infra_file_storage" "endurance" {
....
}
resource "ibmcloud_infra_virtual_guest" "vsi" {
....
  storage_ids       = ["${ibmcloud_infra_file_storage.endurance.id}",
...
}

Now VSI authorizes itself to the storage. But when you do terraform plan . It migh tell you something like

~ ibmcloud_infra_file_storage.endurance
    allowed_virtual_guest_ids.#:        "1" => "0"
    allowed_virtual_guest_ids.32962233: "32962233" => "0"

Because you authorized the access to that storage circumventing the _allowed_virtualguests lists on the _ibmcloud_infra_filestorage resource. But that field is populated during the next plan during refresh after VSI is authorized to the storage.

The only way out of this as far as I know would be to ignore the changes to _allowed_virtual_guestids on _ibmcloud_infra_filestorage using terraform lifecycle hook

resource "ibmcloud_infra_file_storage" "endurance" {
....
  lifecycle {
    ignore_changes = ["allowed_virtual_guest_ids"]
  }
}

@kaufers , @renier

kaufers commented 7 years ago

@ashishth09 You are correct; IMO we do not want to use the allowed_virtual_guest_ids to handle this. First, it's a circular reference and, second, the user might not be using terraform to provision the storage. It seems that simply doing a SL API call to authorize the device would be better.

ashishth09 commented 7 years ago

@rbmateescu , @renier , @linsun, @kaufers

If we add this feature then someone who provisions volume and virtual guest both using terraform will necessarily need to _ignorechanges to

_allowed_virtual_guestids in case VSI authorizes to storage during virtual_guest resource provisioning by adding volumes to access in storage_ids list or
_storageids in case storage resource was used to authorize to a list of vms by adding vms to __allowed_virtual_guestids

Since change in one would reflect in another due to the reason explained above. This essentially means that they can't use a mix of both. So we would force the user to use either volume to authorize to VSIs or VSIs to authorize to volumes. I do see it as providing two options though :smiley: But then they necessarily have to use lifecycle

  lifecycle {
    ignore_changes = []
  }

in one or the other resource configuration. Please let me know what you think based on this.

kaufers commented 7 years ago

@ashishth09 I'm fine with that requirement; could that simply be doc'd with the storage_ids attribute to explain the implications of using it?

renier commented 7 years ago

@ashishth09 I don't think using lifecycle is strictly necessary. I don't think that the Read operation really needs to update allowed_virtual_guest_ids and allowed_hardware_ids. It is an optional field, but not computed, so the code should reflect that (which is not right now). That will avoid the conflict on a plan refresh.

ashishth09 commented 7 years ago

@renier, But if you make that field _allowed_virtual_guestids as computed then the changes in that will not be picked up. Isn't it so?. I guess that is what you are also pointing to.

But in that case _ibmcloud_infra_filestorage won't be able to add, delete ids from _allowed_virtual_guestids anymore meaning user of that resource is at loss.

ashishth09 commented 7 years ago

I guess if it is only computed then it is a problem if it is both optional and computed then it could work the way we want. I will get back

Thanks

ashishth09 commented 7 years ago

@renier,

You create a volume, _ allowed_virtual_guestids is not set.
You create a VSI that refers to the previous volume via _storageid- Call it VSI1
You create another VSI, no _storageid , so no volumes - Call it VSI2
You go to the volume created in the first place and then you add _allowed_virtual_guestids = [VSI2] When you refresh it tell that VSI1 will be removed and VSI2 will be added. Something like
```
~ ibmcloud_infra_file_storage.fs_endurance
allowed_virtual_guest_ids.VSI2: "" => "VSI2"
allowed_virtual_guest_ids.VSI1: "VSI" => "0"
```

So if user wants to be use the volume that got created in the first place to authorize more vms, he must remember the ids of all the vms authorized to it and put them in the _allowed_virtual_guestids in _ibmcloud_infra_filestorage to avoid any unexpected diffs.

kaufers commented 7 years ago

@ashishth09 Yes, we've noticed this as well. If you use terraform to create the volume then you cannot authorize VSIs out-of-band; if you do then the next apply clears them. We have been using ignore_changes on the allowed_virtual_guest_ids attribute to handle this.

renier commented 7 years ago

@ashishth09 No, the change will only be allowed_virtual_guest_ids.VSI2: "" => "VSI2", because allowed_virtual_guest_ids would not be refreshing from the cloud (not acting as computed). It will still override whatever was set in the API.

This is just a matter of how will you use the volume. Like you said before, you have to either choose to always create the volume first, then assign the VSIs from the virtual_guest resource. Or create the virtual_guest first, then create the volumes later. If you mix usage, you get into trouble.

Why would you mix usage in that way for your architecture? Is that a valid use case we are willing to support?

renier commented 7 years ago

@kaufers That problem would go away if the resource stops refreshing the property from the cloud. It is currently acting as a computed field, but it is not tagged as a computed field in the terraform schema. I don't see a reason why it should be computed in this case when it raises these difficulties.

ashishth09 commented 7 years ago

@renier, @kaufers , I think the following must hold together 1) Make storage_ids and allowed_virtual_guest_ids as optional + computed to avoid diffs. Computed is required for the following scenario

Create volume
Create VSI and authorize from VSI
Terraform plan correctly gives a diff on volume.
```
~ ibmcloud_infra_file_storage.endurance
allowed_virtual_guest_ids.#:        "1" => "0"
allowed_virtual_guest_ids.VSI: "VSI" => "0"
```
Also we can't avoid updating the allowed_virtual_guest_ids in Read operation even if it were only optional. This would not catch any drift that might occur due to resources being added outside terraform.

2) Use only volume or only virtual guest to authorize.

I guess it doesn't affect the current users also. Those using volumes can continue to use so or drop it altogether to use virtual guest. New user have either/or option

I don't have any problem with 1. If all are OK with 2 then we are all set

ashishth09 commented 7 years ago

Well the one problem with computed would be the deletion process. Assuming some volumes were authorized via VSI's _storageids. Now if you remove the _storageids completely from the configuration it will not detect that change since it is computed.

If it were optional it will delete the all the association in _storageids. The way out of this is to explicitly set _storageids to [] if we want to delete all the authorization association.

Let me know your thoughts. @renier, @kaufers . Thanks

renier commented 7 years ago

@ashishth09 I believe the problem you will get if you make it truly optional + computed, is that plan refresh will want to update the cloud with the values it has in its current tf file. Basically, which one will win, refreshed state (state file), or the desired state (in the tf file)?

Also we can't avoid updating the allowed_virtual_guest_ids in Read operation even if it were only optional.

I didn't understand that.

This would not catch any drift that might occur due to resources being added outside terraform.

Yes, but that would happen as well if you ignore the properties using the lifecycle construct. Also, if setting this from the virtual_guest, that should always add itself to the volume if not there already.

About storage_ids, this would be a fake property. It doesn't directly represent a value in the API. In the case of deletion (or modification of the list), the code should remove/add itself (i.e. the guest) to the appropriate storage_ids. Terraform should detect those changes and call the Update operation on the resource.

ashishth09 commented 7 years ago

Hi @renier,

A recap :smiley:

1)We can't have user use both _allowed_virtual_guestids and _storageids at the same time, because that will create diffs and are difficult to handle. 2)Now assuming user has agreed to use one consistently, our next problem is to avoid diffs in the other resource's parameter which he never uses. So we don't care about diff handling in that other resource at all. It is always computed meaning it reads from cloud and dumps in the state file which user is not concerned with. It is optional field as well but user will not touch it as he is using other resources for that job. That field never appears in his tf file.

I believe the problem you will get if you make it truly optional + computed, is that plan refresh will want to update the cloud with the values it has in its current tf file. Basically, which one will win, refreshed state (state file), or the desired state (in the tf file)?

Now since the user is using say _allowed_virtual_guestids only 1) User has not specified it at all in the tf file. So since it is computed he will not know if someone has added some vm to the volume from outside. I do see this as the problem which comes with it being computed. I am not sure how much of a problem it is though. 2)User adds some vm to _allowed_virtual_guestids for the first time, the plan will point out as you said about the new things to be added or removed. This is user's choice. Terraform has done its job in telling the user that you are about to delete some resources (those which Terraform got from read operation and are not there in the tf file) and add new those which are there in the tf. I am not sure if in real world touching resource from outside will happen often or not. Even it it does, you are being told and you take the appropriate action of adding those now values to the list provided to you from cloud read if you want them else choose to delete them.

And yes deletion of the optional + computed resource can't be done by simply removing them from the tf file, they need to be set explicitly so empty values.

renier commented 7 years ago

@ashishth09

User has not specified it at all in the tf file. So since it is computed he will not know if someone has added some vm to the volume from outside. I do see this as the problem which comes with it being computed. I am not sure how much of a problem it is though.

No, I don't think this is a problem. A storage resource created from terraform, in practice, should be configured from Terraform (or whatever infra framework they used). It being re-configured from the portal is possible, but unlikely and ultimately wrong in terms of operating procedure.

Also, you say "since it is computed", but what I've been saying is not to make it computed. It would be optional and write-only. When the user sets it, it writes to the cloud, but never refreshes from the cloud (no read). That has implications on the diff calculation.

And yes deletion of the optional + computed resource can't be done by simply removing them from the tf file, they need to be set explicitly so empty values.

If you are talking about a tf resource, then deletion has to be done by removing it from the tf file. If you are talking about editing a property of a resource, like an access list, then yes, you would edit that property in the tf file.

minsikl commented 7 years ago

file_storage and block_storage provide different attributes and require different information for host authentications.

In the case of file_storage resource, you can utilize allowed_subnets to automate authentication processes: Prepare a primary private subnet in advance -> Create a file_storage -> Authenticate the subnet on the file_storage -> Create a VM on the subnet -> Configure nfs with a post-provisioning script.

In the case of block_storage, it does not provide allowed_subnets attribute. Moreover, hostname and volumname aren't enough to configure the volume. You need username, password, and hostIQN which are populated after the authentication process. allowed_virtual_guest_info attribute of block_storate provides those information. So, storage_id isn't enough to configure iSCSI in a post-provisioning script.

It looks like that the requirement describes file_storage and VM case. If the requirement focuses on file_storage case only, I think that the above scenario can be a solution.

ashishth09 commented 7 years ago

@renier,

Great we on the same page :+1: on all points except on pure optional (+ not refreshing ) or optional + computed. I prefer the latter as at-least it tells the user about what is there in his cloud and decide for himself the actions.

In case of optional and no read that liberty is gone. He is no more made aware of the actual changes to his cloud beyond his actions even though they are not likely to occur and wrong (if they do occur) as you say, there is a chance they may occur.

Additionally if the chances of change from outside are so low, making it computed won't lead to unexpected diffs.

renier commented 7 years ago

@ashishth09 The idea for making the property write-only is not to get rid of diffs caused by outside configuration. It is to enable the TF workflow for being able to create the file_storage with no permissions, then create the virtual_guest (in the same TF run) which modifies the permissions on the file_storage. Without it causing a diff on the next TF apply.

ashishth09 commented 7 years ago

@renier , Why would computed not achieve the same effect?

minsikl commented 7 years ago

If allowed_virtual_guest_ids is provided without read function, import function will not retrieve allowed_virtual_guest information and users have to manually check the real list from different systems such as SL portal. Moreover, terraform will not support delete function. It is necessary to provide CRUD or R(computed).

renier commented 7 years ago

@minsikl Are you saying that a file_storage resource cannot be deleted if it is linked to virtual_guests or other hardware?

renier commented 7 years ago

If that is the case, the delete operation can always clear out the ACLs itself to enable deletion of the resource in the API.

minsikl commented 7 years ago

@renier No, I'm talking about ACL's CRUD.

renier commented 7 years ago

@ashishth09

Minsik is suggesting making the allowed_*_ids attributes read-only (i.e Computed: true, Optional: false). I was suggesting write-only (i.e. Computed: false, Optional: true). Either one will work. The important thing is that it cannot be read/write.

minsikl commented 7 years ago

@ashishth09 Users still can configure ACLs with different attributes of file_storage such as allowed_ip_address or allowed_subnets.

kaufers commented 7 years ago

Any estimate on when this will be available? We are trying to understand when we can start utilizing this support. No pressure, we're just looking for a rough estimate. Thanks!

ashishth09 commented 7 years ago

We will have this out in the next release @kaufers.

ashishth09 commented 7 years ago

@kaufers,

I am not sure if you noticed @minsikl suggestions above. I am putting a portion here,

In the case of file_storage resource, you can utilize allowed_subnets to automate authentication processes: Prepare a primary private subnet in advance -> Create a file_storage -> Authenticate the subnet on the file_storage -> Create a VM on the subnet -> Configure nfs with a post-provisioning script.

Does that help? Also it is just the file_storage resource you are interested in? In this case you will not need provisioners to authenticate.

kaufers commented 7 years ago

@ashishth09 Yep, saw this but I don't think that we are going to utilize it as it would complicate our VM provisioning flow (since we'd need to track the VM subnet placement).

ashishth09 commented 7 years ago

Two approaches we have, 1) allowed_hardware_ids and allowed_virtual_guest_ids are set to computed only and add storage_id to VM with optional + computed.

As discussed this is going to break backward compatibility where current users will no more be able define these attributes in the tf and add anything to it. They can of course use other attributes like allowed_ip_address or subnets

2) allowed_hardware_ids and allowed_virtual_guest_ids to optional + computed and in the VM/BM resource have storage_id optional + computed as well. Works for the existing users. But in this case we need to warn the user Use VM/BM resource to authorize to the storage or storage resource to authorize to the VM/BM. No Mix match is allowed (some clear version of this). I like this one better as it still allows all the previous options + one more.

ashishth09 commented 7 years ago

@renier, @minsikl , I re-visited the approaches and while I like 2 above, we might not know if some changes happen outside terraform in choosing that approach. In spite of the documentation about using one or the other this is not full-proof.

In approach 1, I don't see why should make them computed alone. We might as well remove them altogether. They are not really computed and we had put them there to be used for authorizing and not just reading the values.

So I was thinking to remove - allowed_hardware_ids and allowed_virtual_guest_ids from file/block storage and add storage_id in VSI and Bare Metal. User can still authorize via allowed_ip_address and subnet in file storage and allowed_ip_address in block storage. In addition they can use newly introduced storage_id on VSI. This of course will force user to change their existing tf files if they were using allowed_hardware_ids and allowed_virtual_guest_ids to authorize.

Let me know your thoughts as we want smooth transition for user coming from terraform-provider-softlayer to this repo. How difficult would it be for users to ditch allowed_hardware_ids and allowed_virtual_guest_ids. Will allowed_ip_address or subnet work in all cases?

minsikl commented 7 years ago

@ashishth09, I like the approach 2: allowed_hardware_ids and allowed_virtual_guest_ids to optional + computed and in the VM/BM resource have storage_id optional + computed as well. Because SL UI provides allowed_*_ids per file/block storage, it is necessary to provide them in terraform resource as well. At the same time, SoftLayer API provides ACL management functions for network storage in SoftLayer_Virtual_Guest and SoftLayer_Hardware_Server. storage_id can be implemented with them.

SoftLayer provides multiple storage services: Local storage, portable storage, internal disks, endurance storage, performance storage, and object storage. Except for object storage, other storage requires ACL/attach/detach management with different APIs. I think that we need to use specific attribute names for storage_id to distinguish different storage types. file_storage_id and block_storage_id would be better than storage_id.

ashishth09 commented 7 years ago

@minsikl ,

Even my initial thoughts was to go with 2. I agree this will be less disruptive than other solutions.

I think that we need to use specific attribute names for storage_id to distinguish different storage types. file_storage_id and block_storage_id would be better than storage_id.

I think putting them together simplifies things as the adding/removing of those are done using the same API

minsikl commented 7 years ago

@ashishth09 Although block_storage and file_storage use the same APIs internally, they are different storage and require different configuration in user side. If the same attribute name is used and file_storage_id and block_storge_id are mixed, users are not able to filter file_storge_id or block_storage_id. Conversely, if file_storage_id and block_storage_id are used, if users want to have a single id list, they can use built-in functions such as concat

IBM-Cloud / terraform

Dynamically authorize VSI to storage resources #97