abhisheksinghbaghel / azuredisk-csi-driver

Azure Disk Container Storage Interface (CSI) Storage Plugin
Apache License 2.0
2 stars 0 forks source link

Bug: Filtering replica attachment for a pod when assessing node capacity. #367

Closed landreasyan closed 2 years ago

landreasyan commented 2 years ago

What type of PR is this? When node is at capacity for disk attachments, scheduler extender automatically rejects new pods that require disk attachments without verifying first if the replica disk attachments for the pod already exist on the node.

What this PR does / why we need it:

Which issue(s) this PR fixes:

366

Fixes #

Requirements:

Special notes for your reviewer:

Release note:

none
coveralls commented 2 years ago

Pull Request Test Coverage Report for Build 1792299786


Totals Coverage Status
Change from base Build 1792145912: 21.8%
Covered Lines: 5167
Relevant Lines: 7713

💛 - Coveralls
sunpa93 commented 2 years ago

There seem to be other changes that need to be made to the schedulerExtender (e.g. pvName != azVolumeName), but these changes were added in the async PR https://github.com/abhisheksinghbaghel/azuredisk-csi-driver/pull/346, so let's not worry about it here. I can also fix the typo that I pointed out as I rebase the changes in my PR :)

landreasyan commented 2 years ago

I'll fix the typo as I need to fix the merge conflict anyways. Thanks for pointing that out. Could you elaborate more on the extra changes needed? Can't we make sure the azvolumeattachment always has the correct label for volume instead of doing the additional mapping in the scheduler?

sunpa93 commented 2 years ago

I'll fix the typo as I need to fix the merge conflict anyways. Thanks for pointing that out. Could you elaborate more on the extra changes needed? Can't we make sure the azvolumeattachment always has the correct label for volume instead of doing the additional mapping in the scheduler?

Not too many changes but some refactoring had been made to replace the manually mutexed map to sync.Map to keep things consistent, add PV to disk name mapping, and allow scheduling on nodes where volume has not been fully attached yet.

To avoid mapping PV to disk name in AzDiskSchedulerExtender, we would essentially need to add label indicating PV name to AzVolumeAttachment, which I don't think would be a huge issue but given that the addition of PV label needs to be done at the attach_detach controller level instead of crdProvisioner level, if for whatever reason, attach_detach controller takes a long time to get to that update label procedure, azDiskSchedulerExtender will not be able to fetch that AzVolumeAttachment CRI from its list function. So, if an additional mapping at AzDiskSchedulerExtender does not introduce much additional overhead, I think we should have mapping in place.

landreasyan commented 2 years ago

I'll fix the typo as I need to fix the merge conflict anyways. Thanks for pointing that out. Could you elaborate more on the extra changes needed? Can't we make sure the azvolumeattachment always has the correct label for volume instead of doing the additional mapping in the scheduler?

Not too many changes but some refactoring had been made to replace the manually mutexed map to sync.Map to keep things consistent, add PV to disk name mapping, and allow scheduling on nodes where volume has not been fully attached yet.

To avoid mapping PV to disk name in AzDiskSchedulerExtender, we would essentially need to add label indicating PV name to AzVolumeAttachment, which I don't think would be a huge issue but given that the addition of PV label needs to be done at the attach_detach controller level instead of crdProvisioner level, if for whatever reason, attach_detach controller takes a long time to get to that update label procedure, azDiskSchedulerExtender will not be able to fetch that AzVolumeAttachment CRI from its list function. So, if an additional mapping at AzDiskSchedulerExtender does not introduce much additional overhead, I think we should have mapping in place.

Do we need to update the CRI to add the label? Can't we add it on creation?

sunpa93 commented 2 years ago

I'll fix the typo as I need to fix the merge conflict anyways. Thanks for pointing that out. Could you elaborate more on the extra changes needed? Can't we make sure the azvolumeattachment always has the correct label for volume instead of doing the additional mapping in the scheduler?

Not too many changes but some refactoring had been made to replace the manually mutexed map to sync.Map to keep things consistent, add PV to disk name mapping, and allow scheduling on nodes where volume has not been fully attached yet. To avoid mapping PV to disk name in AzDiskSchedulerExtender, we would essentially need to add label indicating PV name to AzVolumeAttachment, which I don't think would be a huge issue but given that the addition of PV label needs to be done at the attach_detach controller level instead of crdProvisioner level, if for whatever reason, attach_detach controller takes a long time to get to that update label procedure, azDiskSchedulerExtender will not be able to fetch that AzVolumeAttachment CRI from its list function. So, if an additional mapping at AzDiskSchedulerExtender does not introduce much additional overhead, I think we should have mapping in place.

Do we need to update the CRI to add the label? Can't we add it on creation?

We would need to expect that pv information is included in volume_context that is passed on as ControllerPublishVolume parameter and if I remember correctly, I don't think we can rely on that.