NetApp / trident

Storage orchestrator for containers
Apache License 2.0
759 stars · 222 forks

The ontap drivers should have more intelligent volume placement #64

Open arndt-netapp opened 7 years ago

arndt-netapp commented 7 years ago

The placement of flexvols by the ontap-nas-economy driver needs to be more intelligent. The following suggestions are offered as a starting point, based on customer feedback from a large-scale environment:

  1. Allow Trident to be configured so that it only continues to provision qtrees in a flexvol while the underlying aggregate is less than X percent full or Y percent oversubscribed.

  2. If multiple aggregates are defined, prefer whichever aggregate has the most free space and is the least oversubscribed.

  3. While #1 and #2 are more important in the short term, at some point it would also be desirable to have provisioning take node headroom into account.
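A minimal sketch of what suggestions 1 and 2 could look like, written in Go since that is Trident's implementation language. All type and function names here are hypothetical illustrations, not Trident's actual internals:

```go
package main

import "fmt"

// Aggregate models the placement-relevant state of an ONTAP aggregate.
// These names are hypothetical, not Trident's real types.
type Aggregate struct {
	Name           string
	CapacityGiB    float64
	UsedGiB        float64
	ProvisionedGiB float64 // logical space already promised to volumes/qtrees
}

func (a Aggregate) usedPct() float64 {
	return 100 * a.UsedGiB / a.CapacityGiB
}

func (a Aggregate) oversubPct() float64 {
	return 100 * a.ProvisionedGiB / a.CapacityGiB
}

// pickAggregate skips aggregates past either threshold (suggestion 1) and,
// among the remaining candidates, prefers the one with the most free space
// (suggestion 2).
func pickAggregate(aggrs []Aggregate, maxUsedPct, maxOversubPct float64) (Aggregate, bool) {
	var best Aggregate
	found := false
	for _, a := range aggrs {
		if a.usedPct() >= maxUsedPct || a.oversubPct() >= maxOversubPct {
			continue // too full or too oversubscribed
		}
		if !found || a.CapacityGiB-a.UsedGiB > best.CapacityGiB-best.UsedGiB {
			best, found = a, true
		}
	}
	return best, found
}

func main() {
	aggrs := []Aggregate{
		{Name: "aggr1", CapacityGiB: 1000, UsedGiB: 920, ProvisionedGiB: 1500},
		{Name: "aggr2", CapacityGiB: 1000, UsedGiB: 400, ProvisionedGiB: 700},
		{Name: "aggr3", CapacityGiB: 2000, UsedGiB: 1500, ProvisionedGiB: 1800},
	}
	if a, ok := pickAggregate(aggrs, 90, 120); ok {
		fmt.Println("chosen:", a.Name) // prints "chosen: aggr2"
	}
}
```

Here aggr1 is skipped for being 92% full, and aggr2 beats aggr3 on free space (600 GiB vs. 500 GiB). Node headroom (suggestion 3) would slot in as one more filter or tiebreaker in the same loop.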

clintonk commented 7 years ago

To be clear, most of these would apply to all ONTAP drivers. The Trident scheduler is definitely a rich target for enhancements like these.

acsulli commented 6 years ago

I will +1 this request on behalf of many interactions I've had and add some additional items. I am often asked for several capabilities around the storage pool selection logic:

  1. Being able to exclude storage pools using the storage class definition, e.g. "I want all flash (media=ssd), except for AFF aggr1 and SolidFire QoS policy Bronze." Currently the only way to accomplish this is the inverse: every storage pool except the one(s) to be excluded is specified in the storage class.

  2. A mechanism to stop provisioning new PVC requests against a storage pool when the underlying storage device (e.g. an ONTAP aggregate) reaches an arbitrary level of "full". This can/should be based on both the actual capacity remaining and an oversubscription ratio/percentage.

  3. Allow the specification of an arbitrary capacity limit for a particular backend. For example, with ONTAP, regardless of the actual size of an aggregate, Trident would only be allowed to consume X GiB of that capacity. The same principle would apply to SolidFire, though the paradigm could be extended beyond GiB to IOPS (QoS minimums, specifically).

  4. Leverage AppDM as a backend.

  5. Leverage Service Level Manager as a backend.

  6. Incorporate ONTAP performance capacity as a storage pool selection metric.
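For items 2 and 3, the `limitAggregateUsage` backend option discussed further down in this thread covers part of the ask. A sketch of an `ontap-nas` backend.json using such limits follows; the connection details are placeholders, and `limitVolumeSize` (a per-volume cap from later Trident releases) is cited from memory and worth verifying against current docs:

```json
{
  "version": 1,
  "storageDriverName": "ontap-nas",
  "managementLIF": "10.0.0.1",
  "svm": "svm_nas",
  "username": "admin",
  "password": "secret",
  "limitAggregateUsage": "80%",
  "limitVolumeSize": "50Gi"
}
```

With a config like this, provisioning against an aggregate would stop once it passes 80% used, partially addressing item 2, though per-backend GiB budgets and IOPS budgets (item 3) would still need new options.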

hendrikland commented 6 years ago

Well, if we talk about placement logic, I'll add a few more items to consider (based on actual customer requirements):

  • Number of volumes on a node (since ONTAP has logical limits that you don't want to exceed)
  • Node which holds the LIF for the SVM (in order to avoid indirect data access for best performance)
  • Provisioned IOPS (via the adaptive QoS concept) on the node/aggregate

These would be combined with the points already mentioned by others, then weighted and sorted according to the customer's specific requirements.

In the end, we'd either need a customizable rule engine in Trident, or a flexible mechanism to attach other tools. If we go with the latter, I'd vote for WFA in addition to the already-mentioned AppDM and NSLM, since (today) it is the only tool flexible enough to build a custom solution that takes all of the above into consideration.
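The "weighted and sorted" rule engine described here could be sketched as a scoring pass over candidate pools. This is only an illustration of the idea with hypothetical metric names, not a proposal for Trident's actual API:

```go
package main

import "fmt"

// Candidate holds normalized placement metrics for one storage pool,
// each scaled to 0..1 where higher is better. Names are hypothetical.
type Candidate struct {
	Name    string
	Metrics map[string]float64
}

// score combines metrics using per-customer weights; metrics absent from
// a candidate contribute zero, so the rule set stays extensible.
func score(c Candidate, weights map[string]float64) float64 {
	total := 0.0
	for metric, w := range weights {
		total += w * c.Metrics[metric]
	}
	return total
}

// rank returns the best-scoring candidate.
func rank(cands []Candidate, weights map[string]float64) Candidate {
	best := cands[0]
	for _, c := range cands[1:] {
		if score(c, weights) > score(best, weights) {
			best = c
		}
	}
	return best
}

func main() {
	// Weights encode one customer's priorities over the metrics above.
	weights := map[string]float64{
		"freeSpace":   0.5, // prefer emptier aggregates
		"lifLocality": 0.3, // data LIF on the same node, avoiding indirect access
		"volumeSlots": 0.2, // headroom under ONTAP per-node volume limits
	}
	cands := []Candidate{
		{"aggr1", map[string]float64{"freeSpace": 0.9, "lifLocality": 0.0, "volumeSlots": 0.5}},
		{"aggr2", map[string]float64{"freeSpace": 0.6, "lifLocality": 1.0, "volumeSlots": 0.8}},
	}
	fmt.Println("best:", rank(cands, weights).Name) // prints "best: aggr2"
}
```

Note how LIF locality outweighs raw free space here: aggr2 scores 0.76 against aggr1's 0.55. An external tool such as WFA would play the same role, just with the weights and rules living outside Trident.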

CalvinHartwell commented 6 years ago

+1 for this, as I'm hitting issues when using Trident + Cloud Manager together; it's still not picking the correct pool that was automatically generated for me.

Errant-Dutchman commented 5 years ago

+1 Same request. We have a 12-node cluster, so the points about the node which holds the LIF for the SVM (in order to avoid indirect data access for best performance) and about full-volume provisioning, as mentioned before, apply directly to us. The relevant items, restated:

  • Number of volumes on a node (since ONTAP has logical limits that you don't want to exceed)
  • Node which holds the LIF for the SVM (in order to avoid indirect data access for best performance)
  • Provisioned IOPS (via the adaptive QoS concept) on the node/aggregate

YvosOnTheHub commented 4 years ago

also +1

Another idea: the limitAggregateUsage parameter looks at the global usage of the aggregate, not only at the capacity managed by Trident.

If the aggregate is shared among different workloads, which is often the case, this parameter should be a bit more precise.

An example: