Open jameshcorbett opened 3 months ago
Brian has indicated to me that OSTs of unequal sizes may be devastating to performance. So perhaps the problem is really in Flux's scheduling policy. I've opened https://github.com/flux-framework/flux-coral2/issues/175.
`Servers` resources for Lustre, when filled in by Flux, can look something like this:
What this gives us is 3 OSTs on `elcap1` and 1 on `elcap2`. However, as @behlendorf noted,

However, I think there is a disconnect between the way Flux allocates storage and the way `Servers` asks for the storage to be represented. At the moment Flux does not have any kind of policy to allocate equal amounts of storage from each rabbit. Flux may allocate a huge chunk of storage (let's say N bytes) from `elcap1` and a much smaller amount of storage (M bytes) on `elcap2` (as in the example above), with the desire of nevertheless having a single OST (and perhaps MDT) on each despite the size differences. But there is no good way for us to represent that in `Servers` without doing something like the above, in which we take the greatest common divisor of N and M, make that the `allocationSize`, and then multiply the `allocationCount` for each by N / GCD(N, M) and M / GCD(N, M) respectively.

@behlendorf also noted that imbalanced allocations may not be desirable, since
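To make the greatest-common-divisor workaround described above concrete, here is a minimal Python sketch of the arithmetic. It is not taken from Flux or from the `Servers` implementation: the function name `servers_allocations`, the dictionary layout, and the byte counts are all hypothetical and chosen only for illustration.

```python
from functools import reduce
from math import gcd


def servers_allocations(bytes_per_rabbit):
    """Derive one shared allocation size and a per-rabbit allocation count.

    bytes_per_rabbit maps a rabbit name to the number of bytes allocated
    on it (hypothetical input shape, not a Flux data structure).
    """
    sizes = list(bytes_per_rabbit.values())
    if not sizes or any(size <= 0 for size in sizes):
        raise ValueError("every rabbit needs a positive byte count")
    # The shared allocation size is the GCD of all per-rabbit byte counts.
    allocation_size = reduce(gcd, sizes)
    # Each rabbit then needs (its bytes / GCD) allocations of that size.
    counts = {name: size // allocation_size
              for name, size in bytes_per_rabbit.items()}
    return allocation_size, counts


# Hypothetical sizes: N = 3 TiB on elcap1, M = 1 TiB on elcap2.
TIB = 2**40
size, counts = servers_allocations({"elcap1": 3 * TIB, "elcap2": 1 * TIB})
print(size, counts)
```

With these made-up sizes the counts come out as 3 for `elcap1` and 1 for `elcap2`. When N and M share only a small common divisor (say 7 and 5 bytes), the shared size collapses to 1 byte and the counts grow to 7 and 5, which shows how quickly the workaround can produce many tiny allocations.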
So Flux may need to work on a policy to equalize the amount of storage on each rabbit node. However it might be nice if we could do something like the following: