JoshSalomon opened 1 week ago
sure, what do you have in mind?
Wondering where to start: Have you heard anything about the read balancer (available since Reef)?
Yes, I saw the initial presentation slides and wondered how it compared to my balancer. So far I haven't prioritized remapping only the primaries, but I think this can be added, too. I haven't used the read balancer in a production cluster yet.
Thinking about balancers, it seems that the whole CRUSH approach may not be ideal after all, and an efficient pg->osd mapping lookup table is probably suitable for nearly all clusters. Then we wouldn't have to fight CRUSH with one hack after another to get closer to the desired mapping; we could (re)map it directly.
Ceph improved read balancer.pdf
If I understand correctly, your balancer is a capacity balancer, not a read balancer. That said, I have only heard your presentation and have not dived into the code.
The read balancer is a metadata-only operation: it does not move data, so it is a completely different approach and is cheaper to run continuously (more on this later).
The first version (in Reef) just makes sure that each OSD has its fair share of primaries. The read balancer works only on replicated pools, so for each OSD we aim for pg_num/replica_num primaries. We of course check this against the CRUSH constraints.
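To make the fair-share idea concrete, here is a minimal Python sketch. It is not the Ceph implementation: the function name `primary_balance` and the toy PG map are hypothetical, and it assumes that "pg_num" in the formula above means the number of the pool's PG replicas hosted on a given OSD. It only counts primaries and ignores CRUSH constraints.

```python
from collections import Counter

def primary_balance(pg_to_osds):
    """Return {osd: (actual_primaries, fair_share)} for one replicated pool.

    pg_to_osds maps a PG id to its list of OSDs; the first entry is
    treated as the primary (an assumption of this sketch).
    """
    hosted = Counter()     # PG replicas hosted by each OSD
    primaries = Counter()  # PGs for which the OSD is the primary
    for osds in pg_to_osds.values():
        primaries[osds[0]] += 1
        hosted.update(osds)
    # Assumes every PG in the pool has the same number of replicas.
    replica_num = len(next(iter(pg_to_osds.values())))
    return {
        osd: (primaries[osd], hosted[osd] / replica_num)
        for osd in sorted(hosted)
    }

# Hypothetical 4-PG pool with 3 replicas where osd.0 holds too many primaries.
example = {
    "1.0": [0, 1, 2],
    "1.1": [0, 2, 3],
    "1.2": [0, 3, 1],
    "1.3": [1, 2, 3],
}
for osd, (actual, fair) in primary_balance(example).items():
    print(f"osd.{osd}: primaries={actual} fair_share={fair:.2f}")
```

An OSD whose actual count is above its fair share is a candidate to hand a primary over to another OSD that already holds a replica of the same PG, which is why no data has to move.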
In Squid, we added functionality that improves cluster performance when the devices are not all the same size. We added a pool parameter, read_ratio, for the share of reads in the IOs to the pool (70 means that 70% of the IOs to the pool are reads and 30% are writes). With this information, we can move more reads to the smaller devices and let the larger devices handle fewer reads, so that we balance the IOPS per OSD (assuming the devices have the same performance profile). In the future, we may calculate the read ratio automatically from metrics and make it an adaptive system (I am not sure this is needed, but it will be easy to implement).
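A rough way to see why read_ratio helps with mixed device sizes is the simplified load model below. This is my own sketch under stated assumptions, not the model from the attached presentation or the Ceph code: every PG receives the same amount of IO, a write costs one operation on every replica, a read costs one operation on the primary only, and CRUSH constraints are ignored. The function name `target_primaries` and the example numbers are hypothetical.

```python
def target_primaries(pgs_per_osd, replica_num, read_ratio):
    """Return {osd: target number of primaries} that equalizes per-OSD load.

    pgs_per_osd: PG replicas hosted per OSD (larger devices host more).
    read_ratio:  percentage of pool IOs that are reads (1..100).
    """
    r = read_ratio / 100.0
    if not 0.0 < r <= 1.0:
        raise ValueError("read_ratio must be in (0, 100]")
    total_pgs = sum(pgs_per_osd.values()) / replica_num
    # Per PG: writes hit every replica, reads hit only the primary.
    total_load = total_pgs * ((1.0 - r) * replica_num + r)
    load_per_osd = total_load / len(pgs_per_osd)
    targets = {}
    for osd, hosted in pgs_per_osd.items():
        # Load on an OSD = hosted * (1 - r) + primaries * r; solve for primaries.
        wanted = (load_per_osd - hosted * (1.0 - r)) / r
        # An OSD cannot be primary for fewer than 0 or more than its hosted PGs.
        targets[osd] = min(max(wanted, 0.0), float(hosted))
    return targets

# Hypothetical pool: 2 replicas, two small OSDs and one OSD twice as large.
layout = {0: 20, 1: 20, 2: 40}
for osd, primaries in target_primaries(layout, replica_num=2, read_ratio=70).items():
    print(f"osd.{osd}: hosted={layout[osd]} target_primaries={primaries:.1f}")
```

In this model the large OSD already absorbs more write operations, so it ends up with fewer primaries than the small ones. When the clamping kicks in (for example with a very low read ratio), the targets no longer sum to the number of PGs and perfect balance is not reachable.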
Attached is the presentation explaining the model behind this balancer and some examples.
If you think this is worth mentioning, Laura and I can open a PR that adds the explanation to this Ceph guide.
Hi JJ, great page. I believe it is worth adding information about the read balancer, especially since the Squid version will support OSDs of different sizes. Would you like to work on it with @ljflores and me?