Spread placement currently "deals" one device to a core and then moves on to the next core. This results in low thread occupancy on each core.
CMW's network profiling uncovered a "feature" in Tinsel: reducing the number of active threads per core does not improve performance. The mailbox is designed to permit full throughput with all threads active, so running fewer threads does not improve per-core message latency.
Over-spreading the problem can hurt performance: devices are less local, network paths are longer, and there is no speedup to compensate.
To improve matters, Spread placement should be modified to "deal" eight devices to a core before moving to the next core.
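The proposed change can be illustrated with a minimal sketch. This is not the placer's actual code: the function name `deal_devices`, the parameter `devices_per_deal`, and the use of plain integer indices for devices and cores are all hypothetical, and the sketch assumes that dealing wraps back to the first core once every core has received a deal. It does not model which thread on a core each device lands on.

```python
from typing import Dict, List


def deal_devices(num_devices: int, num_cores: int,
                 devices_per_deal: int = 8) -> Dict[int, List[int]]:
    """Deal device indices to cores in runs of `devices_per_deal`.

    Hypothetical illustration only: devices and cores are plain integers,
    and dealing wraps round to core 0 after the last core (an assumption).
    """
    if num_cores <= 0 or devices_per_deal <= 0:
        raise ValueError("num_cores and devices_per_deal must be positive")

    placement: Dict[int, List[int]] = {core: [] for core in range(num_cores)}
    for device in range(num_devices):
        # Each run of `devices_per_deal` consecutive devices shares one core.
        core = (device // devices_per_deal) % num_cores
        placement[core].append(device)
    return placement


# devices_per_deal=1 mimics the current one-device-per-deal behaviour;
# the default of 8 mimics the proposed behaviour.
current = deal_devices(20, 4, devices_per_deal=1)
proposed = deal_devices(20, 4)
```

With 20 devices and 4 cores, the one-per-deal setting touches every core, whereas dealing eight at a time fills the first two cores and leaves the fourth unused, keeping neighbouring devices closer together.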