BinomialLLC / basis_universal

Basis Universal GPU Texture Codec
Apache License 2.0

Remove contention point from find_optimal_selector_clusters_for_each_block #276

Closed: zeux closed this 2 years ago

zeux commented 2 years ago

Before this change, we filled the block->cluster mapping in parallel but also tried to aggregate the list of blocks for each cluster in the same job.

In a multithreaded run on a Ryzen 5900 system (24 threads), this resulted in significant contention on the mutex. Moving the distribution into a serial pass that runs after all threads have finished brought total compression time down from 6.5 to 5.5 seconds.
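For readers who want the pattern without opening the diff, here is a minimal standalone sketch of the idea. It is not the code from this PR or from basis_universal: `pick_cluster` and `assign_blocks` are hypothetical names, the hash stands in for the real per-block selector search, and plain `std::thread` replaces the library's job pool. The parallel phase only writes each block's own slot in the block->cluster array, so it needs no lock. The per-cluster lists are then built in a single serial pass after the join.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the expensive per-block search:
// deterministically maps a block index to a cluster index.
static uint32_t pick_cluster(uint32_t block_index, uint32_t num_clusters)
{
    return (block_index * 2654435761u) % num_clusters;
}

// Phase 1 (parallel): each worker fills a disjoint range of block_to_cluster,
// so no mutex is required.
// Phase 2 (serial): after all workers have joined, one pass distributes
// block indices into the per-cluster lists.
static std::vector<std::vector<uint32_t>> assign_blocks(
    uint32_t num_blocks, uint32_t num_clusters, uint32_t num_threads,
    std::vector<uint32_t>& block_to_cluster)
{
    block_to_cluster.assign(num_blocks, 0);
    num_threads = std::max<uint32_t>(1, num_threads);
    const uint32_t chunk = (num_blocks + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (uint32_t t = 0; t < num_threads; ++t)
    {
        const uint32_t first = std::min<uint32_t>(num_blocks, t * chunk);
        const uint32_t last = std::min<uint32_t>(num_blocks, first + chunk);
        workers.emplace_back([&block_to_cluster, first, last, num_clusters] {
            for (uint32_t b = first; b < last; ++b)
                block_to_cluster[b] = pick_cluster(b, num_clusters);
        });
    }
    for (auto& w : workers)
        w.join();

    // Serial distribution: cheap compared to the search, and lock-free.
    std::vector<std::vector<uint32_t>> cluster_blocks(num_clusters);
    for (uint32_t b = 0; b < num_blocks; ++b)
    {
        assert(block_to_cluster[b] < num_clusters);
        cluster_blocks[block_to_cluster[b]].push_back(b);
    }
    return cluster_blocks;
}
```

A side effect of the serial pass is that each cluster's block list comes out in ascending block order regardless of thread scheduling, whereas appending under a mutex from the worker threads gives an order that can vary from run to run.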

richgel999 commented 2 years ago

This sounds great - thanks for submitting this. Will merge after validating (it's a small-ish looking change so that should be simple).

zeux commented 2 years ago

Looks like this was fixed as part of https://github.com/BinomialLLC/basis_universal/commit/deeb5acb563246f9b747229636205c5b19b99839#diff-a01c185dbfc142f2a0b5c67b814a556de24466173732064baa30ca51b3a35b97 so I'll close this.

richgel999 commented 2 years ago

Cool, thanks