lexaknyazev opened 5 years ago
Based on a quick look, you could try this:

Split the loop on line 1634 into two. The first one adds to `unique_vec_iters`; the second reads from it instead of `unique_vecs` and adds to `group_quant`. Then add a `std::sort` of `unique_vec_iters` between them to make the order deterministic. If all `Quantizer`s don't care about order, you could do with just one loop and a sort afterwards; I don't want to go looking through all the code to establish that.
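A minimal sketch of that two-loop split, assuming a 6-float key and an `add_training_vec`-style quantizer interface (`vec6F`, `Quantizer`, and the hash/equality functors here are illustrative stand-ins, not the actual types in basisu_enc.h):

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Placeholder key type standing in for the encoder's 6-float training vector.
struct vec6F { float c[6]; };

struct vec6F_hash {
    size_t operator()(const vec6F& v) const {
        // Byte-based hash over the raw float storage.
        size_t h = 0;
        const unsigned char* p = reinterpret_cast<const unsigned char*>(v.c);
        for (size_t i = 0; i < sizeof(v.c); ++i)
            h = h * 131 + p[i];
        return h;
    }
};

struct vec6F_eq {
    bool operator()(const vec6F& a, const vec6F& b) const {
        return std::equal(std::begin(a.c), std::end(a.c), std::begin(b.c));
    }
};

// Assumed quantizer interface -- the real Quantizer in basisu_enc.h differs.
struct Quantizer {
    void add_training_vec(const vec6F& v, uint32_t weight) { (void)v; (void)weight; }
};

using unique_vec_map = std::unordered_map<vec6F, uint32_t, vec6F_hash, vec6F_eq>;

void feed_quantizer_deterministically(const unique_vec_map& unique_vecs, Quantizer& group_quant)
{
    // Loop 1: collect iterators into a container whose order we control.
    std::vector<unique_vec_map::const_iterator> unique_vec_iters;
    unique_vec_iters.reserve(unique_vecs.size());
    for (auto it = unique_vecs.cbegin(); it != unique_vecs.cend(); ++it)
        unique_vec_iters.push_back(it);

    // Sort by key components; any total order works, it just has to be
    // the same on every platform. (Assumes no NaNs in the training data;
    // compare raw bytes instead if NaNs are possible.)
    std::sort(unique_vec_iters.begin(), unique_vec_iters.end(),
              [](unique_vec_map::const_iterator a, unique_vec_map::const_iterator b) {
                  return std::lexicographical_compare(
                      std::begin(a->first.c), std::end(a->first.c),
                      std::begin(b->first.c), std::end(b->first.c));
              });

    // Loop 2: feed the quantizer in the now-deterministic order.
    for (auto it : unique_vec_iters)
        group_quant.add_training_vec(it->first, it->second);
}
```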
Is an unordered map + sort definitely faster than an ordered map?

Usually, in insert- or lookup-heavy workloads. In this case? I don't know. Try them and find out.
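"Try them and find out" can be a small harness like the sketch below; it uses `uint64_t` keys as a stand-in for the real 6-float vectors, so the numbers it prints say nothing definitive about the encoder (results depend entirely on the key type, hasher, and load):

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <map>
#include <random>
#include <unordered_map>
#include <vector>

template <typename F>
static double time_ms(F&& f)
{
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    std::mt19937_64 rng(42);
    std::vector<uint64_t> keys(1000000);
    for (auto& k : keys) k = rng();

    uint64_t sink = 0; // consumed below so the loops aren't optimized away

    // Ordered map: every insert pays O(log n), traversal is already sorted.
    const double t_map = time_ms([&] {
        std::map<uint64_t, uint32_t> m;
        for (auto k : keys) ++m[k];
        for (const auto& kv : m) sink += kv.first;
    });

    // Unordered map: O(1) amortized inserts, plus one sort at the end
    // to get a deterministic order.
    const double t_umap = time_ms([&] {
        std::unordered_map<uint64_t, uint32_t> m;
        for (auto k : keys) ++m[k];
        std::vector<uint64_t> order;
        order.reserve(m.size());
        for (const auto& kv : m) order.push_back(kv.first);
        std::sort(order.begin(), order.end());
        for (auto k : order) sink += k;
    });

    std::printf("std::map: %.1f ms, unordered_map + sort: %.1f ms (sink=%llu)\n",
                t_map, t_umap, (unsigned long long)sink);
}
```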
Okay, this is good work, thank you! I am finishing up the next milestone (ASTC, PVRTC alpha, etc.), and once it is done in 2-3 weeks I will work on this problem. I may just switch to my own fully deterministic containers.
Here, each key is a vector of 6 float components.

In the worst case, the comparison (https://github.com/BinomialLLC/basis_universal/blob/600232075851c2fa6bd23f7180434aca7814c93a/basisu_enc.h#L112) and equality (https://github.com/BinomialLLC/basis_universal/blob/600232075851c2fa6bd23f7180434aca7814c93a/basisu_enc.h#L111) operators involve FP operations on all 6 components.

For the unordered map, a simple byte-based hasher is used: https://github.com/BinomialLLC/basis_universal/blob/600232075851c2fa6bd23f7180434aca7814c93a/basisu_enc.h#L71-L80

One more option might be to use these hash values as comparison keys for the ordered map, to avoid FP operations.
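A hedged sketch of that idea: compare by a byte hash first and fall back to a raw-byte tie-break on collisions, so the common-case comparison is a single integer compare with no FP operations. The hash alone can't be the whole key, since two distinct vectors may collide. (`vec6F` and the FNV-1a hash are illustrative stand-ins; the encoder's actual hasher is the one linked above.)

```cpp
#include <cstdint>
#include <cstring>
#include <map>

struct vec6F { float c[6]; };

// FNV-1a over the raw bytes of the key; stands in for the encoder's
// byte-based hasher linked above.
static uint64_t bitwise_hash(const vec6F& v)
{
    uint64_t h = 14695981039346656037ull;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(v.c);
    for (size_t i = 0; i < sizeof(v.c); ++i) { h ^= p[i]; h *= 1099511628211ull; }
    return h;
}

struct hash_then_bytes_less {
    bool operator()(const vec6F& a, const vec6F& b) const {
        const uint64_t ha = bitwise_hash(a), hb = bitwise_hash(b);
        if (ha != hb) return ha < hb;                    // common case: one integer compare
        return std::memcmp(a.c, b.c, sizeof(a.c)) < 0;   // collision: tie-break on raw bytes
    }
};

// Iteration order now depends only on the key bytes, never on FP
// comparisons or on the container implementation.
using group_map = std::map<vec6F, uint32_t, hash_then_bytes_less>;
```

As written this recomputes the hash on every comparison, which would likely negate the win; in practice the hash would be cached alongside the key.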
Also note that I have not touched the encoder at all for this milestone, just the transcoders, so any work you do will be easily merged in.
I'm still getting non-deterministic results across systems with version 1.13 even though some custom containers have been added. All but one of my tests of ETC1S/BasisLZ encoding that pass on macOS fail on Windows and Linux. (I don't know if the Windows and Linux results are different from each other.) In one example the length of the compressed data is 1 byte less on Windows than macOS (47 vs. 48 bytes). The global dictionaries (selectors, endpoints, & tables) are the same size.
Curiously, in the one case that succeeds on all platforms, the input file is a .pgm; all the other files are .png. It also has only one component.
> even though some custom containers have been added
The root cause seems to be still in place: https://github.com/BinomialLLC/basis_universal/blob/041ad47c27e0491a6d7cbffc84537461566ad02f/encoder/basisu_enc.h#L1843-L1846
We now have custom containers for vectors and hash tables/sets. However, I believe there's still a std hash table in there in one place. Still leaving this up for more investigation.
> However, I believe there's still a std hash table in there in one place. Still leaving this up for more investigation.
Yes, there must be. I just ran the KTX-Software tests after integrating 1.16.3. ETC1S-encoded images on Windows and Linux are different from those on macOS. There were 2 fewer images differing from macOS on Linux than on Windows, so the Windows and Linux images are likely different from each other.
In the `generate_hierarchical_codebook_threaded` function, an `std::unordered_map` is used for counting unique training vectors: https://github.com/BinomialLLC/basis_universal/blob/600232075851c2fa6bd23f7180434aca7814c93a/basisu_enc.h#L1598-L1608

Iteration over the unordered map is implementation-dependent, so this code builds `group_quant` differently on MSVC and GCC/Clang: https://github.com/BinomialLLC/basis_universal/blob/600232075851c2fa6bd23f7180434aca7814c93a/basisu_enc.h#L1634-L1638

As a result, the encoder produces different selectors and codebooks on different platforms.
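To make the failure mode concrete, here is a minimal, self-contained illustration (not code from the encoder): identical inserts on every platform, yet the traversal order is unspecified by the standard and routinely differs between standard libraries.

```cpp
#include <cstdio>
#include <unordered_map>

int main()
{
    std::unordered_map<int, int> m;
    for (int i = 0; i < 8; ++i)
        m[i * 37] = i; // identical inserts on every platform

    // The order of this traversal is unspecified: MSVC's hash table
    // typically walks it differently from libstdc++/libc++, which is
    // exactly how group_quant ends up built differently per platform.
    for (const auto& kv : m)
        std::printf("%d -> %d\n", kv.first, kv.second);
}
```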
After switching `group_hash` to an ordered map, files produced on different platforms are the same (likely with some performance cost).

Maybe there is a better way to achieve deterministic results.
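The patch itself isn't shown above; a minimal sketch of what such a switch could look like, with stand-in types (`vec6F`, `vec6F_less`) rather than the encoder's actual ones:

```cpp
#include <cstdint>
#include <cstdio>
#include <map>

struct vec6F { float c[6]; };

// Strict weak order over the six components, in the spirit of the
// operator< linked earlier (assumes no NaNs in the training data).
struct vec6F_less {
    bool operator()(const vec6F& a, const vec6F& b) const {
        for (int i = 0; i < 6; ++i) {
            if (a.c[i] < b.c[i]) return true;
            if (a.c[i] > b.c[i]) return false;
        }
        return false;
    }
};

int main()
{
    // Before: std::unordered_map<vec6F, uint32_t, vec6F_hash, vec6F_eq> group_hash;
    // After: in-order traversal is fully determined by the comparator,
    // so every platform visits the entries in the same order.
    std::map<vec6F, uint32_t, vec6F_less> group_hash;

    group_hash[{{3, 1, 4, 1, 5, 9}}] = 1;
    group_hash[{{2, 7, 1, 8, 2, 8}}] = 2;

    for (const auto& kv : group_hash) // always the {2,...} key first, then {3,...}
        std::printf("first component %g -> %u\n", kv.first.c[0], kv.second);
}
```

Each insert now costs O(log n) comparisons of up to six floats, which is presumably the performance cost mentioned above.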