BinomialLLC / basis_universal

Basis Universal GPU Texture Codec
Apache License 2.0
2.71k stars 267 forks source link

Non-deterministic results on different platforms #60

Open lexaknyazev opened 5 years ago

lexaknyazev commented 5 years ago

In the generate_hierarchical_codebook_threaded function, an std::unordered_map is used for counting unique training vectors. https://github.com/BinomialLLC/basis_universal/blob/600232075851c2fa6bd23f7180434aca7814c93a/basisu_enc.h#L1598-L1608

Iteration over the unordered map is implementation-dependent, so this code builds group_quant differently on MSVC and GCC/Clang. https://github.com/BinomialLLC/basis_universal/blob/600232075851c2fa6bd23f7180434aca7814c93a/basisu_enc.h#L1634-L1638

As a result, the encoder produces different selectors and codebooks on different platforms.

After switching group_hash to be an ordered map, files produced on different platforms are the same (likely with some performance cost):

diff --git a/basisu_enc.h b/basisu_enc.h
index 7d42121..7946c76 100644
--- a/basisu_enc.h
+++ b/basisu_enc.h
@@ -21,7 +21,7 @@
 #include <condition_variable>
 #include <functional>
 #include <thread>
-#include <unordered_map>
+#include <map>

 #ifndef _WIN32
 #include <libgen.h>
@@ -1602,8 +1602,7 @@ namespace basisu
        uint32_t max_threads, job_pool *pJob_pool)
    {
        typedef bit_hasher<typename Quantizer::training_vec_type> training_vec_bit_hasher;
-       typedef std::unordered_map < typename Quantizer::training_vec_type, weighted_block_group, 
-           training_vec_bit_hasher> group_hash;
+       typedef std::map < typename Quantizer::training_vec_type, weighted_block_group> group_hash;

        group_hash unique_vecs;

Maybe, there is a better way to achieve deterministic results.

turol commented 5 years ago

Based on a quick look you could try this:

Split the loop on line 1634 into two. The first one adds to unique_vec_iters, the second to read from it instead of unique_vecs and adds to group_quant. Then add std::sort of unique_vec_iters between them to make it deterministic. If all Quantizer don't care about order you could do with just one loop and sort after. I don't want to go looking through all the code to establish that.

MarkCallow commented 5 years ago

Is unordered map + sort definitely faster than an ordered map?

turol commented 5 years ago

Usually, in insert or lookup-heavy loads. In this case? I don't know. Try them and find out.

richgel999 commented 5 years ago

Okay, this is good work, thank you! I am finishing up the next milestone (ASTC, PVRTC alpha, etc.), and once it is done in 2-3 weeks I will work on this problem. I may just switch to my own fully deterministic containers.

lexaknyazev commented 5 years ago

Here, each key is a vector with 6 float components.

In the worst case, comparison https://github.com/BinomialLLC/basis_universal/blob/600232075851c2fa6bd23f7180434aca7814c93a/basisu_enc.h#L112 and equality https://github.com/BinomialLLC/basis_universal/blob/600232075851c2fa6bd23f7180434aca7814c93a/basisu_enc.h#L111 operators involve FP operations for all 6 components.

For the unordered map, a simple byte-based hasher is used: https://github.com/BinomialLLC/basis_universal/blob/600232075851c2fa6bd23f7180434aca7814c93a/basisu_enc.h#L71-L80

Maybe, one more option is to use these hash values as comparison values for the ordered map to avoid FP operations.

richgel999 commented 5 years ago

Also note that I have not touched the encoder at all for this milestone, just the transcoders, so any work you do will be easily merged in.

MarkCallow commented 3 years ago

I'm still getting non-deterministic results across systems with version 1.13 even though some custom containers have been added. All but one of my tests of ETC1S/BasisLZ encoding that pass on macOS fail on Windows and Linux. (I don't know if the Windows and Linux results are different from each other.) In one example the length of the compressed data is 1 byte less on Windows than macOS (47 vs. 48 bytes). The global dictionaries (selectors, endpoints, & tables) are the same size.

Curiously in the one case that succeeds on all platforms, the input file is a .pgm file. All the other files are .png. Also it only has one component.

lexaknyazev commented 3 years ago

even though some custom containers have been added

The root cause seems to be still in place: https://github.com/BinomialLLC/basis_universal/blob/041ad47c27e0491a6d7cbffc84537461566ad02f/encoder/basisu_enc.h#L1843-L1846

richgel999 commented 2 years ago

We now have custom contains for vector and hash tables/sets. However, I believe there's still a std hash table in there in one place. Still leaving this up for more investigation.

MarkCallow commented 2 years ago

However, I believe there's still a std hash table in there in one place. Still leaving this up for more investigation.

Yes there must be. I just ran the KTX-Software tests after integrating 1.16.3. ETC1S-encoded images on Windows and Linux are different from those on macOS. There were 2 fewer images different from macOS on Linux than on Windows so likely the Windows and Linux images are different from each other.