test failure in Debian bullseye: session/test-01.xml... .(bcacfghi vs. acbcfghi) FAILED

pabs3 commented 4 years ago

Forwarding Debian bug #935614:

The first session test fails with an error on Debian bullseye/sid (full build log:

session/test-01.xml... .(bcacfghi vs. acbcfghi) FAILED

The same version builds fine on Debian buster.

We could use some help in figuring out why the test fails.

NK-Analyst commented 4 years ago

Hi, @pabs3

Please, try to debug this file https://github.com/gobby/libinfinity/blob/master/test/inf-test-text-session.c (function perform_single_test in line 136). I think something is wrong in code result = inf_text_chunk_equal(test_chunk, final); P.S. Code for function inf_text_chunk_equal exists in file https://github.com/gobby/libinfinity/blob/master/libinftext/inf-text-chunk.c

pkern commented 4 years ago

I mean sure. The thing is: The algorithm changed its result and it's literally stopping corruption from getting shipped. Someone needs to take a look as to what changed so that the ordering of the first two chunks is consistently wrong. The two requests are effectively reordered when they are not expected to me:

 <request time="" user="2">
  <insert pos="2">ac</insert>
 </request>

 <request time="" user="3">
  <insert pos="3">bc</insert>
 </request>

If it wouldn't fail that consistently I'd have said that something started randomizing but that's not what we see in Debian. @aburgm Do you happen to have an idea?

zhsj commented 4 years ago

Not sure why previously the tests are passed, the requests are randomized because of the following code

https://github.com/gobby/libinfinity/blob/e09c5b7a955fdfab74eaee28edab575e14860962/test/inf-test-text-session.c#L208-L219

I think the test is expected to fail, since order of the two requests is indeterminate.

https://github.com/gobby/libinfinity/blob/e09c5b7a955fdfab74eaee28edab575e14860962/test/session/test-01.xml#L13-L19

aburgm commented 4 years ago

The order of requests should be irrelevant, which is what the test is testing... it should be decided by user ID to disambiguate otherwise ambiguous cases.

⁣Sent from BlueMail

On Dec 15, 2019, 17:11, at 17:11, zhsj notifications@github.com wrote:

Not sure why previously the tests are passed, the requests are randomized because of the following code

https://github.com/gobby/libinfinity/blob/e09c5b7a955fdfab74eaee28edab575e14860962/test/inf-test-text-session.c#L208-L219

I think the test is expected to fail, since order of the two requests is indeterminate.

https://github.com/gobby/libinfinity/blob/e09c5b7a955fdfab74eaee28edab575e14860962/test/session/test-01.xml#L13-L19

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/gobby/libinfinity/issues/22#issuecomment-565823142

pkern commented 4 years ago

So even after staring at the code for a long time and after a lot of printf debugging... In the failing permutation bc is already committed to the chunk as a segment at offset 0. It feels like the chunk code is entirely in its right to place ac after bc given that it does not have more information about the original offset. Which suggests that adopted is failing at providing sufficient state information and transforming the incoming requests in such a way that ac can actually go first? The successful permutations insert that string first, unsurprisingly. And of course the kicker is that it used to work, which usually points to changes in glib.

The thing that irritates me the most here is where the code would preserve the state. It does not seem to buffer the request - but then all arrive at the same "time" anyway, so why would it have to. The state vector that it preserves seems to revolve primarily about the time as expressed in count of operations.

Shifting the operations one char into the string has the same failure mode as well, bc's offset is squashed to the first character remaining after the delete.

Specifically the following premutations trigger the problem: (bc|delete) (the remaining one of the two) ac

pkern commented 4 years ago

As expected this is due to a change in glib - and I thought I was looking at that commit before, but now I finally bisected it:

0dee62973cd0e269c0d7dd194d6bdc0cd4a732f9 is the first bad commit
commit 0dee62973cd0e269c0d7dd194d6bdc0cd4a732f9
Author: Hans Petter Jansson <hpj@cl.no>
Date:   Tue Jul 10 12:21:59 2018 +0200

    ghash: Fix poor performance with densely populated keyspaces

    Sequential integers would be densely packed in the table, leaving the
    high-index buckets unused and causing abnormally long probes for many
    operations. This was especially noticeable with failed lookups and
    when "aging" the table by repeatedly inserting and removing integers
    from a narrow range using g_direct_hash() as the hashing function.

    The solution is to multiply the hash by a small prime before applying
    the modulo. The compiler optimizes this to a few left shifts and adds, so
    the constant overhead is small, and the entries will be spread out,
    yielding a lower average probe count.

:040000 040000 97ed217496c64783af2455ef39ccd634dfd4bba7 d599c7dcd1c4112d5dba5edb50deb4091eaf6fd9 M  glib

The full commit:

commit 0dee62973cd0e269c0d7dd194d6bdc0cd4a732f9 (HEAD, refs/bisect/bad)
Author: Hans Petter Jansson <hpj@cl.no>
Date:   Tue Jul 10 12:21:59 2018 +0200

    ghash: Fix poor performance with densely populated keyspaces

    Sequential integers would be densely packed in the table, leaving the
    high-index buckets unused and causing abnormally long probes for many
    operations. This was especially noticeable with failed lookups and
    when "aging" the table by repeatedly inserting and removing integers
    from a narrow range using g_direct_hash() as the hashing function.

    The solution is to multiply the hash by a small prime before applying
    the modulo. The compiler optimizes this to a few left shifts and adds, so
    the constant overhead is small, and the entries will be spread out,
    yielding a lower average probe count.

diff --git a/glib/ghash.c b/glib/ghash.c
index 18c7185ef..888e67e00 100644
--- a/glib/ghash.c
+++ b/glib/ghash.c
@@ -334,6 +334,16 @@ g_hash_table_set_shift_from_size (GHashTable *hash_table, gint size)
   g_hash_table_set_shift (hash_table, shift);
 }

+static inline guint
+g_hash_table_hash_to_index (GHashTable *hash_table, guint hash)
+{
+  /* Multiply the hash by a small prime before applying the modulo. This
+   * prevents the table from becoming densely packed, even with a poor hash
+   * function. A densely packed table would have poor performance on
+   * workloads with many failed lookups or a high degree of churn. */
+  return (hash * 11) % hash_table->mod;
+}
+
 /*
  * g_hash_table_lookup_node:
  * @hash_table: our #GHashTable
@@ -382,7 +392,7 @@ g_hash_table_lookup_node (GHashTable    *hash_table,

   *hash_return = hash_value;

-  node_index = hash_value % hash_table->mod;
+  node_index = g_hash_table_hash_to_index (hash_table, hash_value);
   node_hash = hash_table->hashes[node_index];

   while (!HASH_IS_UNUSED (node_hash))
@@ -602,7 +612,7 @@ g_hash_table_resize (GHashTable *hash_table)
       if (!HASH_IS_REAL (node_hash))
         continue;

-      hash_val = node_hash % hash_table->mod;
+      hash_val = g_hash_table_hash_to_index (hash_table, node_hash);

       while (!HASH_IS_UNUSED (new_hashes[hash_val]))
         {

I also validated that 2.62.5 and 2.63.5 are still broken.

pkern commented 4 years ago

That being said I am still unsure where the disambiguation of the user ID happens in a way that the hash table behavior matters.

jscott0 commented 4 years ago

I checked that GLib 2.64.3 is also still broken.

SunilMohanAdapa commented 4 years ago

Apparently, all the tests list users in the sorted order 1 to 5 and expect the algorithm to give priority accordingly when expecting the output. When the values are inserted into the user table, with earlier glib, the iteration order was always 1 to 5 (by chance) even though they were inserted 5 to 1. GHashTable makes no such promises. In the newer glib due to a minor tweak to the hash algorithm, this order is disrupted and the tests started to fail.

Further, GHashTable also does not make any guarantees about consistent ordering when iterating multiple times.

A possible solution seems to be to iterate the user table hash table in sorted order. I proposed fixes with two different approaches in #24 and #25.

Please consider merging a fix soon and making a release. I would love to see infinoted return to Debian testing (and FreedomBox testing).

pabs3 commented 4 years ago

infinoted has reached testing again as @pkern has imported the patches.

This issue still needs an upstream release though.

pabs3 commented 4 years ago

@pkern: you appear to have commit access (or had in 2017), would you mind merging the fixes?

pabs3 commented 4 years ago

@aburgm @pkern: could we get a new release so that non-Debian distros can get the fixes?

Also #24 and #25 should get closed since the commits are now in the master branch.

gobby / libinfinity

test failure in Debian bullseye: session/test-01.xml... .(bcacfghi vs. acbcfghi) FAILED #22