ParAlg / CPAM

CPAM: Compressed Parallel Augmented Maps
MIT License
21 stars 8 forks source link

Issue with inserts #2

Open wheatman opened 2 years ago

wheatman commented 2 years ago

I was getting some funny results when trying to perform non batched inserts and think there is either a bug or I don't understand something.

I was able to minimize it to the following code that I added the the microbenchmarks.

double test_dest_multi_insert_cache(size_t n, size_t m) {

  tmap m1;
  timer t;
  t.start();

  auto v = uniform_input_unsorted(n, 1UL<<40);

  for (size_t i = 0; i < n; i++) {
    par el = v[i];
    // par el = {i,i};
    m1.insert(el);
    std::cout <<std::get<0>(el) <<", " <<std::get<1>(el)  << " number elements = " << m1.size() << "\n";
  }

  double tm = t.stop();
  std::cout << "number elements = " << m1.size() << "\n";

  // check(check_multi_insert(v, u, m1), "multi_insert wrong");

  return tm;
}

I would expect this code to insert n random elements, or the elements from 1 to n depending on which line is commented out and print out the number of elements after each insertion, and while this does happen for the elements from 1 to n, this does not happen for the random elements.

The random elements seems to insert the first 128 elements as expected, then seems to rarely, but randomly insert more elements, inserting the 129th after 500ish total insertions, and the 130th after over 700, it did a total of 133 after 1000 insertions, while I didn't check each one, it prints out the random numbers and many that I checked were not repeats. It is also non deterministic and different runs have different numbers of elements inserted.

Any ideas what is going on, or what I am doing wrong.

I tried with 4 versions of the code (serial and parallel, compressed and uncompressed) my compile command was

g++-11 -g -O3 -DNDEBUG -DNO_AUG -DBLOCK_SIZE=128 -mcx16 -march=native -DHOMEGROWN -pthread -std=c++17 -Wall -I../../parlaylib/include -I../../include -o testParallel-CPAM-NA testParallel.cpp -L/usr/local/lib -ljemalloc
g++-11 -g -O3 -DNDEBUG -DPARLAY_SEQUENTIAL -DNO_AUG -DBLOCK_SIZE=128 -mcx16 -march=native -DHOMEGROWN -pthread -std=c++17 -Wall -I../../parlaylib/include -I../../include -o testParallel-CPAM-NA-Seq testParallel.cpp -L/usr/local/lib -ljemalloc
g++-11 -g -DUSE_DIFF_ENCODING -DBLOCK_SIZE=128 -O3 -DNDEBUG -DNO_AUG -mcx16 -march=native -DHOMEGROWN -pthread -std=c++17 -Wall -I../../parlaylib/include -I../../include -o testParallel-CPAM-NA-Diff testParallel.cpp -L/usr/local/lib -ljemalloc
g++-11 -g -DUSE_DIFF_ENCODING -DPARLAY_SEQUENTIAL -DBLOCK_SIZE=128 -O3 -DNDEBUG -DNO_AUG -mcx16 -march=native -DHOMEGROWN -pthread -std=c++17 -Wall -I../../parlaylib/include -I../../include -o testParallel-CPAM-NA-Diff-Seq testParallel.cpp -L/usr/local/lib -ljemalloc
Crazylqx commented 1 year ago

I found the same bug. commit id aa9f9dc1eebd681c171a3ab1d2f5547670492ee1 https://github.com/ParAlg/CPAM/blob/main/include/cpam/map_ops.h#L589

if (Entry::comp(Entry::get_key(et), key)) {
          parlay::assign_uninitialized(merged[out_off++], et);
        } else if (Entry::comp(key, Entry::get_key(et))) {
          parlay::assign_uninitialized(merged[out_off++], e);
          // forget to insert et here!
          placed = true;
        } else {  // get_key(et) == key
          parlay::assign_uninitialized(merged[out_off], et);
          combine_values(merged[out_off], e, true, f);
          out_off++;
          placed = true;
        }

This implementation will drop an element when inserting a new key to a compressed node.