basmith / sparsehash

Automatically exported from code.google.com/p/sparsehash
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Semantics of expected_max_items_in_table changed #35

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Between versions 1.1 and 1.4 the semantics of the constructor argument
expected_max_items_in_table changed due to this patch:

-    : hash(hf), equals(eql), get_key(ext), num_deleted(0),
-      use_deleted(false), delval(), table(min_size(0, n)) {    // start small
+    : hash(hf), equals(eql), get_key(ext), num_deleted(0), use_deleted(false),
+      delval(), enlarge_resize_percent(HT_OCCUPANCY_FLT),
+      shrink_resize_percent(HT_EMPTY_FLT),
+      table(expected_max_items_in_table == 0
+            ? HT_DEFAULT_STARTING_BUCKETS
+            : min_size(expected_max_items_in_table, 0)) {

The argument used to be passed as the second argument to min_size
(min_buckets_wanted) in version 1.1, and is now passed as the first agument
to min_size (num_elts).

This change has a large effect on the memory requirements of
sparse_hashtable. With version 1.1,
sparse_hash_set(1<<30)
would require 256 MB of RAM. Whereas with version 1.4, it requires double
that, at 512 MB of RAM.

Are the semantics of version 1.4 the correct, intended semantics, or a
regression? How do versions 1.2 and 1.3 behave?

The comment to min_size says:
  // This is the smallest size a hashtable can be without being too crowded
  // If you like, you can give a min #buckets as well as a min #elts
It does not define the term `elt'. What does it mean?

Cheers,
Shaun

Original issue reported on code.google.com by sjackman on 4 Mar 2009 at 12:21

GoogleCodeExporter commented 9 years ago
Yes, this is intentional.  See
http://groups.google.com/group/google-sparsehash/browse_thread/thread/3d6e83a8ac
9b828b#.
 I probably should announce it better.

Since the table size is always a multiple of two, I'm not surprised you're 
seeing a
doubling in size.  In the past, the hashtable was really too small to hold 1<<30
items.  Now it should be sized more appropriately for that.  If you want a 
smaller
hashtable, best to give it an arg of less than 1<<30, indicating you'll be 
inserting
less than a billion items.  Or you could raise HT_OCCPANCY_FLT.

} The comment to min_size says:
}   // This is the smallest size a hashtable can be without being too crowded
}   // If you like, you can give a min #buckets as well as a min #elts
} It does not define the term `elt'. What does it mean?

elt is an abbreviation for element.

Original comment by csilv...@gmail.com on 4 Mar 2009 at 8:56

GoogleCodeExporter commented 9 years ago
Thanks for the explanation. I'll fix my code to account for HT_OCCPANCY_FLT. In 
my
example above, 2^30 is a very gross estimate of the number of expected items, 
but the
number of elements being rounded up to 2^31 blows my memory budget. I've simply
picked a new `estimate' that is less than 2^30*HT_OCCPANCY_FLT.

Cheers,
Shaun

Original comment by sjackman on 4 Mar 2009 at 9:47