crowell / modpagespeed_tmp

Automatically exported from code.google.com/p/modpagespeed
Apache License 2.0

JS Canonical Library data is too large #713

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
We should probably use a sparse_hash_map rather than a std::map for this
structure: it's more space-efficient, and we'd get a speed improvement for free.

It also looks like the map values could share a lot of string data, but I'm
not sure whether that would be worth it.
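
A minimal sketch of the proposed swap (not actual mod_pagespeed code),
assuming the structure maps an MD5 signature string to a canonical URL string;
the real key and value types may differ. sparse_hash_map comes from Google's
sparsehash library and trades a few bits of per-entry overhead for std::map's
several pointers per node:

```cpp
#include <functional>
#include <map>
#include <string>
#include <sparsehash/sparse_hash_map>  // Google sparsehash

// Current shape: a red-black tree, ~3 pointers plus color per node.
typedef std::map<std::string, std::string> SignatureMapStd;

// Proposed shape: sparse bucket storage with only a few bits of overhead
// per entry, and O(1) expected lookup instead of O(log n).
typedef google::sparse_hash_map<std::string, std::string,
                                std::hash<std::string> > SignatureMapSparse;

int main() {
  SignatureMapSparse signatures;
  // sparse_hash_map requires a deleted key to be set before erase() is used.
  signatures.set_deleted_key("");
  signatures["d41d8cd98f00b204e9800998ecf8427e"] =
      "//ajax.googleapis.com/ajax/libs/jquery/1.10.0/jquery.min.js";
  return signatures.size() == 1 ? 0 : 1;
}
```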

Original issue reported on code.google.com by jmara...@google.com on 30 May 2013 at 6:58

GoogleCodeExporter commented 9 years ago
More fundamentally, we should avoid copying this data between configurations if 
we can by making it copy-on-write; std::map's copy constructor deep-copies the 
whole tree.

If we made it a map<SizeInBytes, map<Md5Signature, GoogleString*>>, that ought 
to suffice, but who owns the GoogleString* is an open question.  Perhaps a 
SharedString rather than a GoogleString* is the right thing (Josh: would that 
work?).  That'd share the largest sub-structures.  Storing the MD5 hash as a 
fixed-size char array is also a possibility, though getting the hashing and 
comparison to come out right might involve some slightly annoying coding.
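
A rough sketch of that nested shape, using std::shared_ptr<const std::string>
as a stand-in for SharedString and a hypothetical fixed-size Md5Signature key;
none of these are the actual mod_pagespeed types:

```cpp
#include <cstdint>
#include <cstring>
#include <map>
#include <memory>
#include <string>

// Hypothetical fixed-size MD5 key: 16 raw digest bytes plus the strict
// weak ordering std::map requires.
struct Md5Signature {
  unsigned char digest[16];
  bool operator<(const Md5Signature& other) const {
    return std::memcmp(digest, other.digest, sizeof(digest)) < 0;
  }
};

typedef int64_t SizeInBytes;
// shared_ptr<const string> stands in for SharedString: the URL text is
// reference-counted, so many maps can point at one copy.
typedef std::map<Md5Signature, std::shared_ptr<const std::string> >
    SignatureMap;
typedef std::map<SizeInBytes, SignatureMap> LibraryMap;

// Copy-on-write between configurations: each configuration holds a
// pointer to an immutable map, so copying a configuration copies one
// pointer rather than deep-copying the tree; a configuration that needs
// to modify the map clones it first.
typedef std::shared_ptr<const LibraryMap> LibraryMapRef;
```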

For that matter, we could use a sorted array and binary search to save space 
here if we liked.  That might even let us share pointers to all the data about 
a given library.
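
A sketch of the flat-array alternative; Entry and LibraryInfo are illustrative
names, not existing mod_pagespeed types. All signature entries for one library
point at a single LibraryInfo, which is where the pointer-sharing would come
from:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct LibraryInfo {
  std::string canonical_url;
  // ...any other per-library metadata hangs off this one object.
};

struct Entry {
  int64_t size_bytes;
  std::string md5;          // could also be a fixed 16-byte digest
  const LibraryInfo* info;  // many entries may share one LibraryInfo
};

inline bool EntryLess(const Entry& a, const Entry& b) {
  return a.size_bytes != b.size_bytes ? a.size_bytes < b.size_bytes
                                      : a.md5 < b.md5;
}

// Looks up (size, md5) in a vector kept sorted by EntryLess; returns
// the shared LibraryInfo, or null if the pair is absent.
const LibraryInfo* Find(const std::vector<Entry>& entries,
                        int64_t size, const std::string& md5) {
  Entry probe = {size, md5, nullptr};
  std::vector<Entry>::const_iterator it =
      std::lower_bound(entries.begin(), entries.end(), probe, EntryLess);
  if (it != entries.end() && it->size_bytes == size && it->md5 == md5) {
    return it->info;
  }
  return nullptr;
}
```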

Original comment by jmaes...@google.com on 30 May 2013 at 7:26

GoogleCodeExporter commented 9 years ago
The issue of sharing on Merge is covered in Issue 712.

This issue is just about making the basic structure smaller.

Original comment by jmara...@google.com on 30 May 2013 at 7:45