Open bloodcarter opened 7 years ago
What is the structure of Lemma and Form? If you can't post actual code, can you just describe what they are serializing (and also the sizeof)? Are you using polymorphism?
I'll try and see if I can reproduce this. Our binary serialization should be very fast.
I am also experiencing a similar problem. I have the following data to serialize:
std::unordered_map <uint64_t, std::bitset<60> > my_map;
my_map contains about 8 billion elements, and the saved binary file is around 33 GB on disk. When I deserialize it using
std::ifstream istrm("map.cereal", std::ios::binary);
cereal::BinaryInputArchive iarchive(istrm);
iarchive(my_map);
It takes about 2500 secs. Isn't that a bit slow?
I would say that depends. Loading that much data into memory is going to be time consuming either way. Having to send it to swap is going to be quite time consuming as well.
Additionally, with that much data, the unordered_map is going to be re-indexing near-constantly. With that much data indexed by a uint64_t, you are likely better off choosing a different data structure (depending on your distribution of keys).
Thanks for your reply
... having to send that to swap is going to be quite time consuming.
No need to send to swap, for my particular problem, having enough memory is not an issue.
One way to avoid re-indexing/rehashing is to call the reserve(size_type count) function on the unordered_map object.
If I do that, the loading time goes down to ~1000 secs.
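To illustrate why reserving helps, here is a minimal standard-library-only sketch (no cereal involved) that counts how often the bucket array is rebuilt while filling a map of the same type. The function name count_rehashes is made up for this example. In the setup above, the reserve call would presumably go right before iarchive(my_map), assuming the element count is known in advance.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using Map = std::unordered_map<std::uint64_t, std::bitset<60>>;

// Hypothetical helper: inserts n distinct keys and returns how many times
// the bucket count changed (i.e. how many rehashes happened) along the way.
// When reserve_first is true, the buckets are sized once, up front.
std::size_t count_rehashes(std::size_t n, bool reserve_first) {
    Map m;
    if (reserve_first) {
        m.reserve(n);  // room for n elements without exceeding max_load_factor
    }
    std::size_t rehashes = 0;
    std::size_t buckets = m.bucket_count();
    for (std::uint64_t k = 0; k < n; ++k) {
        m.emplace(k, std::bitset<60>(k));
        if (m.bucket_count() != buckets) {
            buckets = m.bucket_count();
            ++rehashes;
        }
    }
    assert(m.size() == n);  // all keys are distinct
    return rehashes;
}
```

With reserve, the count stays at zero; without it, the table is rebuilt repeatedly as it grows, and every rebuild touches all elements inserted so far.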
Well, ifstream seems to do an additional copy as part of reading, so you're copying the data at least twice. Perhaps consider using something like boost::iostreams::mapped_file. That'll probably save you another few hundred seconds.
Additionally, are you compiling with optimizations on? The cereal code is pretty template heavy, so it benefits greatly from higher optimization levels, particularly with flags like -march=native (if that is acceptable).
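As a rough sketch of the idea, a std::istream can be pointed directly at bytes that are already in memory, so reads are served from that memory without going through a file buffer first. The names MemoryBuf, read_u64 and sum_from_memory are made up for this example, and the bytes come from a std::vector here. With a memory-mapped file, the pointer and size would come from the mapping instead. Since cereal::BinaryInputArchive is constructed from a std::istream&, a stream like this should be usable there, though I have not measured how much it saves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <streambuf>
#include <vector>

// Read-only stream buffer over an existing memory region (hypothetical name).
class MemoryBuf : public std::streambuf {
public:
    MemoryBuf(const char* data, std::size_t size) {
        // setg requires char*; the region is only ever read from.
        char* p = const_cast<char*>(data);
        setg(p, p, p + size);
    }
};

// Hypothetical helper: reads one native-endian uint64_t from the stream.
std::uint64_t read_u64(std::istream& in) {
    std::uint64_t v = 0;
    in.read(reinterpret_cast<char*>(&v), sizeof v);
    assert(in && "stream ended before a full value was read");
    return v;
}

// Streams the raw bytes of `values` back through a std::istream and sums them.
std::uint64_t sum_from_memory(const std::vector<std::uint64_t>& values) {
    MemoryBuf buf(reinterpret_cast<const char*>(values.data()),
                  values.size() * sizeof(std::uint64_t));
    std::istream in(&buf);
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += read_u64(in);
    }
    return total;
}
```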
We can definitely add a call to reserve for unordered_map loads.
serialization and deserialization of map<int, vector
A similar problem also arises when I try using map<int,string>
Can we add map.reserve(size); here - https://github.com/USCiLab/cereal/blob/master/include/cereal/types/concepts/pair_associative_container.hpp#L56 ?
UPD. also:
set.reserve(size);
https://github.com/USCiLab/cereal/blob/master/include/cereal/types/set.hpp#L58
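One caveat for a generic change like this: std::map and std::set have no reserve member, so a shared load routine would have to call it only for containers that provide it. Below is a minimal C++11 sketch of one way to do that. It is not cereal's actual code; reserve_if_possible and load_pairs are hypothetical names, and the source vector stands in for data read from an archive.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Preferred overload: only viable when Container has reserve(size_t),
// e.g. std::unordered_map or std::unordered_set.
template <class Container>
auto reserve_if_possible(Container& c, std::size_t n, int)
    -> decltype(c.reserve(n), void()) {
    c.reserve(n);
}

// Fallback for containers without reserve (std::map, std::set): do nothing.
template <class Container>
void reserve_if_possible(Container&, std::size_t, long) {}

// Hypothetical stand-in for a container load routine: the element count is
// known before any element is inserted, so buckets can be sized once.
template <class Map>
void load_pairs(Map& out, const std::vector<std::pair<int, std::string>>& src) {
    out.clear();
    reserve_if_possible(out, src.size(), 0);  // 0 is an int, so overload 1 wins if viable
    auto hint = out.begin();
    for (const auto& kv : src) {
        hint = out.emplace_hint(hint, kv.first, kv.second);
    }
    assert(out.size() <= src.size());  // duplicate keys collapse
}
```

The same load_pairs then works for both std::unordered_map<int, std::string> and std::map<int, std::string>; only the former actually reserves.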
I'm archiving this data:
The lemmas vector and the forms map each hold ~100,000 elements. The problem is that deserializing from portable binary takes 30 secs on my Core i5!
Is that normal or what?