Open GoogleCodeExporter opened 8 years ago
Umm... You have to dump the existing tries and then build a new one from the dumped keys.
However, the memory usage of building a new marisa trie will be unacceptable if
all the keys are unique.
It might require more than 50 GiB of memory.
Original comment by susumu.y...@gmail.com
on 20 Jan 2013 at 4:03
I'm not sure, but cloud computing, such as Amazon EC2, might be a solution if
the cost is acceptable.
Regards,
Original comment by susumu.y...@gmail.com
on 20 Jan 2013 at 4:08
Hmm, I am a student, and Amazon EC2 is not really what I was looking for, but
thanks. Also, will marisa-build open the dump file with memory-mapped I/O? If so,
it will probably not be a problem. My keys are not unique.
Thanks
Original comment by neshmai...@gmail.com
on 20 Jan 2013 at 2:52
Unfortunately, marisa-build does not use memory-mapped I/O.
Instead, if the dump contains many repeated keys, you can use a combination of
'sort' and 'uniq' to remove the duplicates before building the new trie.
Original comment by susumu.y...@gmail.com
on 20 Jan 2013 at 4:01
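For anyone finding this later, the 'sort' and 'uniq' suggestion above could look roughly like the sketch below. The helper name dedupe_keys and all file names are made up for illustration, and the marisa-dump / marisa-build lines in the comments are an assumed workflow, so please check the options against the tools' own help output for your version. The dump is assumed to hold one key per line.

```shell
#!/bin/sh
# Remove duplicate keys from a dump file (one key per line).
# LC_ALL=C forces plain byte-wise ordering, which is locale-independent.
# dedupe_keys is a hypothetical helper name, not part of marisa-trie.
dedupe_keys() {
    # $1: input dump file, $2: output file containing each key once
    LC_ALL=C sort "$1" | uniq > "$2"
}

# Assumed workflow (file names are placeholders; verify the options with
# `marisa-dump -h` and `marisa-build -h` before relying on them):
#   marisa-dump first.dic   > keys.txt
#   marisa-dump second.dic >> keys.txt
#   dedupe_keys keys.txt unique.txt
#   marisa-build -o merged.dic unique.txt
```

GNU sort spills to temporary files when the input does not fit in memory, so the deduplication step itself should not need the whole dump in RAM. The build step afterwards still does, so this only helps if the number of unique keys is small enough.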
Original issue reported on code.google.com by
neshmai...@gmail.com
on 19 Jan 2013 at 5:10