Hextremist / mapredo

Mapredo -- Mapreducing at the speed of C
GNU Lesser General Public License v2.1

Add a ram_reader class to utilize memory better #5

Open Hextremist opened 9 years ago

Hextremist commented 9 years ago

Currently, all output from the mappers is written to temporary files. This data is then merged, and the files are read by the tmpfile_reader class. While this works well, it may not be optimal.

Modern computers have a lot of RAM, but not a lot of cache. Experimentation shows that trying to use more memory than available L3 cache in the sort phase (random access) only slows things down. So we have a lot of memory available that is only used for disk caching.

The idea is to take some of this memory and create "files" in memory, to avoid having to write to disk. But instead of using a RAM disk, we just use a buffer in memory. The parent class of tmpfile_reader is called data_reader. We need to create a new class, ram_reader, that inherits from data_reader just like tmpfile_reader does. The new class takes a buffer as a constructor argument and reads its data from that buffer.
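A rough sketch of what such a class could look like follows. It is not based on the actual data_reader interface in mapredo: the base class data_reader_sketch, its read() method, and the name ram_reader_sketch are all assumptions made up for illustration. It only shows the shape of the idea, a reader that is handed a buffer in its constructor and serves data from it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for mapredo's data_reader base class.
// The real interface is likely different; this is an assumption.
class data_reader_sketch
{
public:
    virtual ~data_reader_sketch() = default;

    // Copy up to max_bytes into dst. Returns the number of bytes
    // copied, or 0 when no data is left.
    virtual std::size_t read (char* dst, std::size_t max_bytes) = 0;
};

// Serves data from a caller-owned memory buffer instead of a temporary
// file. The buffer must outlive the reader; it is not copied.
class ram_reader_sketch : public data_reader_sketch
{
public:
    ram_reader_sketch (const char* buffer, std::size_t size)
        : buffer_ (buffer), size_ (size)
    {
        assert (buffer != nullptr || size == 0);
    }

    std::size_t read (char* dst, std::size_t max_bytes) override
    {
        // Never read past the end of the buffer.
        const std::size_t n = std::min (max_bytes, size_ - pos_);
        if (n > 0)
        {
            std::memcpy (dst, buffer_ + pos_, n);
            pos_ += n;
        }
        return n;
    }

private:
    const char* buffer_;
    std::size_t size_;
    std::size_t pos_ = 0;
};
```

Because the sketch only keeps a pointer and an offset, creating a reader costs nothing beyond the buffer itself. Whoever fills the buffer (for example a mapper's sorted output) would remain responsible for its lifetime.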