I'm concerned that we won't be able to process very large amounts of data unless we take a two-phase approach, so I split the Builder class into NGramBuilder and MapBuilder. In the first phase, NGramBuilder writes n-grams to disk, and I then used Linux commands to consolidate that output. In the second phase, MapBuilder creates the maps and writes them to disk.
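To make the two phases concrete, here is a rough Python sketch of the flow. It is not the actual NGramBuilder or MapBuilder code: the function names, the file layout, and the shape of the map (prefix to next-word counts) are all assumptions for illustration. The consolidate step is an in-memory stand-in for the Linux commands, which I'm assuming would be something like `sort ngrams.txt | uniq -c`; for real data volumes that step should stay external.

```python
import itertools
import json
import os
import tempfile


def write_ngrams(tokens, n, path):
    """Phase 1 (hypothetical NGramBuilder): append n-grams to disk, one per line."""
    with open(path, "a", encoding="utf-8") as out:
        for i in range(len(tokens) - n + 1):
            out.write(" ".join(tokens[i:i + n]) + "\n")


def consolidate(raw_path, counts_path):
    """Stand-in for the shell step, assumed to resemble `sort | uniq -c`.

    sorted() loads everything into memory, so this is only for the demo;
    an external sort is what makes the approach scale.
    """
    with open(raw_path, encoding="utf-8") as src:
        lines = sorted(line.rstrip("\n") for line in src)
    with open(counts_path, "w", encoding="utf-8") as out:
        for ngram, group in itertools.groupby(lines):
            out.write(f"{sum(1 for _ in group)}\t{ngram}\n")


def build_map(counts_path, map_path):
    """Phase 2 (hypothetical MapBuilder): turn consolidated counts into a map.

    The map shape here, prefix -> {next_word: count}, is an assumption.
    """
    mapping = {}
    with open(counts_path, encoding="utf-8") as src:
        for line in src:
            count, ngram = line.rstrip("\n").split("\t")
            *prefix, last = ngram.split(" ")
            mapping.setdefault(" ".join(prefix), {})[last] = int(count)
    with open(map_path, "w", encoding="utf-8") as out:
        json.dump(mapping, out)
    return mapping


def run_demo():
    """Run both phases on a tiny input inside a temporary directory."""
    with tempfile.TemporaryDirectory() as workdir:
        raw = os.path.join(workdir, "ngrams.txt")
        counts = os.path.join(workdir, "counts.txt")
        out = os.path.join(workdir, "map.json")
        write_ngrams("the cat sat on the cat mat".split(), 2, raw)
        consolidate(raw, counts)
        return build_map(counts, out)
```

Calling `run_demo()` runs both phases end to end on a seven-word input. The point of the split is that phase 1 never holds more than one n-gram in memory, and phase 2 only sees one line per distinct n-gram rather than one per occurrence.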