Closed robotzheng closed 5 years ago
can you provide details about your setup? how large are the corpora, how much memory do you have (and how much of it is in use)? is your disk swapping?
in general, increasing $batchsize is the way to increase throughput. the default is: $batchsize = POSIX::ceil( sqrt($num_lines_for_bestword) );
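to get a feel for how that default scales, here is a minimal sketch in python (not the tool's own perl) that mirrors the expression above. the function name `default_batchsize` and the example line counts are made up for illustration; only the ceil-of-square-root formula comes from the default quoted above.

```python
import math

def default_batchsize(num_lines_for_bestword: int) -> int:
    # python equivalent of: POSIX::ceil( sqrt($num_lines_for_bestword) )
    # (hypothetical helper name; the formula is the quoted default)
    return math.ceil(math.sqrt(num_lines_for_bestword))

# illustrative corpus sizes only, not measurements from any real run
for n in (10_000, 1_000_000, 100_000_000):
    print(f"{n} lines -> default batch size {default_batchsize(n)}")
```

since the default grows only with the square root of the line count, a 100x larger corpus gets just a 10x larger batch, so setting $batchsize higher by hand may be worth trying if memory allows.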
there are also ways of reducing memory usage [namely, increasing $mincount], in case your speed problem is actually caused by running out of memory and swapping endlessly.
[ closing due to inactivity ]
how can i make it run faster?