amittai / cynical

Cynical data selection
MIT License

too slow #4

Closed robotzheng closed 5 years ago

robotzheng commented 6 years ago

How can I make it faster?

amittai commented 6 years ago

Can you provide details about your setup? How large are the corpora, how much memory do you have, how much of it is the script using, and is your disk swapping?

In general, increasing $batchsize is the way to increase throughput. The default is: $batchsize = POSIX::ceil( sqrt($num_lines_for_bestword) );
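To see why a larger batch helps, here is a rough sketch of the arithmetic. It assumes (not confirmed in this thread) that each pass takes up to $batchsize lines from the $num_lines_for_bestword candidates, so the number of passes is roughly ceil(N / batchsize). The helpers `default_batchsize` and `num_passes` are hypothetical and are not part of the cynical script.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: the default quoted above, ceil(sqrt(N)),
# where N stands in for $num_lines_for_bestword.
sub default_batchsize {
    my ($num_lines_for_bestword) = @_;
    return POSIX::ceil( sqrt($num_lines_for_bestword) );
}

# Hypothetical helper: passes needed to consume N lines in
# batches of the given size, i.e. ceil(N / batchsize).
sub num_passes {
    my ($n, $batchsize) = @_;
    return POSIX::ceil( $n / $batchsize );
}

for my $n (100, 10_000, 1_000_000) {
    my $bs = default_batchsize($n);
    printf "N=%d default batchsize=%d passes=%d; with 4x batch: passes=%d\n",
        $n, $bs, num_passes($n, $bs), num_passes($n, 4 * $bs);
}
```

With N = 10,000 the default batch is 100 lines (100 passes); quadrupling it cuts that to 25 passes. If each pass involves recomputing scores, fewer passes means less total work, presumably at the cost of a coarser, less incremental selection.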

There are also ways to reduce memory usage (namely, increasing $mincount), in case your speed problem is actually caused by running out of memory and swapping endlessly.
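As an illustration of why raising $mincount can save memory, here is a minimal sketch. It assumes that $mincount acts as a frequency cutoff, so word types seen fewer than $mincount times are not kept in the in-memory tables. That is what the name suggests, but it is not confirmed here. `prune_vocab` and the toy corpus are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: count word types, then keep only those seen
# at least $mincount times. Fewer kept types means a smaller hash.
sub prune_vocab {
    my ($lines, $mincount) = @_;
    my %count;
    for my $line (@$lines) {
        $count{$_}++ for split ' ', $line;    # whitespace tokenization
    }
    my %kept = map { $_ => $count{$_} }
               grep { $count{$_} >= $mincount } keys %count;
    return (scalar(keys %count), \%kept);     # (total types, kept table)
}

my @corpus = (
    "the cat sat on the mat",
    "the dog sat on the log",
    "a cat and a dog",
);
for my $mincount (1, 2, 3) {
    my ($total, $kept) = prune_vocab(\@corpus, $mincount);
    printf "mincount=%d: kept %d of %d word types\n",
        $mincount, scalar(keys %$kept), $total;
}
```

On this toy corpus, the script keeps 9, 6, and 1 of the 9 word types for $mincount = 1, 2, and 3. On real corpora, most word types are rare, so even a small cutoff can shrink the table considerably.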

amittai commented 5 years ago

[ closing due to inactivity ]