Open seonghobae opened 4 years ago
Thanks. Looks fine
Thanks, I'm testing these commits with my real research project; it seems faster when I add multiple clusters over SSH connections using the cluster activation procedure below.
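# Nested plan: outer level spreads work over remote machines via SSH,
# inner level uses about half of the physical cores on each machine.
future::plan(list(
  future::tweak(
    'cluster',
    workers = paste0('mpiuser@192.168.1.', 179:180),
    homogeneous = FALSE
  ),
  # Note: 'multiprocess' is deprecated in newer releases of future;
  # 'multisession' is the usual replacement.
  future::tweak('multiprocess', workers = max(c(
    1, round(parallel::detectCores(logical = FALSE) * .5)
  )))
))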
Can you also compare speed wrt pull request https://github.com/bnosac/textrank/pull/7
> Can you also compare speed wrt pull request #7
In theory, request #7 brings reasonable speed improvements by using the data.table library with primary keys, and it has a nice interface. However, I cannot find where to set the number of parallel cores in request #7. Request #7 uses pbapply to display progress information; however, to my knowledge, the pbapply API does not support multi-machine environments (only single-machine parallelism). That means future.apply can support supercomputing workloads through the 'future' API, but pbapply cannot.
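A minimal sketch of how the worker count is set with the 'future' API, assuming the future and future.apply packages are installed. `count_terms` and the sample sentences are hypothetical stand-ins, not code from any of the pull requests; the point is only that the parallel backend is chosen by `plan()`, outside the function that does the work.

```r
library(future)
library(future.apply)

# Hypothetical stand-in for an expensive per-sentence computation.
count_terms <- function(sentence) {
  length(strsplit(sentence, " ", fixed = TRUE)[[1]])
}

sentences <- c("the cat sat", "on the mat", "quietly")

# The number of workers is chosen here, not inside the worker code.
# Two background R sessions on the local machine:
plan(multisession, workers = 2)

# For remote machines one would instead pass SSH host names, e.g.
# plan(cluster, workers = c("user@host1", "user@host2"))  # hypothetical hosts

counts <- future_sapply(sentences, count_terms, USE.NAMES = FALSE)

# Return to sequential processing and release the workers.
plan(sequential)
```

The same `future_sapply()` call runs unchanged under either plan, which is why the choice of cores and machines can be left to the user.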
I need a multi-machine parallel environment to get real speed improvements for extensive scientific language research with heterogeneous computing. I have ten machines, including my VPS and workstations; with #9 they gave a significant eightfold speedup, even though I'm using 1 Gbps lines. I do not see how calculation speed can be improved without multicore and multi-machine parallelized apply functions such as those in the future.apply library. data.table speeds up data processing as a temporary in-memory database; it does not speed up the calculation itself.
Requests #8 and #9 introduce nested parallel structures by replacing all of the existing *apply functions, not only in textrank_sentence but in all of the textrank:: functions. pbapply does support cl objects from parallel::makeCluster(); however, it is hard to support nested parallelism that way. With #8 and #9, the speedup therefore depends on the number of threads and machines. The main issue is not the data.table library; the core of the solution is parallelized apply functions, which address the issue in #7 across machines.
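The nested structure described above can be sketched as follows. This is an illustration only, assuming the future and future.apply packages; the `docs` data and the word-count function are hypothetical and are not taken from #8 or #9. The outer level runs in two local background sessions, and the inner level is kept sequential so the sketch works on a single small machine.

```r
library(future)
library(future.apply)

# Two-level plan: the first entry governs the outer apply,
# the second governs any apply called inside a worker.
plan(list(
  tweak(multisession, workers = 2),
  sequential
))

# Hypothetical data: each document is a vector of sentences.
docs <- list(
  a = c("x y", "z"),
  b = c("p q r")
)

# Outer loop over documents, inner loop over sentences.
res <- future_lapply(docs, function(sentences) {
  future.apply::future_sapply(
    sentences,
    function(s) length(strsplit(s, " ", fixed = TRUE)[[1]]),
    USE.NAMES = FALSE
  )
})

# Return to sequential processing and release the workers.
plan(sequential)
```

Replacing the first plan entry with a cluster of SSH hosts and the second with a multicore or multisession backend would give the multi-machine, multi-core layout from the earlier comment, without changing the apply calls.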
Sorry for my misunderstanding. I have fixed the code here so that it works properly, following the reviews I received in https://github.com/bnosac/textrank/pull/8.
installed.packages()
.requireNamespace('future.apply')