Open arunsrinivasan opened 5 years ago
Thanks, and sorry for the slow reply.
Yes, I can see how this can happen for PSOCK clusters, e.g. `plan(cluster, ...)` and `plan(multisession)`. The problem is, as you suggest, that the `search()` path is different in the worker compared to the main R session (e.g. with `plan(sequential)`). What complicates it is that sometimes the `search()` path gets updated automatically by some other parallel/future call, i.e. it's not always obvious how, why, and when.
To really fix this for this parallel backend, one would need to come up with a way for each future to re-arrange the `search()` path on the worker(s) to be the same as that of the main R process. That's an issue to be solved in the future package.
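To illustrate why the order of the `search()` path matters, here is a minimal base-R sketch. It does not use future or a real cluster; `pkg_a`, `pkg_b`, `whoami`, and `resolve_after_attaching` are hypothetical names made up for the illustration, standing in for two packages that export the same function name.

```r
# Hypothetical stand-ins for two packages that export the same name.
pkg_a <- list(whoami = function() "a")
pkg_b <- list(whoami = function() "b")

# Attach both in a given order and report which definition a plain,
# unqualified call finds. attach() inserts at position 2 of the search
# path, so the entry attached last is searched first.
resolve_after_attaching <- function(first_attached, second_attached) {
  attach(first_attached, name = "standin:first", warn.conflicts = FALSE)
  attach(second_attached, name = "standin:second", warn.conflicts = FALSE)
  # Clean up the search path again when the function exits.
  on.exit({
    detach("standin:second", character.only = TRUE)
    detach("standin:first", character.only = TRUE)
  })
  f <- get("whoami", envir = globalenv(), mode = "function")
  f()
}

resolve_after_attaching(pkg_a, pkg_b)  # pkg_b was attached last, so it wins
resolve_after_attaching(pkg_b, pkg_a)  # same code, other order: pkg_a wins
```

The same unqualified call resolves differently depending only on attach order, which is the situation a worker ends up in when its `search()` path differs from that of the main session.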
What I think you're also reporting here is an inconsistency between future.apply and rolling your own version via plain futures. If so, I'm not sure why there is a difference, but it could be that packages to be attached are also automatically identified and have higher precedence than those you specify manually. Maybe this inconsistency can/should be fixed.
PS. You want to use `library()`, not `require()`. And I don't think your example needs to attach any of parallel, doSNOW, and foreach.
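The reply does not say why `library()` is preferred. One commonly cited reason, which is an assumption about the intent here, is that `require()` only warns and returns `FALSE` when a package is missing, whereas `library()` stops with an error. A small sketch, where `missing_pkg` is a made-up package name assumed not to be installed:

```r
# A package name assumed not to be installed (hypothetical, for illustration).
missing_pkg <- "notARealPackage12345"

# require() only warns and returns FALSE, so the script would carry on.
ok <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)

# library() signals an error at the point of the call; capture its message.
err <- tryCatch(
  {
    library(missing_pkg, character.only = TRUE)
    NULL
  },
  error = function(e) conditionMessage(e)
)

ok               # FALSE
is.character(err)  # TRUE: library() failed with an error message
```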
This was a bit hard to track down... Here goes.
`data.table` and `bit` both have `setattr` functions. The annoying thing about `bit::setattr` is that it returns `NULL`, whereas `data.table::setattr` invisibly returns the input object with the attribute modified. So, suppose you write code that assigns the result of `setattr()` to an object. (I'm not saying this is how one should go about it, but there are other cases where we need to set an attribute and assign the result to an object.)
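As a rough sketch of that pattern, the two return conventions can be mimicked with stand-in functions. `setattr_returns_object` and `setattr_returns_null` are hypothetical names; they are not the real `data.table::setattr` or `bit::setattr`, and unlike the real functions they work on a local copy rather than by reference.

```r
# Stand-ins mimicking the two return conventions described above.
# These are NOT the real data.table::setattr or bit::setattr.
setattr_returns_object <- function(x, name, value) {
  attr(x, name) <- value
  invisible(x)      # hands the modified object back to the caller
}
setattr_returns_null <- function(x, name, value) {
  attr(x, name) <- value  # in this stand-in, only a local copy is modified
  invisible(NULL)   # nothing useful is returned
}

x <- 1:3

# Assigning the result works with the first convention ...
kept <- setattr_returns_object(x, "note", "hello")
# ... but silently yields NULL with the second.
lost <- setattr_returns_null(x, "note", "hello")

attr(kept, "note")  # "hello"
is.null(lost)       # TRUE
```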
In this case, depending on which `setattr` we have, it'll return either the expected result or `NULL`.

With this, consider this code:
I've created a `data.table` in the local environment (just 1 for simplicity) and am calling a function that sets the attribute in parallel. Of course this function is terribly simplified as well.

Now, the way this works (as expected) is to load `bit64` first and `data.table` next, then look for functions in `foo`, possibly load more packages, and then run `foo()` (AFAICT).

And this works fine, as you can see from the output. If you were to check `ans`, you'd get the data.table with its attributes set.

Restart session (IMPORTANT). Now, with everything else remaining intact, if instead of running `values(...)`, I use `future_lapply`:

Note how `setattr` refers to the `bit` package. I think this is because the packages get loaded after assessing that `setattr` is used in `foo`, and `data.table` from the local env has a function called `setattr` and therefore gets loaded first, followed by all packages in `future.packages`?
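One way to check which package an unqualified name resolves to is a small helper like the one below. This is only a diagnostic sketch: `which_namespace` is a hypothetical helper, not part of future, future.apply, or data.table, and it is shown here with base and stats functions so that it runs without extra packages.

```r
# Report which namespace a plain (unqualified) call to `name` would resolve
# to when looked up from the global environment, i.e. via the search() path.
which_namespace <- function(name) {
  f <- get(name, envir = globalenv(), mode = "function")
  env <- environment(f)
  # Primitive functions have no enclosing environment.
  if (is.null(env)) "base (primitive)" else environmentName(env)
}

which_namespace("median")  # "stats" in a default R session
which_namespace("paste")   # "base"
```

Called as `which_namespace("setattr")` from inside the function that runs on the worker, it should report either `bit` or `data.table`, depending on which one comes first on that worker's search path. `find("setattr")` additionally lists every attached entry that defines the name, in search order.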