Shouldn't we be cloning the model here before calling partial_fit? Otherwise we're mutating the input.
What if we have to rerun this task because the worker that the result was on failed?
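To make the concern concrete, here is a minimal sketch of the copy-before-fit idea, not the actual dask-ml code. `RunningMean` and `partial_fit_task` are hypothetical names made up for illustration. The sketch uses `copy.deepcopy` rather than `sklearn.base.clone`, because `clone` returns an unfitted estimator and would discard the state from earlier partial_fit calls.

```python
import copy


class RunningMean:
    """Toy stand-in (hypothetical) for an estimator that supports partial_fit."""

    def __init__(self):
        self.n_ = 0
        self.mean_ = 0.0

    def partial_fit(self, X):
        # Incrementally update the running mean, mutating self in place.
        for x in X:
            self.n_ += 1
            self.mean_ += (x - self.mean_) / self.n_
        return self


def partial_fit_task(model, X):
    """Hypothetical task body: copy first so the input model is never mutated."""
    model = copy.deepcopy(model)
    model.partial_fit(X)
    return model


block0, block1 = [1.0, 2.0, 3.0], [4.0, 5.0]

m0 = partial_fit_task(RunningMean(), block0)  # result of the first block
first = partial_fit_task(m0, block1)          # second block, first attempt
retry = partial_fit_task(m0, block1)          # same task rerun after a failure
```

Since the task works on a copy, `m0` is unchanged after either call, so a rerun on another worker starts from the same state and produces the same result as the first attempt. Without the copy, the retry would see block1 folded in twice.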
I'm not sure what happens when the worker fails :)
Let's say that:
- Worker A completed partial_fit on the first block of data.
- Worker B fails on the second block of data.
IIRC, when a worker fails during computation, the scheduler will mark the task as suspicious and reschedule it on another worker. Let's say it's scheduled on worker C for whatever reason.
Worker C asks worker A for fit-<token>-0. I think everything is OK. The scheduler should always have a correct understanding of who has the latest successful fit call.
Does that sound right? Am I missing scenarios where we do something wrong?
Questions from @mrocklin in https://github.com/dask/dask-ml/pull/275#issuecomment-402269422