Open mitchellmanware opened 1 month ago
We should also further explore the interaction between `future` and `crew` within the containerized environment.
@mitchellmanware `future` and `crew` (the `mirai` backend, specifically) do not seem to work well together because `mirai` daemons do not allow nested parallelism. I think we could divide the work into targets that are as small as possible, so that each target builds fairly quickly. A potential performance cost of this approach is a very large list of results and the subsequent overhead of merging it into a large `data.frame` (or `data.table`) object.
@sigmafelix We have started to implement something similar, breaking the temporal period down into 10/25/50-day chunks (the optimal size is still TBD) so that each worker runs quickly. This dynamic branching over smaller temporal chunks has shown clear benefits: for example, calculating the NARR covariates for the full temporal range and the full set of AQS locations takes ~5 minutes.
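A rough sketch of that chunking pattern in a `_targets.R` file (the 25-day chunk length and `calculate_covariates()` are illustrative stand-ins, not our actual pipeline code):

```r
library(targets)

# Full temporal range, split into ~25-day chunks for dynamic branching.
dates <- seq(as.Date("2018-01-01"), as.Date("2022-12-31"), by = "day")

list(
  tar_target(
    date_chunks,
    split(dates, ceiling(seq_along(dates) / 25)),
    iteration = "list"
  ),
  # One branch per chunk: each worker finishes quickly.
  tar_target(
    covariates_chunk,
    calculate_covariates(dates = date_chunks),  # hypothetical function
    pattern = map(date_chunks),
    iteration = "list"
  ),
  # Merge the list of per-chunk results into one table.
  tar_target(
    covariates_all,
    data.table::rbindlist(covariates_chunk)
  )
)
```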
With the `crew` local controller, it is still unclear whether we can allocate a specific amount of memory to a single target. If we can, this should resolve the issues associated with merging a large list into a `data.frame`.
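One option worth testing (a sketch of the general `targets`/`crew` mechanism, not confirmed behavior in our pipeline): `crew_controller_local()` does not cap memory per worker, but `targets` can route individual targets to a named controller within a `crew_controller_group()`, so a memory-heavy merge target could get a dedicated single-worker controller that is not competing with other branches for memory. Controller names and worker counts below are placeholder assumptions:

```r
library(targets)
library(crew)

tar_option_set(
  controller = crew_controller_group(
    crew_controller_local(name = "light", workers = 8),  # small branch targets
    crew_controller_local(name = "heavy", workers = 1)   # memory-heavy merges
  )
)

list(
  tar_target(
    covariates_chunk_example,
    data.frame(x = rnorm(10))  # placeholder for the real chunk computation
  ),
  # Send the large merge to the single-worker "heavy" controller.
  tar_target(
    covariates_all_example,
    covariates_chunk_example,
    resources = tar_resources(
      crew = tar_resources_crew(controller = "heavy")
    )
  )
)
```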
Now that the pipeline is successfully running through the container, we should do a more detailed review of how `crew_controller_local` distributes CPU and memory to each running target. Before containerization, `crew_controller_slurm` improved performance drastically because CPU and memory could be declared per controller, which balanced the workers well. Previous controller settings:
The goal is to replicate these types of settings and performance gains via `crew_controller_local`.
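For reference, a hedged side-by-side sketch of the two configurations (the SLURM argument names come from `crew.cluster` and may differ by version; the worker counts and resource values are placeholders, not the settings referenced above):

```r
library(crew)
library(crew.cluster)

# Before containerization: per-controller CPU/memory declarations on SLURM.
controller_slurm <- crew.cluster::crew_controller_slurm(
  name = "narr",
  workers = 25,
  slurm_cpus_per_task = 2,
  slurm_memory_gigabytes_per_cpu = 8
)

# Inside the container: crew_controller_local() exposes no direct CPU or
# memory caps, so balance has to come from the worker count (and from
# routing heavy targets to a separate controller, as sketched above).
controller_local <- crew::crew_controller_local(
  name = "narr_local",
  workers = 25,
  seconds_idle = 30
)
```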