RETURN-project / makeDataCube

Data management
Apache License 2.0

Run analysis in parallel #18

Closed: wandadk closed this issue 3 years ago

wandadk commented 4 years ago

Problem

Currently, the code does not run fully in parallel. It could be optimized in several ways:

In make_Landsat_cube.Rmd

  1. Several parts that could run in parallel (e.g. downloading Landsat data) currently run sequentially.
  2. FORCE can process data in parallel without problems (the settings file must be adjusted), but we should determine which settings are optimal for processing the data. More info can be found here and in section 8 of this tutorial.
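
As an illustration of point 1, here is a minimal sketch of running downloads concurrently. It is written in Python only to show the pattern and is not the code in make_Landsat_cube.Rmd. The names `download_all`, `fetch`, and `scene_ids` are hypothetical; `fetch` stands in for whatever function downloads a single scene.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def download_all(
    scene_ids: Iterable[str],
    fetch: Callable[[str], str],
    max_workers: int = 4,
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Run `fetch` (hypothetical single-scene downloader) for every scene concurrently.

    Returns (done, failed): scene id -> value returned by fetch,
    and scene id -> exception raised by fetch.
    """
    done: Dict[str, str] = {}
    failed: Dict[str, Exception] = {}
    # Downloads are I/O-bound, so threads are usually sufficient here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, sid): sid for sid in scene_ids}
        for future in as_completed(futures):
            sid = futures[future]
            try:
                done[sid] = future.result()
            except Exception as err:
                # One failed scene should not abort the others;
                # collect the error and report it at the end.
                failed[sid] = err
    return done, failed
```

Collecting failures instead of raising immediately makes it possible to retry only the scenes that failed.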

In make_mask_fire_cube.Rmd

  1. A loop iterates over each tile, and data are prepared in each iteration. It should be fairly easy to parallelize this code. One point of attention is concurrent reading and writing of data (e.g. each iteration generates temporary files).
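
The concern about temporary files can be sketched as follows. This is a Python illustration of the pattern, not the code in make_mask_fire_cube.Rmd: each tile gets its own scratch directory and its own output file, so parallel iterations never share a path. `prepare_tile`, `prepare_all`, the file names, and the tile names in the usage note are all hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def prepare_tile(tile: str, out_dir: Path) -> Path:
    """Hypothetical stand-in for the per-tile preparation step."""
    # Each call gets a private scratch directory, so concurrent tiles
    # cannot overwrite each other's temporary files.
    with tempfile.TemporaryDirectory(prefix=f"{tile}_") as scratch:
        tmp_file = Path(scratch) / "intermediate.txt"
        tmp_file.write_text(f"intermediate data for {tile}\n")
        # One output file per tile: no two workers write to the same path.
        result = out_dir / f"{tile}.txt"
        result.write_text(tmp_file.read_text().upper())
    return result


def prepare_all(tiles: List[str], out_dir: Path, workers: int = 4) -> List[Path]:
    """Prepare all tiles concurrently and return the output paths in tile order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map() returns results in the order of the input tiles.
        return list(pool.map(lambda t: prepare_tile(t, out_dir), tiles))
```

For example, `prepare_all(["X0001_Y0001", "X0001_Y0002"], Path("out"))` would write `out/X0001_Y0001.txt` and `out/X0001_Y0002.txt`. A thread pool keeps the sketch simple; for CPU-bound preparation a process pool would be the more likely choice, with the worker defined as a top-level function rather than a lambda.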

PabRod commented 3 years ago

Proposed solutions

In make_Landsat_cube.Rmd

  1. Use Snakemake for streamlining the workflow.
  2. Use FORCE's internal parallelization. In practice, this means setting NPROC and NTHREAD in the parameter file. Subtask: determine advisable values.
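
Point 2 could be scripted along these lines. This is a minimal sketch in Python that assumes the parameter file holds one `KEY = VALUE` pair per line. `set_parameters` is a hypothetical helper, not part of FORCE, and the values 8 and 2 are placeholders rather than recommended settings.

```python
import re
from typing import Dict


def set_parameters(text: str, values: Dict[str, int]) -> str:
    """Replace the value on each `KEY = value` line named in `values`.

    Assumes one `KEY = VALUE` pair per line (an assumption about the
    parameter file layout). Raises KeyError if a key is not present.
    """
    for key, value in values.items():
        pattern = re.compile(rf"^{re.escape(key)}\s*=.*$", flags=re.MULTILINE)
        text, count = pattern.subn(f"{key} = {value}", text)
        if count == 0:
            raise KeyError(f"{key} not found in parameter file")
    return text


# Placeholder parameter text; SOME_OTHER_KEY is a made-up entry.
example = "SOME_OTHER_KEY = value\nNPROC = 1\nNTHREAD = 1\n"
print(set_parameters(example, {"NPROC": 8, "NTHREAD": 2}))
```

Raising an error when a key is missing avoids silently running with the default values.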

In make_mask_fire_cube.Rmd

  1. Explore the possibility of parallelizing using R, as we did with BenchmarkRecovery.

PabRod commented 3 years ago

Merge into #36