Run analysis in parallel - Githubissues

RETURN-project / makeDataCube

Data management

Apache License 2.0


Run analysis in parallel #18

Closed wandadk closed 3 years ago

wandadk commented 4 years ago

Problem

Currently, the code does not run fully in parallel. It could be optimized in several ways:

In `make_Landsat_cube.Rmd`

Several parts that are currently sequential could be run in parallel (e.g. downloading Landsat data).
Processing data in FORCE can be run in parallel without problems (the settings file should be adjusted), but we should determine which settings are optimal for processing the data. More info can be found here and in section 8 of this tutorial.

In `make_mask_fire_cube.Rmd`

A loop iterates over each tile, and data are prepared in each iteration. It should be fairly easy to parallelize this code. One point of attention is concurrent reading and writing of data, since each iteration generates temporary files.
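
As a rough illustration of the tile loop run in parallel, here is a minimal sketch using the base `parallel` package. The function `process_tile`, the tile IDs, and the output directory are hypothetical stand-ins, not code from `make_mask_fire_cube.Rmd`. The idea is that each tile gets its own temporary directory, so workers never read or write the same intermediate files.

```r
library(parallel)

# Hypothetical stand-in for the per-tile preparation done in the loop body.
# Intermediate output goes to a directory unique to this call, so concurrent
# workers cannot overwrite each other's temporary files.
process_tile <- function(tile, out_dir) {
  tmp_dir <- tempfile(pattern = paste0("tile_", tile, "_"))
  dir.create(tmp_dir)
  on.exit(unlink(tmp_dir, recursive = TRUE), add = TRUE)

  tmp_file <- file.path(tmp_dir, "intermediate.txt")
  writeLines(paste("intermediate data for", tile), tmp_file)

  # Only the final result is written to the shared output directory,
  # under a file name that is unique per tile.
  out_file <- file.path(out_dir, paste0(tile, ".txt"))
  file.copy(tmp_file, out_file, overwrite = TRUE)
  out_file
}

tiles <- c("X0001_Y0001", "X0001_Y0002", "X0002_Y0001")  # hypothetical tile IDs
out_dir <- file.path(tempdir(), "cube")                   # hypothetical output location
dir.create(out_dir, showWarnings = FALSE)

cl <- makeCluster(2)  # number of workers is an arbitrary choice here
results <- parLapply(cl, tiles, process_tile, out_dir = out_dir)
stopCluster(cl)
```

A socket cluster (`makeCluster` with `parLapply`) is used rather than `mclapply` because it also works on Windows; on Linux or macOS, `mclapply` would be a shorter alternative.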

PabRod commented 3 years ago

Proposed solutions

In `make_Landsat_cube.Rmd`

Use Snakemake for streamlining the workflow.
Use FORCE's internal parallelization. In practice, this amounts to setting `NPROC` and `NTHREAD` in the parameter file. Subtask: figure out the advisable values.
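
A small sketch of how the two parameters could be set from R before calling FORCE, assuming the parameter file uses `KEY = value` lines. The helper `set_force_param` and the excerpt below are hypothetical, and the values 4 and 2 are placeholders only, since the advisable values are still to be determined.

```r
# Hypothetical helper: replace the value of KEY in "KEY = value" lines.
set_force_param <- function(lines, key, value) {
  pattern <- paste0("^", key, "[[:space:]]*=.*$")
  if (!any(grepl(pattern, lines))) {
    stop("Parameter not found: ", key)
  }
  sub(pattern, paste(key, "=", value), lines)
}

# Hypothetical excerpt; a real parameter file has many more entries and
# would be read with readLines() and written back with writeLines().
prm <- c("DIR_LEVEL2 = /data/level2", "NPROC = 1", "NTHREAD = 1")

# Placeholder values, not recommendations.
prm <- set_force_param(prm, "NPROC", 4)
prm <- set_force_param(prm, "NTHREAD", 2)
```

Stopping when a key is missing is a deliberate choice in this sketch: silently leaving the file unchanged would make a sequential run look like a configured parallel one.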

In `make_mask_fire_cube.Rmd`

Explore the possibility of parallelizing using R, as we did with BenchmarkRecovery.

PabRod commented 3 years ago

Merged into #36.