datalad / datalad-remake

Compute instruction prioritization #51

Closed christian-monch closed 1 week ago

christian-monch commented 2 weeks ago

DataLad REMAKE allows the association of multiple compute instructions with individual files. Currently, there is no mechanism to prioritize the compute instructions.

Prioritization requires:

  1. Labelling/Addressing individual compute instructions
  2. A mechanism to specify a priority of labels
  3. A fallback behavior for files with multiple non-prioritized compute instructions, i.e. none of the labels appears in a priority specification.

Labelling individual compute instructions

datalad make receives an option to label compute instructions. If the option is not used, the compute instruction is labeled with the template name. Labels are stored in the compute instructions, currently in the associated datalad-remake URL.
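
To illustrate, a minimal sketch of how a label could travel inside a URL and default to the template name. The scheme string, the query parameter names `label` and `template`, and both function names are hypothetical and chosen for illustration only; they are not taken from the datalad-remake code base.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical scheme name, used for illustration only.
SCHEME = "datalad-remake"


def build_url(template: str, label: Optional[str] = None) -> str:
    """Encode a compute instruction as a URL.

    If no label is given, the template name is used as the label.
    """
    query = {"label": label or template, "template": template}
    return f"{SCHEME}:///?{urlencode(query)}"


def get_label(url: str) -> str:
    """Read the label back from a URL produced by build_url."""
    return parse_qs(urlsplit(url).query)["label"][0]
```

With this sketch, `get_label(build_url("convert"))` yields the template name, while `get_label(build_url("convert", label="fast"))` yields the explicit label.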

Mechanism to specify priority

Label priorities should be set on a per-user basis. That means they should be read from the user environment, e.g. from the global or local Git configuration.

Optionally the dataset could contain a default prioritization list, e.g. in .datalad/make/priorities
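
A possible lookup order, sketched under assumptions: a value from the user's configuration wins, and the dataset file .datalad/make/priorities is only consulted when the user has configured nothing. The list format (comma- or newline-separated labels, highest priority first) and both function names are assumptions made for this sketch. How the user value is obtained (for example via a Git configuration key) is left to the caller.

```python
from pathlib import Path
from typing import Optional


def parse_priorities(text: str) -> list[str]:
    """Split a comma- or newline-separated label list, highest priority first."""
    items = text.replace("\n", ",").split(",")
    return [item.strip() for item in items if item.strip()]


def resolve_priorities(user_value: Optional[str], dataset_root: Path) -> list[str]:
    """Return the effective priority list.

    A user-level setting takes precedence; otherwise the optional
    dataset default file is read; otherwise the list is empty.
    """
    if user_value is not None:
        return parse_priorities(user_value)
    default_file = dataset_root / ".datalad" / "make" / "priorities"
    if default_file.is_file():
        return parse_priorities(default_file.read_text())
    return []
```

An empty result means that no label is prioritized, so the fallback behavior applies to every file with multiple compute instructions.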

Fallback behavior

If a file has multiple compute instructions and none has a prioritized label, the system will randomly choose one of the available instructions.

Corner cases

Case 1: repeated prioritized label

Select one of the compute instructions that carry this label randomly.

Case 2: only unprioritized labels

Select one compute instruction randomly.
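
The selection rules above, including the fallback and both corner cases, could be sketched as follows. This is an illustration, not the implementation from the PR: compute instructions are modeled as hypothetical (label, url) pairs, and Case 1 is read as "several instructions share the same prioritized label".

```python
import random
from typing import Optional, Sequence


def select_instruction(
    instructions: Sequence[tuple[str, str]],
    priorities: Sequence[str],
    rng: Optional[random.Random] = None,
) -> tuple[str, str]:
    """Pick one (label, url) pair for a file.

    Labels in `priorities` are tried in order, highest priority first.
    """
    chooser = rng or random
    for label in priorities:
        candidates = [item for item in instructions if item[0] == label]
        if candidates:
            # Case 1: the prioritized label is repeated, choose randomly
            # among the instructions that carry it.
            return chooser.choice(candidates)
    # Fallback and Case 2: no label is prioritized, choose randomly
    # among all available instructions.
    return chooser.choice(list(instructions))
```

Passing a seeded `random.Random` instance as `rng` makes the random choice reproducible in tests. An empty `instructions` sequence raises `IndexError`, so callers are expected to pass at least one instruction.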

christian-monch commented 1 week ago

Implemented in PR #52