Open geoffjentry opened 7 years ago
One approach would be to set an environment variable describing the index in the task array. This variable would be available when evaluating Executor.cmd.
FYI, we're in the process of adding exactly this functionality to our dsub command-line tool, setting an env var for the task index. We're doing it client side, since our server implementation doesn't support it server side.
Ideas...
Roughly:
tpl = {
    "executors": [
        {"cmd": ["echo", "$TASK_INDEX"]},
    ],
    "inputs": [{
        "path": "/path/to/storage",
    }],
}
tes.CreateTaskBatch(tpl, repeat=1000)
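A minimal client-side sketch of the environment-variable idea, not an actual TES or dsub API. The helper name `expand_with_task_index` is hypothetical, and the `environ` field name follows the usage elsewhere in this thread. The command is wrapped in `sh -c` on the assumption that the executor runs `cmd` without a shell, in which case a bare `$TASK_INDEX` argument would not be expanded.

```python
import copy


def expand_with_task_index(template, repeat, var_name="TASK_INDEX"):
    """Return `repeat` copies of `template`, each exposing its index to every executor.

    Hypothetical helper: it only builds task dicts and submits nothing.
    """
    tasks = []
    for index in range(repeat):
        task = copy.deepcopy(template)  # never mutate the shared template
        for executor in task.get("executors", []):
            # Environment variable values are strings.
            executor.setdefault("environ", {})[var_name] = str(index)
        tasks.append(task)
    return tasks


tpl = {
    "executors": [
        # Assumption: cmd is exec'd directly, so a shell is needed to expand the variable.
        {"cmd": ["sh", "-c", "echo $TASK_INDEX"]},
    ],
    "inputs": [{"path": "/path/to/storage"}],
}

tasks = expand_with_task_index(tpl, repeat=3)
print([t["executors"][0]["environ"]["TASK_INDEX"] for t in tasks])
```

Each generated task could then be submitted individually, which matches the client-side route described above when the server has no batch support.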
Roughly:
tpl = {
    "executors": [
        {"cmd": ["echo", "{% TASK_INDEX %}"]},
    ],
    "inputs": [{
        "path": "/path/to/storage/{% TASK_INDEX %}",
    }],
}
tes.CreateTaskBatch(tpl, repeat=1000)
Roughly:
tpl = {
    "executors": [
        {"cmd": ["echo", "{% DRUG_NAME %}"]},
    ],
    "inputs": [{
        "path": "/path/to/storage/{% DRUG_NAME %}",
    }],
}
tes.CreateTaskBatch(tpl, vars=[
    {"DRUG_NAME": "foo"},
    {"DRUG_NAME": "bar"},
    # ...thousands of rows here...
])
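The template idea can be sketched as a recursive substitution over the task dict. This is only an illustration under assumptions: `render` and `expand` are hypothetical names, and the placeholder syntax is taken to be `{% NAME %}` with optional whitespace, as written in the examples above. The `repeat=N` form then becomes the special case where each row is `{"TASK_INDEX": i}`.

```python
import re

# Matches placeholders such as "{% DRUG_NAME %}" (assumed syntax).
PLACEHOLDER = re.compile(r"\{%\s*(\w+)\s*%\}")


def render(node, variables):
    """Return a copy of `node` with every placeholder replaced from `variables`.

    Raises KeyError if the template names a variable the row does not define.
    """
    if isinstance(node, str):
        return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), node)
    if isinstance(node, list):
        return [render(item, variables) for item in node]
    if isinstance(node, dict):
        return {key: render(value, variables) for key, value in node.items()}
    return node  # numbers, booleans, None pass through unchanged


def expand(template, rows):
    """Build one concrete task per row of variables."""
    return [render(template, row) for row in rows]


tpl = {
    "executors": [{"cmd": ["echo", "{% DRUG_NAME %}"]}],
    "inputs": [{"path": "/path/to/storage/{% DRUG_NAME %}"}],
}

by_name = expand(tpl, [{"DRUG_NAME": "foo"}, {"DRUG_NAME": "bar"}])
print(by_name[1]["inputs"][0]["path"])

# The repeat=N variant expressed as rows of indices.
index_tpl = {"inputs": [{"path": "/path/to/storage/{% TASK_INDEX %}"}]}
by_index = expand(index_tpl, [{"TASK_INDEX": i} for i in range(3)])
print([t["inputs"][0]["path"] for t in by_index])
```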
tpl = {
    "executors": [
        {"environ": {"shared": "foo"}},
    ],
    "resources": {
        "cpus": 10,
    },
    "inputs": [
        {
            "path": "/container/path",
        },
        {
            "path": "/container/path",
        },
    ],
}
tes.CreateTaskBatch(tpl, partials=[
    # These are partial task messages, each row defining a specific override.
    {
        "executors": [
            {"cmd": ["echo", "task1"]},
        ],
        "inputs": [
            {"url": "/path/to/task1/input1.data"},
            {"url": "/path/to/task1/input2.data"},
        ]
    },
    # ... thousands of rows here ...
])
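One possible reading of the partials proposal is a deep merge of each partial onto the shared template. The sketch below assumes dicts merge key by key and lists merge element by element at matching indices; the thread does not specify these semantics, and `merge` is a hypothetical helper rather than part of any TES client.

```python
import copy


def merge(base, override):
    """Deep-merge `override` onto `base` and return a new structure.

    Assumed semantics: dicts merge per key, lists merge per index,
    and any other value in `override` replaces the one in `base`.
    """
    if isinstance(base, dict) and isinstance(override, dict):
        merged = {key: copy.deepcopy(value) for key, value in base.items()}
        for key, value in override.items():
            if key in base:
                merged[key] = merge(base[key], value)
            else:
                merged[key] = copy.deepcopy(value)
        return merged
    if isinstance(base, list) and isinstance(override, list):
        merged = []
        for i in range(max(len(base), len(override))):
            if i < len(base) and i < len(override):
                merged.append(merge(base[i], override[i]))
            elif i < len(base):
                merged.append(copy.deepcopy(base[i]))
            else:
                merged.append(copy.deepcopy(override[i]))
        return merged
    return copy.deepcopy(override)


tpl = {
    "executors": [{"environ": {"shared": "foo"}}],
    "resources": {"cpus": 10},
    "inputs": [{"path": "/container/path"}, {"path": "/container/path"}],
}
partial = {
    "executors": [{"cmd": ["echo", "task1"]}],
    "inputs": [
        {"url": "/path/to/task1/input1.data"},
        {"url": "/path/to/task1/input2.data"},
    ],
}

task = merge(tpl, partial)
print(task["executors"][0])
print(task["inputs"][1])
```

Index-wise list merging is what lets the partial's `url` entries line up with the template's `path` entries; a server or client could just as reasonably choose to replace lists wholesale, so the choice would need to be spelled out in any real proposal.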
This is a duplicate of #55, right? If so, I recommend we close #55.
There was a discussion on the mailing list about providing task array functionality, which is a common feature among job schedulers.