Closed davibicudo closed 3 years ago
I would propose adding another stage that basically only makes the condition, like:
def config(context):
if context.config("run2"):
context.stage("process1", alias = "process")
else:
context.stage("process2", alias = "process")
And then later
def execute(context):
return context.stage("process")
This could either be independent or directly in write_data
Thanks for the suggestion!
This doesn't exclude the need to set a flag in the config, which was what I initially wanted to avoid, since in theory the "task-list" is whatever is included in run
, and config
the place for parameters, but at least avoids passing through a stage that is not needed.
To remove the flag altogether, and borrowing from your solution, I was thinking about something like this:
def config(context):
context.stage(["process1", "process2"], alias = "process")
And the pipeline resolves this following the logic:
"Depend this stage on last stage of the list, which is required in the config. Default on the first element."
The idea would be to make write_data actually dependant on arbitrary stages, depending on what is expressed in run
.
I'm not sure though if the additional complexity is worth it, but it shouldn't be too difficult to implement. What do you think?
I think it adds to much of "auto-magic" for a very specific issue. Probably a better way would be to provide the list of requested stages through the context. Then you could do something like
if "process1" in context.requested_stages:
and define your own selection logic
Well that solves my problem :)
Wait, I meant it would be good to have that, I think it is not there yet ;)
Oh, sorry for closing then, I thought this was solved since it is available here:
class ConfigurationContext:
def __init__(self, base_config):
self.base_config = base_config
self.required_config = {}
self.required_stages = []
self.aliases = {}
self.ephemeral_mask = []
And is a solution to the case I proposed, i.e. reworking your previous solution a bit:
def config(context):
if "process2" in context.requested_stages:
context.stage("process2"], alias = "process")
else:
context.stage("process1"], alias = "process")
Yes, the variable is there, but it collects the calls to context.stage("...")
to process it further later. It does not provide the list of requested stages by the user. We have to add a new functionality there, but be careful not to be too ambiguous :)
Ok, now I see, requested_stages
!= required_stages
:)
So it could be added here:
by passing definitions
to the constructor.
As for the name, requested_stages
sounds good, or perhaps global_requested_stages
or config_requested_stages
...?
I have an upcoming PR with a few improvements in the DecoratedStage
and if you agree with this proposed solution I could include this there as well.
config_requested_stages sounds good, yes, sure!
Hi @sebhoerl
Considering a pipeline with stages like:
LOAD_DATA -> PROCESS1 -> (PROCESS2) -> WRITE_DATA
what would be the best way to handle
process2
?write_data
is dependant onprocess1
orprocess2
(both).One way to solve it could be setting the chain as if all are mandatory, but passing a flag to
process2
namedrun_process2
, which is defined in the config. This is not very elegant, since ideally we could simply addprocess2
in therun
-list. Another option is combiningprocess1
andprocess2
, but this also requires some flag and if each is a long-running complex stage, this is also not ideal.Do you have another suggestion on how to deal with this issue?