eqasim-org / synpp

Synthetic population pipeline code for eqasim
http://www.eqasim.org
MIT License
18 stars 12 forks source link

Optional intermediary stages #61

Closed davibicudo closed 3 years ago

davibicudo commented 3 years ago

Hi @sebhoerl

Considering a pipeline with stages like:

LOAD_DATA -> PROCESS1 -> (PROCESS2) -> WRITE_DATA

what would be the best way to handle process2? write_data is dependant on process1 or process2 (both).

One way to solve it could be setting the chain as if all are mandatory, but passing a flag to process2 named run_process2, which is defined in the config. This is not very elegant, since ideally we could simply add process2 in the run-list. Another option is combining process1 and process2, but this also requires some flag and if each is a long-running complex stage, this is also not ideal.

Do you have another suggestion on how to deal with this issue?

sebhoerl commented 3 years ago

I would propose adding another stage that basically only makes the condition, like:

def config(context):
  if context.config("run2"):
    context.stage("process1", alias = "process")
  else:
    context.stage("process2", alias = "process")

And then later

def execute(context):
  return context.stage("process")

This could either be independent or directly in write_data

davibicudo commented 3 years ago

Thanks for the suggestion! This doesn't exclude the need to set a flag in the config, which was what I initially wanted to avoid, since in theory the "task-list" is whatever is included in run, and config the place for parameters, but at least avoids passing through a stage that is not needed.

To remove the flag altogether, and borrowing from your solution, I was thinking about something like this:

def config(context):
  context.stage(["process1", "process2"], alias = "process")

And the pipeline resolves this following the logic: "Depend this stage on last stage of the list, which is required in the config. Default on the first element." The idea would be to make write_data actually dependant on arbitrary stages, depending on what is expressed in run.

I'm not sure though if the additional complexity is worth it, but it shouldn't be too difficult to implement. What do you think?

sebhoerl commented 3 years ago

I think it adds to much of "auto-magic" for a very specific issue. Probably a better way would be to provide the list of requested stages through the context. Then you could do something like if "process1" in context.requested_stages: and define your own selection logic

davibicudo commented 3 years ago

Well that solves my problem :)

sebhoerl commented 3 years ago

Wait, I meant it would be good to have that, I think it is not there yet ;)

davibicudo commented 3 years ago

Oh, sorry for closing then, I thought this was solved since it is available here:

class ConfigurationContext:
    def __init__(self, base_config):
        self.base_config = base_config

        self.required_config = {}

        self.required_stages = []
        self.aliases = {}

        self.ephemeral_mask = []

And is a solution to the case I proposed, i.e. reworking your previous solution a bit:

def config(context):
  if "process2" in context.requested_stages:
    context.stage("process2"], alias = "process")
  else:
    context.stage("process1"], alias = "process")
sebhoerl commented 3 years ago

Yes, the variable is there, but it collects the calls to context.stage("...") to process it further later. It does not provide the list of requested stages by the user. We have to add a new functionality there, but be careful not to be too ambiguous :)

davibicudo commented 3 years ago

Ok, now I see, requested_stages != required_stages :)

So it could be added here:

https://github.com/eqasim-org/synpp/blob/534cb00d8590cabfcb4b3288ff78d784a4840242/src/synpp/pipeline.py#L344

by passing definitions to the constructor.

As for the name, requested_stages sounds good, or perhaps global_requested_stages or config_requested_stages...?

I have an upcoming PR with a few improvements in the DecoratedStage and if you agree with this proposed solution I could include this there as well.

sebhoerl commented 3 years ago

config_requested_stages sounds good, yes, sure!