mck-star-yar opened 1 month ago
I can see the value of this, but this is a very difficult problem, because a `Pipeline` and a `PipelineML` is not a `PipelineML` in general, nor is a `PipelineML` and a `PipelineML`, mostly because summing / filtering pipelines breaks the assumption of a single input, and that all inputs of inference are outputs of training. You need to know exactly what you are doing to ensure your pipeline is indeed a `PipelineML`.
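The invariant described here can be sketched with toy classes that reduce a pipeline to its dataset names (this is not the real kedro API; `is_valid_pipeline_ml`, `new_data`, etc. are illustrative names):

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Toy stand-in: a pipeline reduced to its input/output dataset names."""
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)

    def __add__(self, other):
        # Summing unions the datasets; anything produced by either side
        # is no longer a free input of the sum.
        outputs = self.outputs | other.outputs
        inputs = (self.inputs | other.inputs) - outputs
        return Pipeline(inputs, outputs)


def is_valid_pipeline_ml(training, inference, input_name):
    # The assumption described above: apart from the single free input,
    # every input of inference must be an output of training.
    return inference.inputs - training.outputs == {input_name}


training = Pipeline(inputs={"raw_data"}, outputs={"model"})
inference = Pipeline(inputs={"model", "new_data"}, outputs={"predictions"})
assert is_valid_pipeline_ml(training, inference, "new_data")

# Summing an extra pipeline into inference introduces a second free input,
# which silently breaks the invariant:
extra = Pipeline(inputs={"threshold"}, outputs=set())
assert not is_valid_pipeline_ml(training, inference + extra, "new_data")
```

This is why an unchecked `+` on `PipelineML` could produce an object that only fails later, at serving time.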
I could eventually let summation happen, and have an error message in case of invalid output, but I am afraid this will raise even more questions...
Yeah, that makes perfect sense. I think we should leave it as it is. And you could mention this discussion in the code itself to verbalize this design decision.
I've played a little bit with it, and it's possible to add one `PipelineML` with a `Pipeline`, but it does not make sense to add two `PipelineML`s together (we can add the training pipelines, but what about the inference?). As discussed, I'll just document the decision.
Description
This feature would simplify pipeline ensembling a lot if `PipelineML` implemented `__add__` and the other dunder methods.

Here's a simplified example. We have a `ds` pipeline defined as `data_prep + training + report_training`, where the actual model training happens in `training`. All four are exposed in the pipeline registry, so users can trigger training pipelines by calling either `ds` or `training`. Given that I can't append the result of `pipeline_ml_factory` to define `ds`, I have to wrap both `ds` and `training` with the ml wrapper:

Current code:
Desired code:
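A toy sketch of the contrast (stand-in classes so the snippet is runnable; the real project would use kedro's `Pipeline` and kedro-mlflow's `pipeline_ml_factory`, which takes additional arguments such as `input_name`):

```python
# Toy stand-ins, not the real kedro / kedro-mlflow API.
class Pipeline:
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def __add__(self, other):
        return Pipeline(self.nodes + other.nodes)


class PipelineML(Pipeline):
    def __init__(self, nodes, inference):
        super().__init__(nodes)
        self.inference = inference


def pipeline_ml_factory(training, inference):
    # Simplified: the real factory also takes input_name, etc.
    return PipelineML(training.nodes, inference)


data_prep = Pipeline(["prepare_data"])
training = Pipeline(["fit_model"])
report_training = Pipeline(["report_training"])
inference = Pipeline(["predict"])

# Current code: every registry entry that contains the training step must be
# wrapped separately, because PipelineML does not compose with `+`.
training_ml = pipeline_ml_factory(training, inference)
ds_ml = pipeline_ml_factory(data_prep + training + report_training, inference)

# Desired code: wrap the training step once, then compose as usual.
# (Relies on the __add__ support requested in this issue, so it is left
# commented out here.)
# ds_ml = data_prep + pipeline_ml_factory(training, inference) + report_training
```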
Of course, the alternative here is to extract this piece of code and do the wrapping for each of the functions that contain the training step; but that would require manual tracking each time the pipeline is updated, as opposed to having only the training step wrapped in this new class.
I can see that this is the behavior by design. Is there any specific reason why this limitation is imposed?
Possible Implementation
Implement dunder methods of `PipelineML`.
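A minimal sketch of what such a dunder could look like, under the constraints discussed above (toy classes, not kedro-mlflow's actual implementation): `__add__` folds a plain `Pipeline` into the training part and keeps the inference part untouched, while `PipelineML + PipelineML` is rejected because there is no single inference pipeline to keep.

```python
class Pipeline:
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def __add__(self, other):
        return Pipeline(self.nodes + other.nodes)


class PipelineML(Pipeline):
    """Toy version: a training pipeline paired with one inference pipeline."""

    def __init__(self, nodes, inference):
        super().__init__(nodes)
        self.inference = inference

    def __add__(self, other):
        if isinstance(other, PipelineML):
            # Two inference pipelines and no rule to pick one:
            # by design, summing two PipelineML objects is refused.
            raise TypeError("cannot sum two PipelineML objects: ambiguous inference")
        return PipelineML(self.nodes + other.nodes, self.inference)

    def __radd__(self, other):
        # Handles Pipeline + PipelineML, preserving node order.
        return PipelineML(other.nodes + self.nodes, self.inference)


training = PipelineML(["fit_model"], inference=Pipeline(["predict"]))
report = Pipeline(["report_training"])

ds = Pipeline(["prepare_data"]) + training + report  # stays a PipelineML
assert isinstance(ds, PipelineML)
assert ds.nodes == ["prepare_data", "fit_model", "report_training"]

try:
    training + training
    raise AssertionError("expected TypeError")
except TypeError:
    pass
```

Defining `__radd__` matters here: since `PipelineML` is a subclass of `Pipeline`, Python tries the subclass's reflected method first, so `data_prep + training_ml` keeps its `PipelineML` type instead of decaying to a plain `Pipeline`.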