automl / amltk

A build-it-yourself AutoML Framework
https://automl.github.io/amltk/
BSD 3-Clause "New" or "Revised" License

[Feature] Generic input/output pipeline components #199

Open eddiebergman opened 9 months ago

eddiebergman commented 9 months ago

It could be useful to support pipelines where there is no concrete "thing" to build by piecing together all the components, as the sklearn_builder does. The reason this is not implemented is that, for a given component, we have no idea how to pass something into it and get something out of it to hand to the next step. sklearn solves this by assuming each step implements fit plus transform or predict, which lets it chain components together.
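To make the fit/transform convention concrete, here is a minimal plain-Python sketch of how a shared method contract lets a pipeline chain arbitrary steps without knowing anything else about them. It does not import sklearn, and the names Center, CenteredSlope and ChainedPipeline are hypothetical. It is a simplified illustration of the idea, not sklearn's actual Pipeline implementation, which also handles fit_transform, named steps and parameter routing.

```python
from statistics import mean


class Center:
    """Transformer: learns the mean in fit, subtracts it in transform."""

    def fit(self, X, y=None):
        self.mean_ = mean(X)
        return self  # returning self enables step.fit(...).transform(...)

    def transform(self, X):
        return [x - self.mean_ for x in X]


class CenteredSlope:
    """Final estimator: least-squares line, assuming inputs were centered upstream."""

    def fit(self, X, y):
        self.intercept_ = mean(y)  # exact least-squares intercept when mean(X) == 0
        self.slope_ = sum(x * t for x, t in zip(X, y)) / sum(x * x for x in X)
        return self

    def predict(self, X):
        return [self.intercept_ + self.slope_ * x for x in X]


class ChainedPipeline:
    """Chains steps by assuming all but the last have fit/transform
    and the last has fit/predict."""

    def __init__(self, *steps):
        self.steps = steps

    def fit(self, X, y):
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)  # output of one step feeds the next
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


pipe = ChainedPipeline(Center(), CenteredSlope())
pipe.fit([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(pipe.predict([4.0, 5.0]))  # [40.0, 50.0]
```

The pipeline never inspects what Center or CenteredSlope do internally. The method names alone tell it how to get data in and out, which is exactly the piece of information a generic component lacks.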

The current "workaround" is for users to pass their own builder= to Node.build(), which pieces together whatever object they want. However, this is likely too complicated to expect of a new user, even for a simple linear sequence of components.


For a more general pipeline structure, we would likely need our own custom Pipeline, along with a way for users to specify how to handle the input/output problem.

Some possible approaches to this problem:

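# Option 1: a separate Pipe step after each component says how to call it,
# given the previous step's output and the built component (proposed API)
Sequential(
    Component(A),
    Pipe(lambda prev_output, a: a.do_something(prev_output)),
    Component(B),
    Pipe(lambda prev_output, b: b.do_something_else(prev_output)),
)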
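
# Option 2: each Component carries an operation= callable that receives
# the incoming data and the built component (proposed API)
Sequential(
    Component(
        A,
        operation=lambda initial_data, a: a.do_something(initial_data),
    ),
    Component(
        B,
        operation=lambda prev_output, b: b.do_something(prev_output),
    ),
)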
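Either proposal could be prototyped outside amltk to see how it feels. The sketch below is a hypothetical, standalone stand-in: its Component, Pipe and Sequential classes are not amltk's classes of the same name, and it assumes each component can be built with no arguments. It shows how a custom pipeline could thread data through user-supplied operations under both approaches.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical stand-ins, NOT amltk's Component/Sequential classes.
Operation = Callable[[Any, Any], Any]  # (incoming data, built component) -> output


@dataclass
class Component:
    item: type                             # class to build (assumed no-arg constructor)
    operation: Optional[Operation] = None  # option 2: how to call this component


@dataclass
class Pipe:
    operation: Operation  # option 1: how to call the component just before this step


class Sequential:
    def __init__(self, *steps):
        self.steps = steps

    def run(self, data):
        built = None  # most recently built component instance
        for step in self.steps:
            if isinstance(step, Component):
                built = step.item()
                if step.operation is not None:
                    data = step.operation(data, built)
            elif isinstance(step, Pipe):
                if built is None:
                    raise ValueError("a Pipe must follow a Component")
                data = step.operation(data, built)
            else:
                raise TypeError(f"unsupported step: {step!r}")
        return data


class A:
    def do_something(self, x):
        return x + 1


class B:
    def do_something(self, x):
        return x * 10

    def do_something_else(self, x):
        return x * 100


option_1 = Sequential(
    Component(A),
    Pipe(lambda prev_output, a: a.do_something(prev_output)),
    Component(B),
    Pipe(lambda prev_output, b: b.do_something_else(prev_output)),
)
option_2 = Sequential(
    Component(A, operation=lambda initial_data, a: a.do_something(initial_data)),
    Component(B, operation=lambda prev_output, b: b.do_something(prev_output)),
)

print(option_1.run(1))  # 200: A adds 1, then B multiplies by 100
print(option_2.run(1))  # 20: A adds 1, then B multiplies by 10
```

One design difference shows up even in this toy: with operation= the input/output logic lives next to the component it describes, while a separate Pipe step needs a rule (here, "apply to the most recently built component") to know which component it refers to.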