subramaniam20jan closed this issue 4 months ago.
You can use the ExecutionContext and the DataContainer to do this. You may want to take a look at the handle_fit and handle_transform methods.
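For reference, here is a sketch of those handler signatures as I read them in neuraxle.base (details may vary across versions); they receive the DataContainer and ExecutionContext that the plain fit/transform methods never see:

from neuraxle.base import ExecutionContext
from neuraxle.data_container import DataContainer

class BaseStep:  # signature sketch only
    def handle_fit(self, data_container: DataContainer, context: ExecutionContext) -> 'BaseStep': ...
    def handle_transform(self, data_container: DataContainer, context: ExecutionContext) -> DataContainer: ...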
I am not sure I see how those can be used here without creating too much confusion.
What I want:
from typing import List

import neuraxle.base
import neuraxle.hyperparams.space
from neuraxle.base import BaseStep
from neuraxle.pipeline import Pipeline

class KerasNeuraxleWrapper(BaseStep):
    def __init__(self,
                 model,
                 hyperparams: neuraxle.hyperparams.space.HyperparameterSamples = None,
                 hyperparams_space: neuraxle.hyperparams.space.HyperparameterSpace = None,
                 name: str = None,
                 savers: List[neuraxle.base.BaseSaver] = None,
                 hashers: List[neuraxle.base.BaseHasher] = None):
        self.model = model
        super().__init__(
            hyperparams=hyperparams,
            hyperparams_space=hyperparams_space,
            name=name,
            savers=savers,
            hashers=hashers,
        )

    def fit(self, data_inputs, expected_outputs=None, **kwargs):
        # Keras' fit returns a History object, not the model, so don't reassign self.model.
        self.model.fit(x=data_inputs, y=expected_outputs, **kwargs)
        return self

    def transform(self, data_inputs):
        # Keras models expose predict rather than transform.
        return self.model.predict(data_inputs)

km = KerasNeuraxleWrapper(keras_model)
pipe = Pipeline([km])
# Desired API: extra keyword arguments flow through the pipeline down to the step's fit.
pipe.fit(input_data_generator, expected_outputs=None, validation_data=validation_data_generator)
The current flow I see is as follows:
pipeline.fit(data_input, expected_output)
  -> pipeline.fit_data_container(DACT(data_input, expected_output))
  -> _FittableStep.handle_fit(dact, cx)
  -> pipeline._fit_data_container(dact, cx)
  -> for each step: step.handle_fit(dact, cx)
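In other words, roughly this happens under the hood (a sketch only; the constructor details of DataContainer and ExecutionContext may differ between Neuraxle versions):

from neuraxle.base import ExecutionContext
from neuraxle.data_container import DataContainer

# Approximately what Pipeline.fit does before any step runs:
dact = DataContainer(data_inputs=data_input, expected_outputs=expected_output)
cx = ExecutionContext()
pipeline = pipeline.handle_fit(dact, cx)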
Both the ExecutionContext and the DataContainer instances are generated inside the base Pipeline class. The best solution (hack) given the current setup could be to pass a dictionary as the data_input and then use it as follows.
from typing import List

import neuraxle.base
import neuraxle.hyperparams.space
from neuraxle.base import BaseStep
from neuraxle.pipeline import Pipeline

class KerasNeuraxleWrapper(BaseStep):
    def __init__(self,
                 model,
                 hyperparams: neuraxle.hyperparams.space.HyperparameterSamples = None,
                 hyperparams_space: neuraxle.hyperparams.space.HyperparameterSpace = None,
                 name: str = None,
                 savers: List[neuraxle.base.BaseSaver] = None,
                 hashers: List[neuraxle.base.BaseHasher] = None):
        self.model = model
        super().__init__(
            hyperparams=hyperparams,
            hyperparams_space=hyperparams_space,
            name=name,
            savers=savers,
            hashers=hashers,
        )

    def fit(self, data_inputs, expected_outputs=None):
        # data_inputs is a dict smuggling the extra fit arguments, e.g.
        # {'x': train_generator, 'validation_data': validation_generator}.
        self.model.fit(y=expected_outputs, **data_inputs)
        return self

    def transform(self, data_inputs):
        return self.model.predict(data_inputs)

km = KerasNeuraxleWrapper(keras_model)
pipe = Pipeline([km])
pipe.fit(
    data_inputs={'x': input_data_generator, 'validation_data': validation_data_generator},
    expected_outputs=None,
)
But this is far from an elegant solution.
Am I missing something here though?
You probably want to mix together what's done in these examples:
It is recommended that you override the _fit_data_container method rather than the fit method for your use case. Refer to this for overriding that method, and to the equivalent method for transforming as well:
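A rough sketch of that recommendation, under the assumption that the extra Keras fit arguments (like the validation generator) are stored on the step itself; the simplified constructor and the validation_data attribute are my choices here, not something Neuraxle prescribes:

from neuraxle.base import BaseStep, ExecutionContext
from neuraxle.data_container import DataContainer

class KerasNeuraxleWrapper(BaseStep):
    def __init__(self, model, validation_data=None):
        super().__init__()
        self.model = model
        self.validation_data = validation_data  # assumption: extra fit argument kept on the step

    def _fit_data_container(self, data_container: DataContainer, context: ExecutionContext) -> BaseStep:
        # The step has full access to the DataContainer and ExecutionContext here,
        # so no extra arguments need to travel through Pipeline.fit.
        self.model.fit(
            x=data_container.data_inputs,
            y=data_container.expected_outputs,
            validation_data=self.validation_data,
        )
        return self

    def _transform_data_container(self, data_container: DataContainer, context: ExecutionContext) -> DataContainer:
        data_container.set_data_inputs(self.model.predict(data_container.data_inputs))
        return data_container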
Note that you will later probably need a saver to make your pipeline serializable. Here is some inspiration on how to do it properly:
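For instance, something along these lines (a sketch only: the BaseSaver method names are as I understand neuraxle.base, and the .h5 file handling and context.get_path() usage are assumptions made to illustrate the idea):

import os

from neuraxle.base import BaseSaver, BaseStep, ExecutionContext
from tensorflow.keras.models import load_model

class KerasModelSaver(BaseSaver):
    def save_step(self, step: BaseStep, context: ExecutionContext) -> BaseStep:
        # Persist the Keras model on its own, then strip it from the step so
        # the rest of the step can be serialized by Neuraxle's default savers.
        step.model.save(os.path.join(context.get_path(), step.name + '.h5'))
        step.model = None
        return step

    def can_load(self, step: BaseStep, context: ExecutionContext) -> bool:
        return os.path.exists(os.path.join(context.get_path(), step.name + '.h5'))

    def load_step(self, step: BaseStep, context: ExecutionContext) -> BaseStep:
        step.model = load_model(os.path.join(context.get_path(), step.name + '.h5'))
        return step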
The problem: Currently, the Neuraxle BaseStep has a fit method signature with only two parameters (data_inputs, expected_outputs). In libraries like Keras, it is possible to pass additional arguments to the fit method, such as a validation generator when the main data_inputs is a data generator as well.
This means that wrapping a Keras model that takes two data generators in a subclass of BaseStep would not be straightforward.
Solution: It would be extremely useful if an additional **kwargs were added to the base step's fit method (in one or more of the mixin classes) to enable passing arbitrary arguments to custom estimator implementations.
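Concretely, the proposal amounts to something like the following sketch (illustrative only; _FittableStep here stands in for whichever Neuraxle mixin defines fit, and this is not the library's actual signature):

class _FittableStep:  # illustrative stand-in for the Neuraxle mixin that defines fit
    def fit(self, data_inputs, expected_outputs=None, **kwargs) -> 'BaseStep':
        # **kwargs would be forwarded untouched to the wrapped estimator, e.g.:
        # pipe.fit(train_generator, validation_data=validation_generator)
        ...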