ben-denham / labtech

Easily run experiment permutations with multi-processing and caching
https://ben-denham.github.io/labtech/
GNU General Public License v3.0

Feature: use inherited attributes for `@labtech.task`-decorated tasks. #3

Closed by nathanjmcdougall 5 months ago

nathanjmcdougall commented 5 months ago

I have come across a pattern I don't like when using the decorator @labtech.task:

@labtech.task
class Experiment:
    base: int
    power: int

    def run(self):
        result = 42 + really_complicated_function_defined_elsewhere(self)
        return result

def really_complicated_function_defined_elsewhere(experiment: Experiment) -> float: ...

I don't like the cyclic relationship between the class and the external function: Experiment.run calls the function, while the function's type annotation refers back to Experiment. A different way to do things, which would be nicer IMO, is this:

class ExperimentConfig:
    base: int
    power: int

@labtech.task
class Experiment(ExperimentConfig):
    def run(self):
        result = 42 + really_complicated_function_defined_elsewhere(self)
        return result

def really_complicated_function_defined_elsewhere(experiment: ExperimentConfig) -> float: ...

Unfortunately, it doesn't work currently (inherited attributes aren't detected):

TypeError: FittingExperiment.__init__() got an unexpected keyword argument 'sample_size'

ben-denham commented 5 months ago

Thanks @nathanjmcdougall,

The approach you've suggested of inheriting parameters from a non-task class probably can't be made to work unless labtech were to stop using dataclasses to manage parameters/attributes.
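
The dataclass behaviour behind this limitation can be reproduced with the standard library alone. The sketch below does not use labtech; it only assumes that labtech's parameter handling follows `dataclasses.dataclass` in this respect. The class names `PlainConfig`, `PlainChild`, `DataConfig`, and `DataChild` are hypothetical.

```python
from dataclasses import dataclass, fields


class PlainConfig:
    # Plain annotations on a non-dataclass base are not dataclass fields.
    base: int
    power: int


@dataclass
class PlainChild(PlainConfig):
    pass


@dataclass
class DataConfig:
    # On a dataclass base, the same annotations become fields.
    base: int
    power: int


@dataclass
class DataChild(DataConfig):
    pass


# Only fields declared on dataclass bases are inherited.
plain_field_names = [f.name for f in fields(PlainChild)]
data_field_names = [f.name for f in fields(DataChild)]

# The generated __init__ of PlainChild accepts no parameters,
# so passing keyword arguments raises a TypeError.
try:
    PlainChild(base=2, power=3)
    plain_error = None
except TypeError as exc:
    plain_error = str(exc)

data_child = DataChild(base=2, power=3)
```

Here `PlainChild` ends up with no fields at all, which matches the shape of the "unexpected keyword argument" error reported above.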

However, here are a couple of other approaches that I think would achieve your goal of avoiding the cyclic relationship:

Approach 1: Make the base class a task type

By making the base class a task type, its attributes will be treated as parameters and inherited by the child class. In this particular example, I've also made the base class an abstract class to make it very clear that it is not a complete task type that can be run on its own:

from abc import ABC, abstractmethod

import labtech

@labtech.task
class ExperimentConfig(ABC):
    base: int
    power: int

    @abstractmethod
    def run(self):
        pass

@labtech.task
class Experiment(ExperimentConfig):
    def run(self):
        result = 42 + really_complicated_function_defined_elsewhere(self)
        return result
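
The same structure can be tried out without labtech installed. The sketch below is only an analogy: it uses the standard library's `dataclasses.dataclass` as a stand-in for `@labtech.task`, and `really_complicated_function` is a hypothetical placeholder for the external helper. It shows that the helper depends only on the base class, and that the abstract base cannot be instantiated on its own.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ExperimentConfig(ABC):
    base: int
    power: int

    @abstractmethod
    def run(self) -> float:
        ...


def really_complicated_function(config: ExperimentConfig) -> float:
    # Hypothetical placeholder; it only needs the base class, so there is no cycle.
    return float(config.base ** config.power)


@dataclass
class Experiment(ExperimentConfig):
    # Inherits the base and power fields from ExperimentConfig.
    def run(self) -> float:
        return 42 + really_complicated_function(self)


experiment = Experiment(base=2, power=3)
result = experiment.run()

# The abstract base is not a complete type and refuses to be instantiated.
try:
    ExperimentConfig(base=2, power=3)
    abstract_error = None
except TypeError as exc:
    abstract_error = str(exc)
```

Whether `@labtech.task` interacts with `ABC` in exactly the same way is taken from the comment above rather than verified here.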

Approach 2: Make the experiment config a dependency task

You could make the Experiment take the ExperimentConfig as a parameter, where ExperimentConfig is itself a task (i.e. the Experiment task depends on an ExperimentConfig task).

The same result could also be achieved by just making the config parameter to Experiment a simple dictionary, but any custom class that is a parameter must be a task (which is a restriction to support serializing any parameter value for the cache).

In this example, I've also configured ExperimentConfig objects to not be cached, as there is no benefit to caching a result that is just the task itself:

@labtech.task(cache=None)
class ExperimentConfig:
    base: int
    power: int

    def run(self):
        return self

@labtech.task
class Experiment:
    config: ExperimentConfig

    def run(self):
        config = self.config.result
        result = 42 + really_complicated_function_defined_elsewhere(config)
        return result

nathanjmcdougall commented 5 months ago

Thanks Ben, that makes sense. I think approach 1 is my preference.