d6t / d6tflow

Python library for building highly effective data science workflows
https://d6tflow.readthedocs.io/en/latest/
MIT License

task not complete run option for data load #20

Open bacross opened 4 years ago

bacross commented 4 years ago

I'd like to see an option for the data load function to run the task automatically if it is not marked complete. I find myself writing if/else statements like the one below to get the desired effect:

    import d6tflow

    if TaskExample().complete():  # complete() must be called; the bare method object is always truthy
        df = TaskExample().output()['df'].load()
    else:
        d6tflow.run(TaskExample())
        df = TaskExample().output()['df'].load()
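The if/else pattern above can be wrapped in a small helper so it only has to be written once. This is a minimal sketch, not part of d6tflow's API: `load_or_run` and its `runner` parameter are hypothetical names. It assumes the task exposes a luigi-style `complete()` method and an `output()` dict of targets with `load()`, as in the example, and by default calls `d6tflow.run` (imported inside the function, so the helper can also be exercised with stand-in objects).

```python
from typing import Any, Callable, Optional


def load_or_run(task: Any, key: str,
                runner: Optional[Callable[[Any], Any]] = None) -> Any:
    """Hypothetical helper: run `task` if it is incomplete, then load output `key`."""
    if not task.complete():  # call the method; the bare attribute is always truthy
        if runner is None:
            # Assumes d6tflow is installed when no custom runner is supplied.
            import d6tflow
            runner = d6tflow.run
        runner(task)
    # Load the requested output target, as in the original if/else.
    return task.output()[key].load()
```

Usage would then be a single line, e.g. `df = load_or_run(TaskExample(), 'df')`.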

d6tdev commented 4 years ago

How about prompting the user to run the task if it's not complete when loading? Not sure if it should run automatically.

For now, I suggest always running d6tflow.run(TaskExample()) before loading any data. That guarantees the tasks are complete, so no if statements are needed.

bacross commented 4 years ago

Just a parameter option should be enough:

TaskExample().output()['df'].load(run_if_incomplete=True)

It's kind of a nice-to-have.

d6tdev commented 4 years ago

There are multiple issues that make this non-trivial to implement: 1) a target doesn't know which task it belongs to, and 2) this would work better as Task().outputLoad(auto_run=True), but that would lead to circular imports. TBD. For now, I suggest always running d6tflow.run(TaskExample()); if all tasks are already complete, this is fast.
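A minimal sketch of how the requested `load(run_if_incomplete=True)` call shape could be approximated outside d6tflow, working around both issues above. `TaskTarget` is a hypothetical wrapper, not a d6tflow class: since a bare target has no reference to its task, the task is passed in explicitly, and `d6tflow.run` is imported inside the method rather than at module level, which is one common way to sidestep circular imports (untested against d6tflow's actual module layout). It assumes, as in the thread, that tasks provide `complete()` and an `output()` dict of targets with `load()`.

```python
from typing import Any, Callable, Optional


class TaskTarget:
    """Hypothetical wrapper pairing a target key with the task that produces it."""

    def __init__(self, task: Any, key: str,
                 runner: Optional[Callable[[Any], Any]] = None) -> None:
        self.task = task      # the task that owns the target (a bare target lacks this link)
        self.key = key        # which entry of task.output() to load
        self.runner = runner  # optional custom runner, e.g. for testing

    def load(self, run_if_incomplete: bool = False) -> Any:
        # Only run the task when explicitly requested and it is not yet complete.
        if run_if_incomplete and not self.task.complete():
            runner = self.runner
            if runner is None:
                # Function-local import: resolved at call time, not module import time.
                import d6tflow
                runner = d6tflow.run
            runner(self.task)
        # Without the flag this behaves like a plain load of the target.
        return self.task.output()[self.key].load()
```

With this, the proposed call would read `TaskTarget(TaskExample(), 'df').load(run_if_incomplete=True)`. The cost of the wrapper approach is that the caller must name the task again, which is exactly the information a `Task().outputLoad(auto_run=True)` method would already have.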