Should output module always run?

ninjapapa commented 5 years ago

Currently the output module re-runs the same way as other modules: if persisted data with the same hash-of-hash exists, it will NOT rerun. However, that may not be the desired behavior, since a user may have deleted the output file/table and expect that rerunning the output module will recover it.
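To make the problem concrete, here is a toy sketch of the skip-on-matching-hash behavior. It is not SMV code: `persisted`, `published`, and `run_output` are hypothetical names, and the real hash-of-hash logic is reduced to a plain string key.

```python
# Toy model only: these names are hypothetical and not part of SMV.
persisted = set()   # hashes for which persisted data exists
published = {}      # stands in for the external output file/table


def run_output(module_hash, data):
    """Publish `data` unless persisted data for this hash already exists."""
    if module_hash in persisted:
        return False  # skipped, exactly like any other cached module
    published[module_hash] = data
    persisted.add(module_hash)
    return True


first = run_output("abc123", [1, 2, 3])   # publishes
del published["abc123"]                   # user deletes the output
second = run_output("abc123", [1, 2, 3])  # skipped: the output is not recovered
```

After the second call the output is still missing, which is the behavior this issue questions.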

AliTajeldin commented 5 years ago

In the old "output" module paradigm, modules were not "run" in the sense that the DF is recomputed, but were re-run to publish the output. Now that "output" modules are pure publish, we should re-run them every time. As a user, that is what I would expect, as there is no way to compare the published result to the current output to determine whether we need to re-publish.

ninjapapa commented 5 years ago

@AliTajeldin makes sense.

ninjapapa commented 5 years ago

Original idea

Put the real write operation into a "post_run" method. Later figured out that it can just go in the _post_action method, since that always runs after the run method and will always be called, even when the module is ephemeral.

However, with more thought, it is not ideal:

The write operation itself is an action, so putting it in the post action is not literally correct, especially when the output is ephemeral
It hides the obvious step too deep in the running logic

Will do the following

Explicit Approach

The current entry point to a module from the running logic is _get_data. Need to make that entry point _do_it. Then _do_it can call _get_data for regular modules; for an output module, _do_it just calls doRun directly, and the write operation stays in doRun.

Within the output module's _do_it, will call _run_ancestor_and_me_postAction, since the write operation guarantees an action.
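A minimal sketch of the dispatch described above, assuming simplified stand-in classes. The method names (_do_it, _get_data, doRun, _post_action, _run_ancestor_and_me_postAction) come from this thread, but their signatures, the `cache` dict, the `sink` list, and the `log` attribute are assumptions made for illustration and are not the real SMV implementation.

```python
# Sketch only: simplified stand-ins, not the real SMV classes.
class Module:
    def __init__(self, name, ancestors=()):
        self.name = name
        self.ancestors = list(ancestors)
        self.log = []  # records calls, for illustration only

    def doRun(self):
        self.log.append("doRun")
        return f"data({self.name})"

    def _get_data(self, cache):
        # Regular path: reuse persisted data when it exists.
        if self.name not in cache:
            cache[self.name] = self.doRun()
        return cache[self.name]

    def _post_action(self):
        self.log.append("post_action")

    def _run_ancestor_and_me_postAction(self):
        for m in self.ancestors + [self]:
            m._post_action()

    def _do_it(self, cache):
        # New entry point: regular modules still go through _get_data.
        return self._get_data(cache)


class OutputModule(Module):
    def __init__(self, name, ancestors=(), sink=None):
        super().__init__(name, ancestors)
        self.sink = sink if sink is not None else []

    def doRun(self):
        self.log.append("doRun")
        self.sink.append(self.name)  # the write operation, itself an action
        return None

    def _do_it(self, cache):
        # Output modules bypass the persisted-data check and always publish.
        result = self.doRun()
        self._run_ancestor_and_me_postAction()
        return result


cache = {}
src = Module("src")
out = OutputModule("out", ancestors=[src])
src._do_it(cache)
src._do_it(cache)  # second call is served from the cache
out._do_it(cache)
out._do_it(cache)  # publishes again
```

In this sketch the regular module runs once and is then served from the cache, while the output module writes on every call, so a deleted output would be restored by a rerun.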

TresAmigosSD / SMV

Should output module always run? #1551
