breathe / NotebookScripter

Expose IPython/Jupyter notebooks as callable functions or runnable scripts
MIT License

Documentation on parameter passing #5

Open allmedia-nz opened 4 years ago

allmedia-nz commented 4 years ago

This is a genius concept for Jupyter and I think it revolutionizes the whole platform. However, I confess I am finding the question of the scope of sub-notebooks and parameter passing a bit unclear, and I feel a tad more example detail would help.

As a matter of format, if the examples were Jupyter cells rather than console interactions it would be easier to relate to.

Could you also please clarify the seeming contradiction between "execs all the code cells within Example.ipynb sequentially in the context of that module" and "Importantly - the notebook code is not imported as a python module - rather, all the code within the notebook is re-run on each call to run_notebook()"?

What I find confusing is scope. If notebook A calls notebook B, does B inherit all variables from A as if it were a continuation of A (just an automatically loaded set of cells with input and output suppressed), or does it require parameters like a function? You hint at this by distinguishing run_notebook_in_process() from run_notebook(), but once again it's not abundantly clear.

Can Notebook B be used in a loop with changing input values?

In my tests I have been trying to pass a pandas DataFrame. It's been very perplexing trying to work out where to declare variables in A or B so that they get recognised, or whether a keyword argument counts as a variable at all.

The term "keyword arguments" in "If desired, values can be injected into the notebook for use during notebook execution by passing keyword arguments" is not clear. Are we talking about variable values? In which case are they limited to strings? or is there no restriction on type?

I am somewhat surprised by the relatively low level of following you have for this initiative given how useful it could be, and I suspect some of that may be because the documentation needs a bit more work.

Happy to help if that would be useful. I'm more of a writer than a programmer anyway.

kind regards peter

breathe commented 4 years ago

Hi @allmedia-nz, thanks for the questions and apologies for the long delay in responding! I really struggle with tracking GitHub notifications through the noise I'm currently buried in ...

I can see how the documentation is unclear ... I would definitely appreciate a PR to improve the language/explanatory power of the examples ...!

What run_notebook(...) tries to do is 'imagine the notebook is a function defined with def whose keyword parameters are given by the calls to receive_parameter that occur in the notebook ...'

notebook_a.py

from NotebookScripter import receive_parameter

x = receive_parameter(some_param1="awesome")
print(x)

run_notebook("./notebook_a.py", some_param1="Functions are awesome") tries to treat notebook_a.py as if it was a function defined like this:

import copy

def notebook_a(some_param1="awesome"):
    x = copy.deepcopy(some_param1)  # the parameter value is (conceptually) deep-copied into the notebook scope
    print(x)

In which case the call to run_notebook("./notebook_a.py", some_param1="Functions are awesome") tries to behave the same way notebook_a(some_param1="Functions are awesome") would behave ...

What I find confusing is scope. If notebook A calls notebook B does B inherit all variables from A as if it was a continuation of A (just an automatically loaded set of cells with input and output suppressed)

No -- Notebook B won't inherit or know anything about the state of any values in Notebook A aside from the values passed to B by A (with a little bit of a caveat ...)

or does it require parameters like a function?

Yes -- Notebook A should pass parameters to Notebook B.
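
A minimal sketch of what I mean (the file names and the example value here are just for illustration):

notebook_a.py

from NotebookScripter import run_notebook

greeting = "hello from A"  # defined in A -- NOT visible inside notebook_b
run_notebook("./notebook_b.py", greeting=greeting)  # pass it explicitly instead

notebook_b.py

from NotebookScripter import receive_parameter

greeting = receive_parameter(greeting="default")  # the value A passed in (or the default if run standalone)
print(greeting)  # -> hello from A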

There is a small bit of extra behavior which significantly complicates explaining things ... -- but it's actually a very simple mechanism.

NotebookScripter maintains a stack of the values passed into each call to run_notebook():

...
run_notebook("notebook_a", param1="foo", param2="will_be_shadowed")
...

in notebook_a

...
run_notebook("notebook_b", param2="bar")
...

in notebook_b

param1 = receive_parameter(param1=None)
param2 = receive_parameter(param2=None)
print(param1, param2)

Outputs: foo bar

If we were to put a breakpoint at the print statement and examine the NotebookScripter call stack -- it would look like this:

notebook_b: print(param1, param2)                                                [param1="foo", param2="bar"]
notebook_a: run_notebook("notebook_b", param2="bar")                             [param1="foo", param2="will_be_shadowed"]
console:    run_notebook("notebook_a", param1="foo", param2="will_be_shadowed")  []

(The values in [] above are the NotebookScripter parameter values in effect within that execution frame.)

To try and clarify the behavior of print(param1, param2) in prose: param2 was passed into the call to notebook_b. No value for param1 was supplied in that call -- so run_notebook searches for a value for that parameter among those passed to parent invocations of run_notebook() and finds the param1="foo" that was passed into notebook_a.

The values passed to calls to run_notebook() are stored on a stack, and that stack of run_notebook parameters is searched in order to find the value that should be used for a parameter within a given execution scope. Another way of saying this is that the parameters passed to run_notebook are dynamically scoped (as opposed to the lexical scoping used for normal function parameters in Python) ... I think the big confusion that hit you is that this mechanism applies ONLY to the parameters passed to calls to run_notebook(...) -- not at all to any other values defined in the notebook scope ...

I found this behavior to be quite convenient when developing some machine learning models ... If you think of all the receive_parameter() calls as defining hyper-parameters, then you can implement pipelines that internally use run_notebook() to execute different algorithms, while letting you define 'experiments' in a 'flat' way -- simply by providing values for any receive_parameter call at the top level...
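
For instance, something roughly like this (a sketch only -- the notebook name and the hyper-parameter names are made up):

from NotebookScripter import run_notebook

# each dict plays the role of an 'experiment': its values are picked up by
# whatever receive_parameter() calls occur inside train_model.ipynb or in any
# notebooks it runs internally via run_notebook()
experiments = [
    {"learning_rate": 0.01, "n_estimators": 100},
    {"learning_rate": 0.10, "n_estimators": 300},
]

for hyper_params in experiments:
    run_notebook("./train_model.ipynb", **hyper_params)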

(aside: The fact that this behavior is hard to explain in a straightforward way is perhaps an argument against the mechanism ... I originally made it possible to choose whether you want this behavior or not -- but that was even harder to explain ... for my use cases at least, the dynamic scoping mechanism was sufficiently useful that I ended up deciding to just make it the only behavior ... (I could be persuaded differently ...))

Can Notebook B be used in a loop with changing input values?

Yes -- each execution of run_notebook(...) is conceptually the same as 'imagine the notebook is a function and call it again'.
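
e.g. something like this (hypothetical file and parameter names):

from NotebookScripter import run_notebook

for value in [1, 2, 3]:
    # every cell of notebook_b is re-executed on each iteration with the new value
    run_notebook("./notebook_b.py", input_value=value)

and inside notebook_b:

from NotebookScripter import receive_parameter

input_value = receive_parameter(input_value=0)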

In my tests I have been trying to pass a pandas DataFrame. It's been very perplexing trying to work out where to declare variables in A or B so that they get recognised ... Are we talking about variable values? In which case are they limited to strings, or is there no restriction on type?

The parameter values passed to run_notebook() calls need to be pickle-serializable -- this is to support the out-of-process run_notebook execution mode -- and I think it's better to maintain consistency and require this for in-process execution too ... I believe pandas DataFrames are pickle-serializable ... if you have some problematic sample code I could take a look and help identify if there is a bug somewhere ...?
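
Something along these lines is what I'd expect to work (an untested sketch -- the file name is made up):

caller:

import pandas as pd
from NotebookScripter import run_notebook

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
run_notebook("./notebook_b.py", df=df)  # the DataFrame just needs to be picklable

notebook_b.py

import pandas as pd
from NotebookScripter import receive_parameter

df = receive_parameter(df=pd.DataFrame())  # the DataFrame passed by the caller (or an empty default)
print(df.shape)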

Happy to help if that would be useful. I'm more of a writer than a programmer anyway.

Would happily accept improvements to the documentation ...! I haven't been doing much Python programming lately, but I'm not intending to leave this project out to die and would be happy to see it improve!