fastai / nbdev

Create delightful software with Jupyter Notebooks
https://nbdev.fast.ai/
Apache License 2.0
4.8k stars 484 forks source link

Introduces `#|expatch` directive #1336

Closed yegeniy closed 8 months ago

yegeniy commented 1 year ago

Documentation is in nbs/explanations/directives.ipynb. Implementation is spread across nbs/apis/0{1,4}*.

Description:

This pull request proposes a new directive called #|expatch. It is similar to #|export, but it will also indent the content of the export cell by two spaces. This allows us to interleave multiple cells while defining certain methods.

Use Case:

This directive has been tested in production for a couple of months to develop ETL style PySpark jobs.

Due to how the pyspark jobs are deployed, we end up needing all of the code wrapped in a method annotated with @call_parse. Using #|expatch has been helpful, because the code in the method is linear. It involves transforming complex, messy, and disparate sets of data. We interweave exported (#|expatch) code cells with documentation and example cells. But we also introduce documentation and examples along the way. There is plenty of code refactoring, helper methods, and code reuse. But at the end of the day, the bulk of the business logic is best kept inside a single exported method.

It has proven much easier and faster to debug and iterate this way. The alternative being that we declare the whole of the primary method inside a single hundred-line-plus cell and then test by running the whole thing or by reproducing parts of it in non-exported cells. With #|expatch, we can declare the exported code once and run it as we develop and test it.

A Caveat on Usage:

Please note that the #|expatch directive has very limited utility where it actually makes sense to refactor the code into small methods. It is not a good fit for those cases. But in that narrow set of use-cases where a long drawn-out method makes sense, this directive significantly simplifies documentation and debugging. In design pattern terminology, this is sometimes referred to as a "Transaction Script". Not to be confused with the much more common anti-pattern called "Long Method".

review-notebook-app[bot] commented 1 year ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

yegeniy commented 1 year ago

There is also a small thread in the nbdev discord about this idea

hugetim commented 8 months ago

Can you link to a sample notebook that uses this directive? It's not clear to me how it would work when running cells in the notebook. Does the code run cleanly when running notebook cells? (I'm puzzled how it could because code in a cell separated from the start of the method wouldn't have access to the variables created in the first method cell, within the method's namespace.)

If this directive were adopted, I'm thinking there would need to be an option to indent by four spaces rather than two, no?

yegeniy commented 8 months ago

If this directive were adopted, I'm thinking there would need to be an option to indent by four spaces rather than two, no?

That would be a really nice improvement to the expatch directive!

There is already an indent=2 argument to the _expatch_ method, which defaults to 2 spaces. It would just be a matter of passing your desired indentation in to the directive processor. I recommend providing up to 3 ways of doing it:

yegeniy commented 8 months ago

Can you link to a sample notebook that uses this directive? It's not clear to me how it would work when running cells in the notebook.

I agree, it's a little nuanced. I attached an example in this comment. But first, please note the "Caveat on Usage":

Please note that the #|expatch directive has very limited utility where it actually makes sense to refactor the code into small methods. It is not a good fit for those cases. But in that narrow set of use-cases where a long drawn-out method makes sense, this directive significantly simplifies documentation and debugging. In design pattern terminology, this is sometimes referred to as a "Transaction Script". Not to be confused with the much more common anti-pattern called "Long Method".

In the discord chat thread, I included a couple of screenshots to demonstrate this: (click to expand) ![The notebook](https://cdn.discordapp.com/attachments/1083213136070463558/1086137419629342720/image.png?ex=653b364c&is=6528c14c&hm=faf67dfbda333d82f1f6fbbf45e168588bd62ff350401c84cd20e1d87e384544&) ⚠️ take particular note of the second cell, which is not exported. It is just there to set local testing values for the arguments. ![The exported code](https://cdn.discordapp.com/attachments/1083213136070463558/1086137419889397841/image.png?ex=653b364c&is=6528c14c&hm=a1f9495e35d1343c4958b61a00f0e1bcce658a581452b8ac83e655c36482c674&)
yegeniy commented 8 months ago

Does the code run cleanly when running notebook cells? (I'm puzzled how it could because code in a cell separated from the start of the method wouldn't have access to the variables created in the first method cell, within the method's namespace.)

Yes, it executes cleanly, but only if is a specific pattern is followed first. Mainly, the arguments to the method need to be declared in a separate cell once the method is declared. This is useful for demonstrating your code.

Instead of writing your complete method in single cell, you would only include a stub in the #|exported cell. Basically, the method signature and documentation. The rest of the method is #|expatched in.

hugetim commented 8 months ago

...narrow set of use-cases...

Thank you. Your responses help me understand better how it works for you in practice. My own opinion (just the opinion of one other nbdev user), having considered it a bit more, is that this would be better as an nbdev plugin rather than as part of nbdev proper. The reason is that it introduces 1-2 new directives and/or a new setting for the sake of a fairly narrow set of use cases.

...limited utility where it actually makes sense to refactor the code into small methods. It is not a good fit for those cases.

There's also a risk of people using this directive in situations for which it is not a good fit rather than refactoring their code into smaller functions, inadvertently encouraging worse code. Providing it instead as a plugin reduces this risk, I think.

jph00 commented 8 months ago

I apologise for sitting on this for so long - I've had trouble getting my head around what it's for, and this discussion is very useful thank you! I agree that it would be best made available as a plugin in a separate project, so I'll close this PR.

yegeniy commented 8 months ago

Brilliant. Thanks for the suggestion @hugetim. I am not sure how I missed the plugin support in nbdev, but I am grateful for the link.

I believe this should be possible using processors. But it may not be, we’ll see. On the surface it seems like a better fit for any custom directive, for sure though.

On Fri, Oct 13, 2023 at 17:46 Jeremy Howard @.***> wrote:

Closed #1336 https://github.com/fastai/nbdev/pull/1336.

— Reply to this email directly, view it on GitHub https://github.com/fastai/nbdev/pull/1336#event-10649198839, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACHPQS76C4MRHMVR37EHR3X7GZDDANCNFSM6AAAAAAXZFM35Y . You are receiving this because you authored the thread.Message ID: @.***>