danielfrg / s3contents

Jupyter Notebooks in S3 - Jupyter Contents Manager implementation
Apache License 2.0
248 stars 88 forks source link

add file save hook support #97

Closed wseaton closed 4 years ago

wseaton commented 4 years ago

This fixes #70 and #65, basically just ported over the FileSystemContentsManager implementations from upstream notebook.

Tested the following hooks locally with minio:

def scrub_output_pre_save(model, **kwargs):
    """scrub output before saving notebooks"""
    # only run on notebooks
    if model['type'] != 'notebook':
        return
    # only run on nbformat v4
    if model['content']['nbformat'] != 4:
        return

    for cell in model['content']['cells']:
        if cell['cell_type'] != 'code':
            continue
        cell['outputs'] = []
        cell['execution_count'] = None

c.S3ContentsManager.pre_save_hook = scrub_output_pre_save

Post save hooks are a little tricky to write, but I was able to get one to work that runs nbconvert and saves the notebook as HTML by reusing contents_manager.fs. I treated the os_path from the hook function as essentially the S3 API path:

import os
import nbformat

def make_html_post_save(model, os_path, contents_manager, **kwargs):
    """
    convert notebooks to HTML after save with nbconvert
    """
    from nbconvert import HTMLExporter

    if model['type'] != 'notebook':
        return

    content, _format = contents_manager.fs.read(os_path, format="text")
    my_notebook = nbformat.reads(content, as_version=4)

    html_exporter = HTMLExporter()
    html_exporter.template_name = 'classic'

    (body, resources) = html_exporter.from_notebook_node(my_notebook)

    base, ext = os.path.splitext(os_path)
    contents_manager.fs.write(path=(base + '.html'), content=body, format=_format)

c.S3ContentsManager.post_save_hook = make_html_post_save
wseaton commented 4 years ago

Hi @wseaton wave ! Thanks for the PR. Left a few inline comments. Do you have the bandwidth to turn your pre_save and post_save examples into tests?

Sure, I'll try to get something out this week :slightly_smiling_face:

wseaton commented 4 years ago

@ericdill ping, I added tests for the example hooks in tests/test_s3manager.py. Let me know if there's a better way to test than just saving a notebook and inspecting the result of the example functions, I think if we wanted to verify the functions were called with the proper signature we'd need to rig up something with unittest.Mock. Also do you know what's up with the linter error? Thanks!

ericdill commented 4 years ago

Linter is complaining about

./s3contents/tests/test_s3manager.py:77:1: E305 expected 2 blank lines after class or function definition, found 1
./s3contents/tests/hooks.py:4:1: E302 expected 2 blank lines, found 1

You should run make fmt before committing. I think there's also a black formatter check in there that needs to pass. Sorry to let this one die on the vine a bit. Lost track of it last week.

Running this locally works for everything:

make env
conda activate s3contents
make minio

Then in a diff terminal

conda activate s3contents
make test-all

Looks like there's a bunch of warnings, but I'm not going to lay those at your feet 😁 .

Just get that linter appeased and I can merge it (or give me commit to your branch and I can just push the commit)

wseaton commented 4 years ago

No worries @ericdill, I'll fix those two issues and add a doc section with the examples from the tests and we should be good.

wseaton commented 4 years ago

@ericdill this should be ready to go, just needs your approval on the docs changes!

ericdill commented 4 years ago

lgtm. Thanks @wseaton !