girder / girder_jupyter

A Girder content manager for Jupyter
BSD 3-Clause "New" or "Revised" License

Convenience methods or something like Python's "open" for accessing Girder files #24

Open jeffbaumes opened 6 years ago

jeffbaumes commented 6 years ago

Assuming we have #23, it would be great to add a few convenience methods which also know the current notebook's path inside Girder in order to read/write Girder files. Some options:

  1. A girder_jupyter.open function would return a duck-typed object that behaves like a Python file object. The path would be relative to where the notebook is saved in Girder.
from girder_jupyter import open  # proposed API; shadows the built-in open

with open('relative/path/to/file.csv') as f:
    content = f.read()
    # ...

with open('relative/path/to/new_file.csv', 'w') as f:
    f.write(content)
    # ...
  2. Because the prior idea may be tricky/impossible, some basic I/O operations like this could be helpful:
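import girder_jupyter  # read/save are proposed functions, not an existing API

content = girder_jupyter.read('relative/path/to/file.csv')

# ... (produce new_content from content)

girder_jupyter.save(new_content, 'relative/path/to/new_file.csv')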
  3. We go a whole other route and mount the filesystem locally with Girder's FUSE support. This is currently read-only, and making it a writable filesystem may be somewhat (very?) involved. With a mount, everything would work, including third-party code that uses Python's standard open, and it would seem that the contents manager is not even needed. Thoughts @manthey?
jeffbaumes commented 6 years ago

@manthey @aashish24 @cjh1 @cryos @zachmullen, this is for later discussion on jupyter-girder strategy. This notebook includes a proof-of-concept implementation similar to (1) above: an nb_open function that acts much like Python's open and supports the r, rb, w, and wb modes. Opening in read mode loads the entire file immediately, and writes only reach Girder when the file is closed (i.e. when the with block exits).

@cryos had the idea of accessing data through a general Jupyter ContentsManager. This is an implementation of that concept, and interestingly nothing about it requires a Girder ContentsManager: it works with any ContentsManager, including the built-in filesystem-based one. @cjh1 is the one who figured out how to get access to the Jupyter session token inside the notebook to make this work.
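As a rough illustration of why this is independent of Girder: any ContentsManager is exposed through the Jupyter server's `/api/contents/<path>` REST endpoint, which accepts an `Authorization: token ...` header and returns a JSON model whose `content` is either plain text or base64. The sketch below only builds the request and decodes a model; the helper names are hypothetical, and the base URL and token are assumed to be already known (how the notebook obtains them is not shown here).

```python
import base64
import urllib.parse
import urllib.request


def build_contents_request(base_url, token, path):
    """Build (but do not send) a GET request for a file's contents model.

    base_url is assumed to end with '/', e.g. 'http://localhost:8888/'.
    """
    quoted = urllib.parse.quote(path.lstrip("/"))
    url = urllib.parse.urljoin(base_url, "api/contents/" + quoted)
    return urllib.request.Request(
        url + "?content=1",
        headers={"Authorization": f"token {token}"},
    )


def decode_model(model):
    """Return the bytes of a file model as returned by the contents API."""
    if model.get("type") != "file":
        raise ValueError("not a file model")
    if model["format"] == "base64":
        return base64.b64decode(model["content"])
    if model["format"] == "text":
        return model["content"].encode("utf-8")
    raise ValueError(f"unexpected format: {model['format']!r}")
```

Sending the request (for example with `urllib.request.urlopen`) and parsing the JSON response would yield the model dict passed to `decode_model`.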

The notebook shows (1) reading, displaying, then writing back a copy of an image stored in Girder, (2) reading a CSV file stored in Girder into a pandas dataframe, (3) showing an error when a file is not found.

Limitations include (1) it only works with absolute Girder paths, not paths relative to the notebook file as you might expect, (2) if multiple JupyterLab instances are running on the same machine, it does not know which one to get content from (it arbitrarily chooses the first one), (3) you currently need to use the with statement (context manager); it does not support f = open(...) followed by an explicit f.close() like Python's open does.
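Limitation (3) might be addressable by returning an object with its own close() instead of a generator-based context manager. A minimal sketch, assuming a hypothetical `upload(path, data)` callable that performs the actual write; `BufferedRemoteWriter` is an illustrative name, not part of the proof of concept:

```python
import io


class BufferedRemoteWriter:
    """Hypothetical binary writer that uploads its buffer once, on close()."""

    def __init__(self, path, upload):
        self._path = path
        self._upload = upload  # callable(path, data); assumed to do the write
        self._buffer = io.BytesIO()

    def __getattr__(self, name):
        # Delegate write(), seek(), tell(), closed, ... to the buffer.
        return getattr(self._buffer, name)

    def close(self):
        # Upload exactly once, even if close() is called repeatedly.
        if not self._buffer.closed:
            self._upload(self._path, self._buffer.getvalue())
            self._buffer.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions
```

Both `f = BufferedRemoteWriter(...); f.write(...); f.close()` and the with form work. Unlike a generator-based context manager, this version still uploads when the with block raises, which may or may not be the desired behavior.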