QuantEcon / sphinxcontrib-jupyter

A Sphinx Extension for Generating Jupyter Notebooks
BSD 3-Clause "New" or "Revised" License
74 stars 23 forks source link

FEAT: Support for general static asset source locations #268

Open mmcky opened 4 years ago

mmcky commented 4 years ago

This PR allows for more flexible source files to be specified in the RST documents and will organise them based on type in the BUILD directory. All images will be located in BUILDDIR/_images and all downloads will be placed in BUILDDIR/_downloads. Only the assets referenced in the source RST files will be copied to the build location.

mmcky commented 4 years ago

@AakashGfude currently _downloads is taken by download notebook option. We should update this to use _downloads/ipynb/ with all download directive files going into the root level of _downloads?

mmcky commented 4 years ago

Note: The download notebooks won't visually work until the s3 location hosts images in _images.

mmcky commented 4 years ago

@AakashGfude for some reason this branch is no longer placing build files in _build/jupyter but rather in the root level _build. I haven't changed self.outdir in the builder so I don't know why this would have changed. The master branch still works. Any ideas? Solved this in https://github.com/QuantEcon/sphinxcontrib-jupyter/pull/268/commits/31c3eba764f3c0732151e475cfe3e9073b03c0b0

mmcky commented 4 years ago
3ndp-1.pdf           countries-1.csv     employ-1.csv   iteration_notes-1.pdf  realwage-1.csv  ticker_data-1.csv
aiyagari_obit-1.pdf  course_notes-1.pdf  firenze-5.pdf  pi2-7.pdf              test_pwt-1.csv  wb_download-1.py
mmcky commented 4 years ago

Julia:

need to resolve the issue of nested folders now as uri are returned from sphinx as:

'getting_started_julia/../_static/figures/julia_term_1.png'
mmcky commented 4 years ago
mmcky commented 4 years ago

just contains theme support files now.

mmcky commented 4 years ago
mmcky commented 4 years ago
mmcky commented 4 years ago

This PR migrates the static assets to use a flat collection of static objects.

  1. all image and figure directive support files are placed in _images
  2. all download directive support files are placed in _downloads

If there are filename collisions from files that are in different source locations then this is caught and a filenumber is incremented to the object at the time of writing links in the translator. The full collection of objects are then copied as a final process in the builder.

It will handle a variety of file locations including:

  1. root specification: /_static/file.pdf
  2. relative file locations: _static/file.pdf or ../_static/file.pdf. In this case the extension tracks external and internal file locations which means the _static folder doesn't have to be contained within the rst folder.

This is not optimal behaviour for the use case of writing lecture collections (where the author would like to distribute a notebook (with files) without having web server support) but works well for managing large websites and complete notebooks collections. With web support download notebooks can be used to distribute an individual notebook and the _images and downloads are supported via a s3 or web server host.

mmcky commented 4 years ago

tried to test on the latest master today but the compilation stalled -- perhaps on avalanche.rst again?

mmcky commented 4 years ago

@arnavs @sglyon If you had time would you be willing to run the jl and ds projects on this branch of the extension and let me know if you spot any errors. It basically changes how all statics are stored and linked so is a big change worth testing widely. I will do my own testing as well.

mmcky commented 4 years ago
sglyon commented 4 years ago

it looks to me like this might have broken parts of the ds lectures

we have this in the rst:

.. figure:: https://storage.googleapis.com/ds4e/_static/reshape_files/melt.gif
   :alt: melt.gif

And this is what ends up in the html output:

<img src="../https://storage.googleapis.com/ds4e/_static/reshape_files/melt.gif" alt="../https://storage.googleapis.com/ds4e/_static/reshape_files/melt.gif">

Notice the leading ../ in the img src

sglyon commented 4 years ago

Just checked more lectures and all our remote images received a leading .. and are now not showing up

mmcky commented 4 years ago

thanks @sglyon I will work on a fix for this context.

mmcky commented 4 years ago

@sglyon I have issued a small fix for the http issues you are having which checks for external links when parsing adjustment for absolute links in nested folder case. I think that should now be fixed with the latest version

mmcky commented 4 years ago

@sglyon @arnavs would you have time to review this with the latest adjustment to see if they work for datascience and julia?

mmcky commented 4 years ago

@AakashGfude if you had any time I would appreciate a review of this approach and implementation. This works well for our projects but not so well for other use cases like writing a lecture series with local static assets.

I am inclined to give this approach a go as it will solve the julia nested folder issues we currently have adjustments for _static support. With the knowledge we may change the implementation in November.

mmcky commented 4 years ago
mmcky commented 4 years ago

thanks for testing @sglyon -- this is on hold for a week as we get pdf feature merged. Then we will revisit this PR.

mmcky commented 4 years ago

Testing

mmcky commented 4 years ago
Exception occurred:
  File "/home/qebuild/repos/sphinxcontrib-jupyter/sphinxcontrib/jupyter/writers/translate_all.py", line 206, in visit_image
    elif uri not in self.builder.image_library.keys():
AttributeError: 'JupyterPDFBuilder' object has no attribute 'image_library'
mmcky commented 4 years ago

@sglyon @arnavs @AakashGfude -- just wanted to check in now that pdf has been merged on this planned change to how static assets are managed. Before I go and fix the JupyterPDFBuilder with the new _image and _downloads flat file library apporach I thought I would open this to discussion to see if there were any thoughts / comments / improvement ideas on this implementation or approach. I am happy to do the code work but would love to hear if anyone has comments before we make the change.

This is subotimal in one usage case -- which is distributing notebooks (along with static) assets. For example someone may want a notebook and some statics to go along with it when teaching a class. This change would make it hard to distribute a notebook and any static assets as they are embedded in the _image folder rather than being in a local static folder for example.

AakashGfude commented 4 years ago

@mmcky no ideas as such popping out now, the implementation looks sleek. Regarding the distribution of assets. Maybe in the later stage maybe we can have a parent directory which has all the folders like _image, _downloads and _static, but then we will have to prepend all the links to have that parent subdirectory, which can be handled by the plugin or passed as a static path to conf. Having a single entry directory for all of these static assests will definitely be good in the long run.

jlperla commented 4 years ago

just wanted to check in now that pdf has been merged on this planned change to how static assets are managed. Before I go and fix the JupyterPDFBuilder with the new _image and _downloads flat file library apporach I thought I would open this to discussion to see if there were any thoughts / comments / improvement ideas on this implementation or approach. I am happy to do the code work but would love to hear if anyone has comments before we make the change.

@mmcky I am not sure I understand the issues here and whether this has any practical effects. Just to be clear, when you talk about notebooks are you talking about the intermediate juptyer format or the final notebooks being spit out for the github repo?

So, assuming we don't want to have any static assets deployed (outside of the stuff we used to discuss of putting the Manifest/Project/etc. in the same directory structure for execution) does this effect us?

mmcky commented 4 years ago

thanks @jlperla for your comments. The primary issue is when julia went over to use nested folders -- the way sphinx parses uri(i.e. link references) became broken. They aren't generated correctly for the nested structure. When we had julia in the jl subfolder this was a big issue as we couldn't make use of the root location and relative links were broken so we put a patch in place that copies _static to every folder. That is how julia is currently served if you look at image links etc.

To be clear there are a lot of different notebook types now. The use cases for our lecture sites is pretty much covered:

  1. html - notebook used for generating the html (emphasis on html use in intermediate ipynb)
  2. pdf - notebooks used for generating the pdf (emphais on pdf / latex compatibility in intermediate ipynb)
  3. coverage - notebooks used in coverage testing (includes unittests)
  4. download - notebooks used to support downloads and are self contained with supporting we locations for images etc.

there is another

  1. jupyter - used primarily for local editing (without execution) and we are placing an emphasis or generating readable notebooks (with as much markdown as possible). This is being used by some lecturers to generate course notes etc. from rst and once feature request is to have local _static objects so if you have a folder lect1 the static assets for those lectures are contained in lect1 rather than some global statics folder. This PR doesn't work particularly well in that use case. In november when we do major rewrite of the extension I think we will support this request.

This PR mimics the standard sphinx hmtl writer and collects only used assets into a image and downloads folder. One benefit is only objects called in RST are copied so legacy static objects aren't hosted and removes complexity around nested folder structures etc.

jlperla commented 4 years ago

This all sounds like a reasonable iteration on the design (although I don't need to follow it all!)

Does this have a concrete impact on the julia and datascience lectures, how we write the RST, changing what we currently deploy, etc? Or are you mostly just giving us a heads up that large changes are afoot?

mmcky commented 4 years ago

hey @jlperla no -- all of these changes are internal to the extension. No changes to RST required -- just disussing this as it changes how static assets are managed and keen to see if there are better ideas on the change.

It will change config -- (i.e. web links for images etc.) -- but that will need to managed if this get's merged.

jlperla commented 4 years ago

Sounds great to me! If I think of anything, I will mention it. Though I don't have a mental model of sphinx, so it is unlikley.

mmcky commented 4 years ago

Note: Placing this on hold as we evaluate the extension in November for rewrite. This will be included at that time.

mmcky commented 4 years ago

this PR provides one implementation idea for managing static assets. This PR will not be merged as the underlying extension has significantly changed. But this PR contains some interesting ideas.