betatim / notebook-as-pdf

Save Jupyter Notebooks as PDF
BSD 3-Clause "New" or "Revised" License

Embedding local images via markdown doesn't work #18

Closed rlleshi closed 3 months ago

rlleshi commented 4 years ago

Hi, very helpful extension. Thanks for your work!

There is just one problem that I have noticed. If I embed an image from code (Image(filename = "my_file.png")), it works perfectly, but if I embed it via Markdown (![title](my_file.png)), the image does not appear in the PDF file.

betatim commented 4 years ago

Do you know how your image differs from the poppy field image in the example notebook https://github.com/betatim/notebook-as-pdf/blob/master/example.ipynb? That one works for me, does it also work for you?

betatim commented 4 years ago

Just gave it a try, it seems like referencing local files as images is the thing that doesn't work.

betatim commented 4 years ago

Looks like there are (at least) two issues:

  1. We write the HTML to a temporary directory. This means a reference to parrots.jpg no longer points to the right place, because that path is relative to the original notebook location.
  2. Headless Chrome prevents access to local files for security. We can fix this by explicitly allowing it.

A solution to (1) might be to write the HTML to the directory containing the notebook but using a unique name so as to not interfere with existing files. Then all references to files would work.
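A minimal sketch of the idea for (1), assuming a hypothetical helper named write_html_next_to_notebook that is not part of notebook-as-pdf. It relies on tempfile.mkstemp from the standard library to pick a file name in the notebook's directory that does not collide with existing files:

```python
import os
import tempfile


def write_html_next_to_notebook(notebook_path, html):
    """Write `html` into the notebook's directory under a unique name.

    Hypothetical helper for illustration; not part of notebook-as-pdf.
    """
    notebook_dir = os.path.dirname(os.path.abspath(notebook_path))
    # mkstemp creates a new file whose name does not clash with existing ones.
    fd, html_path = tempfile.mkstemp(suffix=".html", prefix=".nbpdf-", dir=notebook_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html)
    return html_path
```

Relative references such as parrots.jpg would then resolve against the notebook's directory. The caller would be responsible for deleting the file once the PDF has been rendered.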

rlleshi commented 4 years ago

Yeah, you are right, the problem seems to be referencing local files. Thanks for the suggestions

psychemedia commented 3 years ago

Is a possible fix for this the --to html_embed handler provided by nbextensions/export_embedded?

betatim commented 3 years ago

Nice find. We could add all images (and other things?) to the ipynb itself as base64 "data URIs". This means you'd get all the original images and such also in the notebook file which is attached to the PDF. This could be super useful when you later retrieve the file and have lost the context directory. A downside is that it can make the file very large.
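To make the data URI idea concrete, here is a rough sketch using only the standard library. The helper name to_data_uri is made up for illustration and is not taken from export_embedded or notebook-as-pdf:

```python
import base64
import mimetypes


def to_data_uri(path):
    """Return the file at `path` encoded as a base64 data URI (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        # Fallback when the type cannot be guessed from the file extension.
        mime = "application/octet-stream"
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

The returned string could replace the file name in an image reference. Base64 encodes every 3 bytes as 4 characters, so each embedded image grows by roughly a third, which is where the file size concern comes from.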

For reference the code used by the export_embedded extension is https://github.com/ipython-contrib/jupyter_contrib_nbextensions/blob/b767c69a374f68d2a7272e4fe9e0a40a47cdb8f0/src/jupyter_contrib_nbextensions/nbconvert_support/embedhtml.py

psychemedia commented 3 years ago

"A downside is that it can make the file very large."

Could the extension have an nbextensions setting or toolbar toggle button that lets you tick a checkbox to select whether the notebook is attached to the exported PDF or not?

Cube707 commented 2 years ago

Would a RegEx search and replace on the notebook before it is passed to the HTML exporter be a valid option?

I have experimented with it and it seems to work fine (on Windows at least). See the example code below. It could also be expanded to work on both the Markdown syntax (![alt](file)) and <img> HTML tags.

I don't know if the performance penalty would be too big or if I am missing another drawback. What's your opinion?

Code:

import re
import os

# create demo notebook:
notebook = {}
cell = {
    'cell_type': 'markdown',
    'source': r"""
        This image is a image inserted via Markdown image tag:

        ![A poppy field](https://unsplash.com/photos/sWlxCweDzzs/download?force=true&w=640 "A poppy field")

        ![a local file](my file.png)
        ![a file with Caption](file.png "test")
        ![Windows absolout](C:\file.png "test")
        ![Linux root](/file.png "test")
        ![Linux home](~/file.png "test")
    """
}
notebook['cells'] = [cell]

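# demo only (assumption): in the exporter, `resources` is already provided;
# here it is stubbed so that the snippet runs on its own
resources = {'metadata': {'path': os.getcwd()}}

# real code:
# match Markdown images whose target is neither a URL nor an absolute path
RE_local_Images = re.compile(r"!\[(.*)\]\((?!https?://|[A-Z]:\\|/|~/)(.*?)( (\"|').*(\"|'))?\)")

for cell in notebook['cells']:
    if cell['cell_type'] != 'markdown':
        continue

    # finditer scans the original string, so track how far earlier
    # replacements have shifted the text
    offset = 0
    for match in RE_local_Images.finditer(cell['source']):
        path = match.group(2)
        # resolve relative to the notebook directory and URL-escape spaces
        fullpath = (os.path.realpath(os.path.join(resources['metadata']['path'], path))).replace(' ', '%20')
        cell['source'] = cell['source'][:match.start(2)+offset] + fullpath + cell['source'][match.end(2)+offset:]
        offset += len(fullpath)-(match.end(2)-match.start(2))

print(cell['source'])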

The RegEx ignores all non-local types of path (that I could think of) and replaces the file names with a full path. Spaces in the path are URL-escaped, because Markdown doesn't accept them. Subfolders and ../ are also possible.

I tested inserting the code into the project here:

https://github.com/betatim/notebook-as-pdf/blob/95f7e924dc898102c28973ff3c803ccf0a233dfd/notebook_as_pdf/__init__.py#L220-L221

and it seems to work great.

Cube707 commented 2 years ago

I am tagging you, @betatim, because I am not sure if you got a notification about the above comment.

gsteele13 commented 2 years ago

Is there any progress on this? I find this a very useful package for reviewing / editing / commenting on notebooks on my iPad, but broken images are a bit of a deal breaker.

mortbauer commented 3 months ago

This is really easy to fix by passing embed_images=True. Nevertheless, I created a pull request: https://github.com/betatim/notebook-as-pdf/pull/44.