betatim / notebook-as-pdf

Save Jupyter Notebooks as PDF
BSD 3-Clause "New" or "Revised" License
368 stars, 72 forks

Todo for v0.3.0 #12

Closed: betatim closed this issue 4 years ago

betatim commented 4 years ago

Things to do for v0.3.0

betatim commented 4 years ago

For the stretch goal: I'd like to have an outline/table of contents like the one in this picture:

[Image: example of a PDF outline/table of contents]

To make this happen we probably need PdfFileWriter.addBookmark() and a way to figure out where on the page each <h1> tag is, so we can jump to it. For now I'd just list the <h1> tags and deal with nesting later.
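As a rough sketch of the "just list the <h1> tags" step, the standard library's html.parser can pull the heading texts out of the notebook's HTML. This yields titles only, not positions. The class and function names are hypothetical, and stripping a trailing "¶" assumes nbconvert-style anchor links inside the headings.

```python
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Collect the text of every <h1> element, in document order."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_h1 = False
        self._parts = []  # text fragments of the <h1> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            # Collapse runs of whitespace, roughly like a browser would.
            text = " ".join("".join(self._parts).split())
            # Assumption: headings may end with a "¶" anchor link; drop it.
            self.titles.append(text.rstrip("¶").rstrip())

    def handle_data(self, data):
        if self._in_h1:
            self._parts.append(data)


def list_h1_titles(html):
    """Return the text of all <h1> headings found in an HTML string."""
    parser = H1Collector()
    parser.feed(html)
    parser.close()
    return parser.titles
```

Nesting is deliberately ignored here: <h2> and deeper headings are skipped.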

adavidzh commented 4 years ago

For the stretch goal: I'd like to have an outline/table of contents like in this picture: [...] To make it happen we probably need PdfFileWriter.addBookmark() and a way to figure out where on the page the <h1> tag is so we can jump to it. For now I'd just list <h1> tags and deal with nesting later.

I agree that pages are arbitrary and bookmarks would be great.

FWIW, https://site/file.pdf#page=N is a staple for me when sharing information, and I wonder whether we could use a more granular way of linking into a PDF document.
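The #page=N form is one of the PDF "open parameters" (also registered as fragment identifiers for application/pdf in RFC 8118), which additionally define #nameddest=name for jumping to a named destination stored in the file. A more granular link would therefore need the PDF to contain named destinations, for example one per heading; whether notebook-as-pdf would write those is an open assumption. The helper below is hypothetical and only builds the URL; viewer support for both forms varies.

```python
from urllib.parse import quote


def pdf_link(url, page=None, nameddest=None):
    """Build a link into a PDF using an open-parameter fragment.

    page is 1-based (the #page=N form); nameddest is the name of a
    destination that must already exist inside the PDF.
    """
    if page is not None and nameddest is not None:
        raise ValueError("use either page or nameddest, not both")
    if page is not None:
        if page < 1:
            raise ValueError("page numbers start at 1")
        return f"{url}#page={page}"
    if nameddest is not None:
        # Percent-encode so spaces and '#' in heading titles survive.
        return f"{url}#nameddest={quote(nameddest, safe='')}"
    return url
```

For example, pdf_link("https://site/file.pdf", nameddest="Results section") gives https://site/file.pdf#nameddest=Results%20section.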

betatim commented 4 years ago
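// Walk up the offsetParent chain, adding each element's offset and
// subtracting its scroll position, to approximate the element's
// position relative to the top-left corner of the document.
function getOffset( el ) {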
    var _x = 0;
    var _y = 0;
    while( el && !isNaN( el.offsetLeft ) && !isNaN( el.offsetTop ) ) {
        _x += el.offsetLeft - el.scrollLeft;
        _y += el.offsetTop - el.scrollTop;
        el = el.offsetParent;
    }
    return { top: _y, left: _x };
}

This computes an element's distance from the top (and left) of a web page, which we can then use with:

for (const elem of document.getElementsByTagName("h1")) {
    console.log(elem, getOffset(elem).top, elem.innerText)
}

to get the positions of all the <h1> elements on the page. Once we have this information, we need to return the positions and texts from Chromium to Python and then call addBookmark().
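Before an offset can feed addBookmark(), it has to become a page plus a position on that page. A minimal sketch, assuming the document was printed onto fixed-height pages with no margins: Chromium lays out at 96 CSS px per inch, while PDF user space uses 72 points per inch with its origin at the bottom-left of the page, so y has to be flipped. The function name is hypothetical.

```python
CSS_PX_PER_INCH = 96
PDF_PT_PER_INCH = 72


def offset_to_destination(top_px, page_height_px):
    """Map a distance from the top of the document (CSS px) to a
    (0-based page index, y in PDF points measured from the page bottom).

    Assumes every printed page is exactly page_height_px tall with no
    margins; real print layouts may need extra corrections.
    """
    if page_height_px <= 0:
        raise ValueError("page height must be positive")
    page_index, y_in_page_px = divmod(top_px, page_height_px)
    scale = PDF_PT_PER_INCH / CSS_PX_PER_INCH  # 0.75 pt per CSS px
    # PDF y grows upwards, so measure from the bottom edge of the page.
    y_from_bottom_pt = (page_height_px - y_in_page_px) * scale
    return int(page_index), y_from_bottom_pt
```

For a heading 2500 px down a document printed on 1000 px pages, this returns page index 2 and a y of 375.0 pt.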

betatim commented 4 years ago

This is the Python code we need for this:

    # Inside an async function that has a pyppeteer `page`.
    # Define getOffset() as a global function in the page context.
    await page.evaluate("""
    function getOffset( el ) {
        var _x = 0;
        var _y = 0;
        while( el && !isNaN( el.offsetLeft ) && !isNaN( el.offsetTop ) ) {
            _x += el.offsetLeft - el.scrollLeft;
            _y += el.offsetTop - el.scrollTop;
            el = el.offsetParent;
        }
        return { top: _y, left: _x };
    }
    """, force_expr=True)

    # Collect the vertical offset and text of every <h1>, in document order.
    h1s = await page.evaluate(
        """() => {
        const vals = [];
        for (const elem of document.getElementsByTagName("h1")) {
            vals.push({ top: getOffset(elem).top, text: elem.innerText });
        }
        return vals;
    }"""
    )

After that, h1s contains the text and the distance from the top of the page for each <h1> tag.
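To close the loop, the h1s list can be turned into one outline entry per heading. A minimal sketch, assuming fixed-height pages as above; add_h1_bookmarks and its add_bookmark callback are hypothetical, which keeps the sketch independent of any PDF library. Sorting by top is defensive, since getElementsByTagName already returns document order.

```python
def add_h1_bookmarks(h1s, page_height_px, add_bookmark):
    """Turn the h1s list from page.evaluate() into bookmark calls.

    h1s: list of {"top": <px from top of document>, "text": <innerText>}
    add_bookmark: callback taking (title, page_index), page_index 0-based.
    Returns the (title, page_index) pairs in the order they were added.
    """
    added = []
    for h in sorted(h1s, key=lambda h: h["top"]):
        # innerText can contain newlines or padding; normalise it.
        title = " ".join(h["text"].split())
        if not title:
            continue  # an outline entry needs a visible label
        page_index = int(h["top"] // page_height_px)
        add_bookmark(title, page_index)
        added.append((title, page_index))
    return added
```

With PyPDF2, the callback could be lambda title, page: writer.addBookmark(title, page), where writer is the PdfFileWriter holding the printed pages. That jumps to the top of the page; landing on the heading's exact height would need a destination with a vertical position (check the fit arguments of addBookmark in the PyPDF2 version in use).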