Closed betatim closed 4 years ago
For the stretch goal: I'd like to have an outline/table of contents like in this picture:
To make it happen we probably need PdfFileWriter.addBookmark() and a way to figure out where on the page the <h1> tag is so we can jump to it. For now I'd just list <h1> tags and deal with nesting later.
I agree that pages are arbitrary and bookmarks would be great.
FWIW, https://site/file.pdf#page=N is a staple for me when sharing information, and I wonder if one could make use of a more granular way of linking into a PDF document.
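As a small illustration of the `#page=N` style of link, here is a minimal sketch that builds such URLs. The helper name `pdf_link` is hypothetical. The `nameddest` parameter is part of the commonly documented PDF open parameters, but it only works if the PDF actually defines named destinations (outline bookmarks alone do not create them), and viewer support varies, so treat that part as an assumption.

```python
from urllib.parse import quote, urldefrag


def pdf_link(url, page=None, nameddest=None):
    """Return `url` with a PDF open-parameter fragment appended.

    Hypothetical helper: any fragment already on `url` is replaced.
    """
    base, _old_fragment = urldefrag(url)
    params = []
    if nameddest is not None:
        # Assumes the PDF defines a named destination with this name.
        params.append("nameddest=" + quote(nameddest, safe=""))
    if page is not None:
        # Page numbers in the fragment are 1-based.
        params.append(f"page={page}")
    if not params:
        return base
    return base + "#" + "&".join(params)
```

For example, `pdf_link("https://site/file.pdf", page=3)` gives a link that opens the file at page 3 in viewers that honour the fragment.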
function getOffset( el ) {
    var _x = 0;
    var _y = 0;
    while( el && !isNaN( el.offsetLeft ) && !isNaN( el.offsetTop ) ) {
        _x += el.offsetLeft - el.scrollLeft;
        _y += el.offsetTop - el.scrollTop;
        el = el.offsetParent;
    }
    return { top: _y, left: _x };
}
This computes the distance of an element from the top (and left) of a web page. We can then use it with:
for (const elem of document.getElementsByTagName("h1")) {
    console.log(elem, getOffset(elem).top, elem.innerText)
}
to get the positions of all the H1s on the page. Once we have this information, we need to return the position and text from Chromium to Python and then call addBookmark().
This is the Python we need to do this:
# `page` is assumed to be an already-open pyppeteer Page.
# Define getOffset() in the page so later evaluate() calls can use it.
await page.evaluate("""
function getOffset( el ) {
    var _x = 0;
    var _y = 0;
    while( el && !isNaN( el.offsetLeft ) && !isNaN( el.offsetTop ) ) {
        _x += el.offsetLeft - el.scrollLeft;
        _y += el.offsetTop - el.scrollTop;
        el = el.offsetParent;
    }
    return { top: _y, left: _x };
}
""", force_expr=True)

# Collect the vertical offset and text of every <h1> on the page.
h1s = await page.evaluate(
    """() => {
        var vals = []
        for (const elem of document.getElementsByTagName("h1")) {
            console.log(elem, getOffset(elem).top, elem.innerText)
            vals.push({ top: getOffset(elem).top, text: elem.innerText })
        }
        return vals
    }"""
)
Then h1s will contain the text and "distance from the top of the page" for each h1 tag.
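The remaining step is turning those pixel offsets into positions a PDF bookmark can point at. The sketch below is one way this might look, not the project's implementation. It assumes the PDF is printed at scale 1 (so 96 CSS pixels map to 72 PDF points) and that the whole notebook ends up on a single page of known height. The names `css_top_to_pdf_y` and `bookmark_targets` are hypothetical.

```python
CSS_PX_PER_INCH = 96  # CSS reference pixel: 1/96 inch
PDF_PT_PER_INCH = 72  # default PDF user-space unit: 1/72 inch


def css_top_to_pdf_y(top_px, page_height_pt, scale=1.0):
    """Convert a distance from the top of the page (CSS px) to a PDF y coordinate.

    PDF coordinates start at the bottom-left corner, so the offset from
    the top is subtracted from the page height. `scale` is the assumed
    print scale factor.
    """
    top_pt = top_px * scale * PDF_PT_PER_INCH / CSS_PX_PER_INCH
    return page_height_pt - top_pt


def bookmark_targets(h1s, page_height_pt, scale=1.0):
    """Return (title, y_in_points) pairs in top-to-bottom order.

    `h1s` is a list of {"top": ..., "text": ...} dicts as returned by the
    page.evaluate() call; a single-page PDF is assumed.
    """
    ordered = sorted(h1s, key=lambda h: h["top"])
    return [
        (h["text"].strip(), css_top_to_pdf_y(h["top"], page_height_pt, scale))
        for h in ordered
    ]
```

Each pair could then be passed to addBookmark() for page 0, for instance with a `/XYZ` fit so the viewer jumps to that height rather than to the top of the page; whether that fit behaves well in every viewer would need checking.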
Things to do for v0.3.0
- --no-sandbox the default #9
- setup.py to be 0.3.0 #9
- <h1> tags in the notebook