djplaner / Content-Interface-Tweak

Improves both the task of creating content for Blackboard Learn, and reading that content.
https://djplaner.github.io/Content-Interface-Tweak/
GNU General Public License v3.0

produce a single PDF of course site #54

Open djplaner opened 3 years ago

djplaner commented 3 years ago

Produce method to generate a single PDF of a given set of Content Interface pages

Early design

operation

In a single O365 shared folder

The Python script (or eventually other)

Process

To do

djplaner commented 3 years ago

Technology options

Reading content

The question is whether we can add additional bookmarks based on headings. To do this we need to be able to read the content of each page and determine whether a bookmark should be added for that page.

That last one has a function that parses a PDF document and generates a JSON data structure that breaks the PDF up and identifies headings, including the ones I'm after.

PyMuPDF also has functionality for generating a ToC.
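A rough sketch of how extracted headings might be turned into bookmarks. It only builds the `[level, title, page]` list that PyMuPDF's `Document.set_toc()` accepts. The function name `build_toc`, the `(font_size, title, page)` input shape and the `size_to_level` mapping are assumptions for illustration, not what the script currently does.

```python
def build_toc(headings, size_to_level):
    """Turn headings into a ToC list of [level, title, page] entries.

    headings: list of (font_size, title, page_number) in reading order,
              with 1-based page numbers (hypothetical input shape).
    size_to_level: dict mapping a heading font size to a bookmark level.
    """
    toc = []
    prev_level = 0
    for size, title, page in headings:
        level = size_to_level.get(size)
        if level is None:
            continue  # not a heading size we care about
        # set_toc() expects the first entry at level 1 and no jump of
        # more than one level downwards, so clamp the level here.
        level = min(level, prev_level + 1)
        toc.append([level, title.strip(), page])
        prev_level = level
    return toc


headings = [
    (24.0, "Topic 1", 1),
    (14.5, "Reading ", 2),
    (11.0, "some body text", 2),
    (24.0, "Topic 2", 5),
]
toc = build_toc(headings, {24.0: 1, 14.5: 2})
```

With PyMuPDF the result could then be applied with `doc.set_toc(toc)` before saving.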

djplaner commented 3 years ago

Different pages printing with different CSS.

The problem may be that Chrome has two separate ways to produce PDFs: Adobe and internal. NOPE

djplaner commented 3 years ago

Handling long headings

LHS34 study guide topic 5 has a long heading. The extract headings function is returning multiple headings for it, which throws out the ToC.

Probably because long headings are spread over several lines/spans rather than just one. Currently each line/span is creating a heading.
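One possible fix, as a minimal sketch: join consecutive heading fragments that share a page and font size into a single heading. The name `merge_heading_fragments` and the `(page, font_size, text)` tuples are hypothetical. It also assumes that two separate headings of the same size never follow each other with no body text in between; if they do, they would be merged too.

```python
def merge_heading_fragments(fragments, heading_sizes):
    """Merge wrapped heading lines/spans into single headings.

    fragments: list of (page, font_size, text) in reading order
               (hypothetical shape, one entry per line or span).
    heading_sizes: set of font sizes that count as headings.
    """
    headings = []
    prev_key = None  # (page, size) of the heading currently being built
    for page, size, text in fragments:
        text = text.strip()
        if size not in heading_sizes:
            if text:
                prev_key = None  # real body text ends the current heading
            continue
        if not text:
            continue  # ignore empty heading-sized spans
        key = (page, size)
        if key == prev_key:
            headings[-1][2] += " " + text  # continuation of a long heading
        else:
            headings.append([page, size, text])
        prev_key = key
    return [tuple(h) for h in headings]


fragments = [
    (3, 20.04, "A very long heading that"),
    (3, 20.04, "wraps onto a second line"),
    (3, 11.0, "Body text."),
    (3, 20.04, "Next heading"),
]
merged = merge_heading_fragments(fragments, {20.04})
```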

djplaner commented 3 years ago

Still issues with variable font sizes

The LHS34 assessment PDF is generating different font sizes, which means that extract headings is having issues.

Either

Identifying font sizes

Different PDF files have different font sizes, but extractHeadings works on a completed document. Luckily the font sizes are uniquely strange: lots of decimal places.

The font size extraction needs to be called on each individual file to make the sizes easier to distinguish. It could even include the content for the chapter.

headingFontSizes = extractChapterHeadingFontSizes( doc )
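A sketch of what such a per-file function might do, under assumptions: treat the font size covering the most characters as body text, and anything larger as a heading size. The name `heading_font_sizes` is hypothetical. The input is a list of page dictionaries shaped like the output of PyMuPDF's `page.get_text("dict")` (blocks, then lines, then spans with `size` and `text`), so it can be tried without a real PDF.

```python
from collections import Counter


def heading_font_sizes(pages):
    """Return font sizes larger than the body size, largest first.

    pages: list of dicts shaped like PyMuPDF's page.get_text("dict").
    """
    chars_per_size = Counter()
    for page in pages:
        for block in page.get("blocks", []):
            # image blocks have no "lines" key, so default to empty
            for line in block.get("lines", []):
                for span in line.get("spans", []):
                    text = span.get("text", "").strip()
                    if text:
                        chars_per_size[round(span["size"], 2)] += len(text)
    if not chars_per_size:
        return []
    # Assumption: the size with the most characters is the body text.
    body_size = chars_per_size.most_common(1)[0][0]
    return sorted((s for s in chars_per_size if s > body_size), reverse=True)


sample_page = {"blocks": [
    {"lines": [{"spans": [{"size": 20.04, "text": "Topic 5"}]}]},
    {"lines": [{"spans": [{"size": 11.04, "text": "Body text " * 20}]}]},
    {"lines": [{"spans": [{"size": 14.52, "text": "Reading"}]}]},
    {"lines": [{"spans": [{"size": 9.0, "text": "footnote"}]}]},
    {"type": 1},  # stand-in for an image block
]}
sizes = heading_font_sizes([sample_page])
```

Calling it once per chapter file, rather than on the combined document, keeps each file's odd font sizes separate.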

djplaner commented 3 years ago

Remove review status on messages

PDFs are generated with edit mode on. This adds some standard messages re: hidden, review status etc. Remove these.