Add the ability to download the content (print documents)

djplaner commented 4 years ago

Explore whether JavaScript libraries might provide a way to produce offline versions of the Content Interface.

And/or Python libraries that might be combined with screen scraping.

Rationale

The Word documents can contain styling and embedded documents that don't print well, because that styling is only applied on the web. If we're able to use the web version to generate PDF/DOC, the styling will be applied.

Current status

WeasyPrint with a Python script can produce a PDF that's close, but it:

Needs the Bb Content Interface to be set to expandAll
Doesn't handle some of the CSS alignment well (e.g. exercises, pictureRight)
Doesn't handle YouTube videos

djplaner commented 4 years ago

Comparison of XHTML2PDF, WeasyPrint and UnoConv

Possible options

WeasyPrint

Python library (source code and documentation available)

Appears to accept an HTML string as input.
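A minimal sketch of the HTML-string route, assuming WeasyPrint's Python API (`HTML(string=..., base_url=...)` and `write_pdf()`). The helper names `wrap_fragment` and `html_string_to_pdf` are hypothetical; the idea is that a Content Interface fragment gets wrapped in a complete document (with any print CSS) before rendering.

```python
from html import escape
from typing import Optional


def wrap_fragment(fragment: str, title: str = "Content", css: str = "") -> str:
    """Wrap an HTML fragment (e.g. Content Interface output) in a full document.

    Hypothetical helper: the css argument is where print-specific rules
    (alignment fixes, page breaks) could go.
    """
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{escape(title)}</title><style>{css}</style>"
        f"</head><body>{fragment}</body></html>"
    )


def html_string_to_pdf(html: str, base_url: Optional[str] = None) -> bytes:
    """Render an HTML string to PDF bytes with WeasyPrint."""
    # Imported lazily so wrap_fragment() works even where WeasyPrint is not installed.
    from weasyprint import HTML

    # base_url lets WeasyPrint resolve relative links to images and stylesheets;
    # write_pdf() with no target returns the PDF as bytes.
    return HTML(string=html, base_url=base_url).write_pdf()
```

Usage would be something like `pdf = html_string_to_pdf(wrap_fragment(fragment_html, "Week 1"))`, then writing `pdf` to a file.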

UnoConv

Is based on OpenOffice. Could be interesting, but heavyweight.

djplaner commented 4 years ago

WeasyPrint

Install locally - Success

pip install Weasyprint
Try to print edu8702 week1

weasyprint URL PDFFile: failure on Windows with the default install
Try it on Linux: success
Try it with some content from ContentInterface

Partial success. Still need to run the JavaScript on it.
Try it with Javascript enabled

FAILURE: the JavaScript is not run by WeasyPrint (at least not by default).

Handling javascript

The suggestion is to use a JavaScript pre-processor first.

Apparently PhantomJS is an option, but it isn't being maintained. The alternative is headless Chrome or similar, i.e. something that works on the completed DOM after the JavaScript has run (e.g. Python + Selenium).
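A rough sketch of the Selenium + headless Chrome pre-processing step, assuming Selenium 4, a local Chrome/chromedriver, and WeasyPrint. The function names, window size and the optional `wait_for_css` selector are illustrative assumptions, not part of Content-Interface-Tweak. Note that Blackboard pages sit behind a login, which this sketch does not handle.

```python
from typing import Optional


def chrome_args(headless: bool = True) -> list[str]:
    """Command-line switches for Chrome; the window size is an arbitrary choice."""
    args = ["--window-size=1280,2000"]
    if headless:
        args.append("--headless=new")  # Chrome's current headless mode
    return args


def rendered_html(url: str, wait_for_css: Optional[str] = None, timeout: float = 10.0) -> str:
    """Load url in headless Chrome and return the DOM after scripts have run."""
    # Lazy imports so chrome_args() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    for arg in chrome_args():
        options.add_argument(arg)
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)  # by default returns once the page's load event has fired
        if wait_for_css:
            # Optionally wait for an element that only appears once the JavaScript is done.
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, wait_for_css))
            )
        # Serialise the live DOM rather than the HTML originally downloaded.
        return driver.execute_script("return document.documentElement.outerHTML")
    finally:
        driver.quit()


def url_to_pdf(url: str, pdf_path: str, wait_for_css: Optional[str] = None) -> None:
    """Render url with JavaScript, then hand the resulting HTML to WeasyPrint."""
    from weasyprint import HTML

    html = rendered_html(url, wait_for_css)
    # base_url lets WeasyPrint resolve relative image and stylesheet links.
    HTML(string=html, base_url=url).write_pdf(pdf_path)
```

The design choice is to let the browser do everything JavaScript-related and give WeasyPrint only the finished DOM, since WeasyPrint itself does not execute scripts.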

Basically working. However, the accordions (in their default state) cause some issues. Removing them makes it all work.

What about expandAll? Could get Selenium to click the expandAll button.
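A small sketch of the "click expandAll" idea, assuming an already-open Selenium 4 WebDriver pointed at the page. The CSS selector `#expandAll` is a hypothetical placeholder; the real Content Interface markup would need checking. The string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR`, used directly so the function can be exercised without Selenium installed.

```python
# Hypothetical selector: check the actual Content Interface markup for the expandAll control.
EXPAND_ALL_SELECTOR = "#expandAll"
CSS_SELECTOR = "css selector"  # same value as selenium.webdriver.common.by.By.CSS_SELECTOR


def click_expand_all(driver, selector: str = EXPAND_ALL_SELECTOR) -> int:
    """Click every element matching selector and return how many were clicked.

    Intended to run after the page has loaded and before the DOM is captured
    for WeasyPrint, so that accordion content is expanded in the printout.
    """
    buttons = driver.find_elements(CSS_SELECTOR, selector)
    for button in buttons:
        button.click()
    return len(buttons)
```

If the accordions animate open, a short explicit wait after clicking may be needed before reading the DOM; a return value of 0 suggests the selector does not match the page.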

How to show iframes/YouTube videos

These don't seem to work too well.
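One possible workaround, sketched here as an assumption rather than a tested fix: before handing the HTML to the PDF renderer, replace each `<iframe>` (e.g. an embedded YouTube video) with a plain link that still makes sense on paper. This uses a regular expression from the standard library, which is adequate for simple embed markup but is not a full HTML parser; the `print-video-link` class is a hypothetical name for styling.

```python
import re

# Matches <iframe ... src="..." ...>...</iframe>; the \s before src avoids matching data-src.
IFRAME_RE = re.compile(
    r"<iframe\b[^>]*?\ssrc\s*=\s*([\"'])(.*?)\1[^>]*>.*?</iframe>",
    re.IGNORECASE | re.DOTALL,
)


def iframes_to_links(html: str) -> str:
    """Replace each iframe with a paragraph linking to the iframe's src URL."""

    def replace(match: re.Match) -> str:
        src = match.group(2)  # already HTML-encoded as it appeared in the attribute
        return f'<p class="print-video-link">Video: <a href="{src}">{src}</a></p>'

    return IFRAME_RE.sub(replace, html)
```

For example, `iframes_to_links('<iframe src="https://www.youtube.com/embed/abc123"></iframe>')` yields a paragraph containing a link to that embed URL.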

djplaner / Content-Interface-Tweak