WebMemex / freeze-dry

Snapshots a web page to get it as a static, self-contained HTML document.
https://freezedry.webmemex.org
The Unlicense

Allow customisation, use playwright for tests, write documentation #59

Closed by Treora 2 years ago

Treora commented 2 years ago

Well, that took a while. I finally got around to finishing this large reorganisation, which has been over three years in the making, progressing sporadically at occasional moments of inspiration since @Gozala proposed something similar in PR #44. Thanks again, and sorry for the wait!

Customisation

The approach taken here is heavily inspired by #44, and likewise introduces a Resource class with subclasses DomResource, StylesheetResource, and LeafResource, each exposing the resource’s links, its content as a blob, etc. However, in my approach the resources are not lazy, and they are slightly less coupled to the freezeDrying process.
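
To make the shape of that hierarchy concrete, here is a rough, dependency-free sketch. It is not freeze-dry’s actual code: only the four class names come from this PR, while the Link interface, the subresourceLinks getter, the findAll and listUrls helpers, and all signatures are assumptions made up for illustration. Content is a plain string here rather than a blob, and links are found with crude regular expressions instead of a real DOM or CSS parser.

```typescript
// Illustrative toy model only; signatures are NOT taken from freeze-dry.
interface Link {
  target: string;         // the URL as written in the document
  isSubresource: boolean; // true for images, stylesheets, etc.; false for <a href>
  resource?: Resource;    // attached once the subresource has been obtained
}

// Hypothetical helper: return capture group 1 of every match of `pattern`.
function findAll(text: string, pattern: RegExp): string[] {
  const re = new RegExp(pattern.source, "g");
  const results: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(text)) !== null) {
    results.push(match[1]);
  }
  return results;
}

abstract class Resource {
  constructor(readonly url: string) {}
  abstract get links(): Link[];
  // Stand-in for "content as a blob"; a string keeps the sketch self-contained.
  abstract get content(): string;
  get subresourceLinks(): Link[] {
    return this.links.filter((link) => link.isSubresource);
  }
}

// A resource without links of its own, such as an image or a font.
class LeafResource extends Resource {
  constructor(url: string, private readonly data: string) {
    super(url);
  }
  get links(): Link[] { return []; }
  get content(): string { return this.data; }
}

class StylesheetResource extends Resource {
  private readonly found: Link[];
  constructor(url: string, private readonly css: string) {
    super(url);
    // Not lazy: links are extracted as soon as the resource is created.
    this.found = findAll(css, /url\(\s*["']?([^"')\s]+)["']?\s*\)/)
      .map((target) => ({ target, isSubresource: true }));
  }
  get links(): Link[] { return this.found; }
  get content(): string { return this.css; }
}

class DomResource extends Resource {
  private readonly found: Link[];
  constructor(url: string, private readonly html: string) {
    super(url);
    const sub = (target: string): Link => ({ target, isSubresource: true });
    const nav = (target: string): Link => ({ target, isSubresource: false });
    this.found = findAll(html, /<link\b[^>]*?\bhref="([^"]*)"/).map(sub)
      .concat(findAll(html, /<(?:img|iframe|script)\b[^>]*?\bsrc="([^"]*)"/).map(sub))
      .concat(findAll(html, /<a\b[^>]*?\bhref="([^"]*)"/).map(nav));
  }
  get links(): Link[] { return this.found; }
  get content(): string { return this.html; }
}

// Walk the tree of attached subresources, e.g. to get each resource separately.
function listUrls(root: Resource): string[] {
  let urls = [root.url];
  for (const link of root.subresourceLinks) {
    if (link.resource) urls = urls.concat(listUrls(link.resource));
  }
  return urls;
}

const css = new StylesheetResource(
  "https://example.org/style.css", 'body { background: url("bg.png"); }');
css.links[0].resource = new LeafResource("https://example.org/bg.png", "(image data)");

const page = new DomResource("https://example.org/",
  '<link rel="stylesheet" href="style.css"><img src="logo.png"><a href="/next">next</a>');
page.subresourceLinks[0].resource = css;
page.subresourceLinks[1].resource =
  new LeafResource("https://example.org/logo.png", "(image data)");

console.log(listUrls(page)); // the page, its stylesheet, bg.png, then logo.png
```

The point of the sketch is only the division of labour: every resource type knows how to find its own links, so generic code such as listUrls can traverse a page without caring whether a node is HTML, CSS, or an opaque file.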

It provides multiple ways of customising freeze-dry’s treatment of subresources:

Tests

Jest (i.e. jsdom) was too limiting for some tests; for example, iframes with a srcdoc attribute are not implemented there. Some new tests are added using Playwright, which runs them in a headless browser instead. The old tests are not converted, so we still have Jest too.

Documentation

The final effort was fully documenting the API, using TypeDoc to generate a website. The Readmes are also updated, expanded, and published along with it, all on https://freezedry.webmemex.org.

Related issues

Fixes #8. (Provide the option to get resources separately)
Fixes #9. (Allow getting the result before completion)
Fixes #42. (Allow alternative blob serializations)
Closes #44. (Lazy crawler+bundler implementation)
Fixes #25. (Handle iframes with srcdoc) (Not exactly related, but somewhere in the process I fixed this too.)

Also includes Vite for bundling into a single JS file; distributing such bundles (and documenting that) is yet to be done.