Well, that took a while! I finally got around to finishing this large reorganisation, which has been over three years in the making, progressing sporadically in occasional moments of inspiration ever since @Gozala proposed something similar in PR #44. Thanks again, and sorry for the wait!
Customisation
The approach taken here is heavily inspired by #44: it likewise introduces a Resource class, with subclasses DomResource, StylesheetResource, and LeafResource, each exposing the resource’s links, its content as a blob, etc. However, in my approach the resources are not lazy, and they are slightly less coupled to the freeze-drying process.
It provides multiple ways of customising freeze-dry’s treatment of subresources:
The newUrlForResource option (a callback) allows changing only the resource→data URL transformation; all recursion etc. happens as usual. Similarly, a dryResource callback can be given to change how each resource is dried.
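To make the resource→data URL step concrete, here is a minimal sketch of a callback that turns a resource’s blob into a base64 data URL, roughly what the default transformation presumably amounts to. It assumes that newUrlForResource receives a resource object exposing its content as a `blob` property and may return a promise of the new URL; the helper name `blobToDataUrlSketch` is hypothetical and not part of freeze-dry. Only standard APIs (`Blob.arrayBuffer`, `btoa`) are used, so it runs in modern browsers and in Node 18+.

```javascript
// Hypothetical helper (not freeze-dry's own): read a Blob and encode it
// as a base64 data URL, using only standard web APIs.
async function blobToDataUrlSketch(blob) {
  const bytes = new Uint8Array(await blob.arrayBuffer())
  // btoa expects a "binary string": one character per byte.
  let binary = ''
  for (const byte of bytes) binary += String.fromCharCode(byte)
  return `data:${blob.type};base64,${btoa(binary)}`
}

// Assumed shape of the option: a callback that receives a resource with a
// `blob` property and returns (a promise of) the URL to put in the link.
const options = {
  newUrlForResource: (resource) => blobToDataUrlSketch(resource.blob),
}
```

A custom callback could instead, for example, store the blob elsewhere and return that location, while the recursion and drying keep working as usual.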
For more thorough customisation, the processSubresource callback can be overridden as a whole; this gives the power that the resolveURL callback proposed in #44 had. However, my implementation does not take a “lazy resource” as its parameter, but a Link object. The callback will have to run Resource.fromLink(link) to fetch the resource and create a Resource object. It’s an extra step, but it makes the process more explicit, and it makes intuitive sense: we only have a Resource once we have its data. Likewise, recursing into subresources is an explicit step, but again made a one-liner by the method resource.processSubresources(callback) (though I suppose resource.subresourceLinks.forEach(callback) is a one-liner too).
For example, this is a simplification of what the default freezeDry implementation does, assuming the other options are kept at their defaults too (taken from the docs):
```javascript
async processSubresource(link, recurse) {
  link.resource ||= await Resource.fromLink(link) // fetch the subresource
  await link.resource.processSubresources(recurse) // recurse into its links
  await link.resource.dry() // dry the subresource
  link.target = await blobToDataUrl(link.resource.blob) // inline its content in the link (reading a blob is async)
}
```
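The explicit recursion can be illustrated with a self-contained toy model. `ToyResource`, its in-memory `web` map, and the `inlined:` targets are all hypothetical stand-ins, not freeze-dry’s Resource and Link classes; only the shape of `processSubresource` mirrors the documented callback. The point it shows is that children are processed before the parent’s link is rewritten.

```javascript
// Hypothetical stand-ins, NOT freeze-dry's API: a tiny in-memory "web"
// mapping each URL to its content and the URLs it links to.
const web = {
  'page.html': { text: '<link href="style.css">', links: ['style.css'] },
  'style.css': { text: 'body { background: url(bg.png) }', links: ['bg.png'] },
  'bg.png': { text: 'PNGDATA', links: [] },
}

class ToyResource {
  constructor(url) {
    this.url = url
    this.text = web[url].text
    // Each link records its target and, once fetched, its resource.
    this.subresourceLinks = web[url].links.map((target) => ({ target, resource: undefined }))
  }

  // Stand-in for Resource.fromLink: "fetch" whatever the link points to.
  static async fromLink(link) {
    return new ToyResource(link.target)
  }

  // Call the callback on every link, passing it along so it can recurse.
  async processSubresources(callback) {
    await Promise.all(this.subresourceLinks.map((link) => callback(link, callback)))
  }
}

const processed = [] // records the order in which links are finished

// Mirrors the documented callback: fetch, recurse, then rewrite the link.
async function processSubresource(link, recurse) {
  link.resource ||= await ToyResource.fromLink(link)
  await link.resource.processSubresources(recurse)
  processed.push(link.resource.url)
  // Hypothetical "inlining": mark the target instead of building a data URL.
  link.target = `inlined:${link.resource.url}`
}
```

Running `new ToyResource('page.html').processSubresources(processSubresource)` finishes `bg.png` before `style.css`, so each resource would be inlined only after its own subresources. Note that this toy does not guard against cyclic links.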
The freezeDry function has kept its signature, but if one wants the result before the process has finished, or as a blob rather than a string, there is the FreezeDryer class, which gives more control. The freezeDry function is merely a convenience wrapper, doing roughly this:
```javascript
const freezeDryer = new FreezeDryer(document, options)
await freezeDryer.run()
const html = freezeDryer.result.string
```
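The benefit of a class over a single function is that the caller can look at the result while the work is still ongoing. The following sketch shows that general pattern with a hypothetical `ToyDryer` (not freeze-dry’s FreezeDryer, whose exact behaviour may differ): it races `run()` against a timer and then takes whatever result exists at that moment.

```javascript
// Hypothetical sketch, NOT freeze-dry's FreezeDryer: a class whose `result`
// can be read while `run()` is still in progress.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

class ToyDryer {
  constructor(items, delayMs) {
    this.items = items
    this.delayMs = delayMs
    this.result = [] // grows as each (slow) item is processed
  }

  async run() {
    for (const item of this.items) {
      await sleep(this.delayMs) // stand-in for fetching a subresource
      this.result.push(item)
    }
    return this.result
  }
}

// Wait at most `timeoutMs`, then return a copy of the result as it stands.
async function resultWithin(dryer, timeoutMs) {
  await Promise.race([dryer.run(), sleep(timeoutMs)])
  return [...dryer.result]
}
```

With a plain function, the caller only ever sees the final value; exposing the in-progress object is what makes an early, partial result possible.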
One can also ignore the freezeDry function altogether, and use the Resource classes directly.
Tests
Jest (i.e. JSDOM) was too limiting for some tests; for example, iframes with a srcdoc attribute are not implemented there. Some new tests have been added using Playwright, which runs a headless browser instead. The old tests have not been converted, so we still have Jest too.
Documentation
The final effort was fully documenting the API, using Typedoc to generate a website. The READMEs have also been updated, expanded, and published along with it; all on https://freezedry.webmemex.org.
Related issues
Fixes #8. (Provide the option to get resources separately)
Fixes #9. (Allow getting the result before completion)
Fixes #42. (Allow alternative blob serializations)
Closes #44. (Lazy crawler+bundler implementation)
Fixes #25. (Handle iframes with srcdoc; not exactly related, but I fixed this somewhere along the way too)
This PR also adds Vite for bundling into a single JS file; distributing such bundles (and documenting them) is yet to be done.