Closed Gozala closed 2 years ago
Hmm... So it seems that current implementation of freeze-dry actually carries on crawling / inlining resources it's just it will also start serializing DOM so I would guess on timeout you will wind up with some resources inlined and others stay intact. There are also few TODOs to abort tasks on timeout and to do something about resources that have not being processed.
I think it might be worth discussing what would be a good solution here, I don't think replicating current behavior would be ideal, although it is possible it would introduce some messiness to otherwise (IMO) well structured code.
(Mostly noting this so I don't forget) I think I have an idea for a way of dealing with timeouts. We could use AbortController
that can be passed to each fetch
that will be issued by dreezeDry
and used to abort all of them as necessary.
@Treora I'm having trouble upgrading jest-fetch-mock
to latest version that should have support for AbortSignal
, it seems that everything breaks after upgrade.
I did notice that after update await (await fetch(docUrl)).text()
is [object Object]
instead of markup, which I suspect is the reason. I also noticed that fetch.Response
and Response
used in mockFetch
aren't equal and that makes me think it's related.
I'm also not sure where Response
and Blob
globals are coming from in first place, would you mind helping with that ?
I have figured the source of the issue. It's related to https://github.com/bitinn/node-fetch/issues/392#issuecomment-483333843:
New jest-mock-fetch
uses cross-fetch
that in turn uses node-fetch
that has it's own Blob
implementation that is incompatible with one that jsdom expects, which basically ends up calling blob.toString()
and that's where [object Object]
comes from.
Unfortunately I can't see any way to resolve this without mass-forking the whole dependency chain in attempt to resolve this.
jsdom was unfortunately unwilling to use the new Public blob API to read the content of the blob with promises.
Maybe you have to use jsdom's own fetch implementation instead of node-fetch (to get there internal blob)?
the alternativ is if you can: avoid creating a res.blob()
and get it as arrayBuffer using res.arrayBuffer()
directly instead of calling something like fr.readAsArrayBuffer(blob)
the other way maybe could be to get it as a arraybuffer and then construct a blob using jsdom if you really need a blob
res.arrayBuffer().then(ab => {
new Blob([ab])
})
Or try to convince him to use the FileReader to allow reading jsdom like object with the public api - i have already tried and done my part
I end up resorting to a horrible hack to get jest-mock-fetch > cross-fetch > node-fetch
interop with jsdom
. I wish there was a better way but there isn't suggestion above assumes we control the full stack, but we don't.
I'm also not sure where
Response
andBlob
globals are coming from in first place, would you mind helping with that ?
As you probably found out already, these are set by jest-fetch-mock
.
I have not fully grasped the issue you are dealing with (and would be very glad if I do not need to); but if you found some changes are required in the test setup etcetera, of course feel free to propose. I'm quite indifferent about what test frameworks and such we use; as long as not too much effort is wasted maintaining them. I hope the hacks can mostly be abstracted away from sight, and added in such a way that we can easily remove them if the problematic libraries will have solved issues underneath..
We could use
AbortController
that can be passed to eachfetch
that will be issued bydreezeDry
and used to abort all of them as necessary.
Sounds good. Also, something signal-based approach like AbortController seems a good API we could offer instead of the timeout
option. (a user could just use setTimeout
themself to still get the timeout
behaviour)
I have not fully grasped the issue you are dealing with (and would be very glad if I do not need to); but if you found some changes are required in the test setup etcetera, of course feel free to propose. I'm quite indifferent about what test frameworks and such we use; as long as not too much effort is wasted maintaining them. I hope the hacks can mostly be abstracted away from sight, and added in such a way that we can easily remove them if the problematic libraries will have solved issues underneath..
Problem is version of jest-fetch-mock
did not had support of AbortController
. New version does, but it also switches underlying implementation with cross-fetch
which uses node-fetch
in node. After updating jest-fetch-mock
all tests were breaking, because global.Request
/ global.Response
are from node-fetch
while global.Blob
is from jsdom
worse yet Response
constructor does not understand jsdom.Blob
and just serializes it to [object String]
, furthermore Response(...).blob
is a Blob
instance from node-fetch
that JSDOM does not understand.
This means even if we managed to make this.Blob
be the one from node-fetch
(which we can't because maintainers of this library don't want to expose it https://github.com/bitinn/node-fetch/issues/392), we would still have an issue with passing Response
's back to JSDOM world because when it attempts to use response.blob()
will not be JSDOM.Blob
but would be NodeFetch.Blob
instead.
Ideally differences here would be resolved across jsdom
and node-fetch
by eliminating competing Blob
implementations and just create a common one that both would use (https://github.com/jsdom/jsdom/issues/2555#issuecomment-483384509). Until that happens I end up making a hack that changes JSDOM.Blob.prototype.__proto__ = NodeFetch.Blob.prototype
adds methods from NodeFetch.Blob.prototype
to call corresponding methods on JSDOM.Blob.prototype
. Which is horrible, but only way to make blob instanceof Blob
work regardless in Blob
is from jsdom
or node-fetch
.
Sadly they monkeypatching can't "be out of sight" that is because NodeFetch.Blob
isn't really available and can only be obtained async ((await Response(...).blob().constructor
)
@Treora I discovered one peculiar behavior as I was trying to change subresources
from being an Iterable<Resource>
to a Map<string, Resource>
- that is current implementation does not recognize duplicates: link to the same image from both document and styles will end up fetching it twice.
I would like to address this with an alternate design, where resource:link
mapping isn't 1 : 1
, but one 1:n
. This pull is huge already & it might make more sense to do change to address this as followup instead.
current implementation does not recognize duplicates: link to the same image from both document and styles will end up fetching it twice.
Yes, thus far it seemed best to keep things simple and leave deduplication to the fetch function: the user-supplied fetchResource
callback, or even the browser's cache. This could be changed, but would need to be done carefully as a URL does not necessarily always return the same resource: request headers, e.g. the Accept
'ed content types, could matter.
This pull is huge already & it might make more sense to do change to address this as followup instead.
Indeed; this already PR is already so large that I suppose the most workable plan will be, when you feel you got to a somewhat stable solution you feel confident about, to try collaboratively review and incorporate various aspects of it in a few iterations. It is very helpful when any orthogonal changes are put into separate commits, or possibly in separate branches.
Problem is version of
jest-fetch-mock
did not had support ofAbortController
. ......
Thanks for the clear explanation, that helps!
Indeed; this already PR is already so large that I suppose the most workable plan will be, when you feel you got to a somewhat stable solution you feel confident about, to try collaboratively review and incorporate various aspects of it in a few iterations. It is very helpful when any orthogonal changes are put into separate commits, or possibly in separate branches.
I already feel like it's done & I can't really do any meaningful progress without a review. So I suggest we do that.
Hi @Treora,
Any chance we could figure out a way to get some of this work into the tree ? It looks like I'll be needing this in my future work and I would very much prefer not to go on with my own fork.
I am sorry I ended up doing exactly what I intended to avoid: post-poning this work for several months. However, getting back to work on freeze-dry has been slowly rising up on my to-do list. Besides getting this bundling customisability incorporated I figured it may be worth calling it v1.0 and making some better documentation and publicity for it (just a web-page, no road-side billboards). Your reminder may just get me started on this, thanks for poking.
As I think I said before, I will probably not merge this PR as-is, but rather take the architecture and API design you demonstrated here as a guide and make changes in a few steps; while trying to keep the code a bit more modular than this PR.
Also, I am still eager to make the code typed (see issue #28), but might do that afterwards.
As I think I said before, I will probably not merge this PR as-is, but rather take the architecture and API design you demonstrated here as a guide and make changes in a few steps; while trying to keep the code a bit more modular than this PR.
Is there way we could collaborate ? I do want to use freeze-dry for the work & that would require things like changes in this pull request & more. However I am not sure at this point what would be a best strategy to go about it without diverging further.
Also, I am still eager to make the code typed (see issue #28), but might do that afterwards.
For what it's worth I would advice going about it other way round. Type system pose API constraints that otherwise could be overlooked. Additionally type checker is very helpful in large refactoring as it will identify places you've missed.
Assuming you're in favor of public API & risk of sounding biased, I would suggest you to consider landing this without rewriting and then followup witch changing internals to match desired modularity. Benefits of this approach would be:
Is there way we could collaborate ? I do want to use freeze-dry for the work & that would require things like changes in this pull request & more. However I am not sure at this point what would be a best strategy to go about it without diverging further.
I do not know either what may be the best strategy. I suppose it helps if we exchange more details about our plans; and if we try to explain changes we make/need so they could be redone (‘rebased’, conceptually) if needed.
I am curious, what kind of changes do you expect the “& more” will contain?
Assuming you're in favor of public API & risk of sounding biased, I would suggest you to consider landing this without rewriting and then followup witch changing internals to match desired modularity.
I am considering this indeed, but need to look into the code again and fully understand it. As I intend to do a substantial rewrite, and I am already having trouble wrapping my head around some aspects, I suspect that for me it would be much easier and thus quicker to rewrite code I am most familiar with.
As for the API, I am still thinking about making ‘two levels’ of what you have sort of done in one; I’d like to make the code more ‘graph-oriented’ (perhaps passing subgraphs of subresources&links around), and provide the ‘callback-oriented’ API you created only as a convenience function on top of that. Not sure that makes sense to you, nor if it will keep making sense to me as I dig deeper. :)
As for the API, I am still thinking about making ‘two levels’ of what you have sort of done in one
I may be inaccurate here, as you do leave the possibility to interact with the resource graph without using the resolveURL
callback (as you exemplify above). I will look into this again and see how much that already matches what I had in mind.
Also, I am still eager to make the code typed (see issue #28), but might do that afterwards.
For what it's worth I would advice going about it other way round. Type system pose API constraints that otherwise could be overlooked. Additionally type checker is very helpful in large refactoring as it will identify places you've missed.
Good points. I was thinking that it may slow down rewriting, especially as I am not fluent with typed javascript. Could work out either way, I guess.
As for which type system/language to use; I think you once said you use flow for its more powerful templating system, but that typescript may be helpful because of its wider uptake, community, etc. How is your opinion on this now? I have been a little biased towards adopting typescript, but if you recommend flow I am fine with following your lead on this.
As for which type system/language to use; I think you once said you use flow for its more powerful templating system, but that typescript may be helpful because of its wider uptake, community, etc. How is your opinion on this now? I have been a little biased towards adopting typescript, but if you recommend flow I am fine with following your lead on this.
I still think flow has a better & more flexible type system, however typescript is probably a better choice given it’s popularity.
Only argument I would consider is that with typescript you’ll need to do builds etc... In that regard I think flow has an advantage as you can add types as comments (like in this pull) and code would load natively in runtimes that support ES modules.
As for the API, I am still thinking about making ‘two levels’ of what you have sort of done in one I may be inaccurate here, as you do leave the possibility to interact with the resource graph without using the resolveURL callback (as you exemplify above). I will look into this again and see how much that already matches what I had in mind.
Indeed you can traverse it as a graph, and resolveURL
is just a convince kind of like array.reduce
but you don’t have to use it.
That being said I find it fairly convenient way to avoid multiple passes over the graph.
I think what you’re disliking about the API is side effect of it being designed to traverse and bundle graph in a single pass, which in turn requires pipelining fetch, parse, remap, save operations such that as you descend it does all of them.
I don’t know if there’s a better way to go about it, but my experience so far had being it allows me to customize behavior very simply
Alternative API could follow visitor pattern which accomplishes more or less same goals, but is more OOP style.
Current solution allows you to either use pushed base interface via resolveURL
hook which is your visitor pattern I suppose but it also allows you to use pull based interface as shown in the example in the description.
I do not know either what may be the best strategy. I suppose it helps if we exchange more details about our plans; and if we try to explain changes we make/need so they could be redone (‘rebased’, conceptually) if needed.
I am curious, what kind of changes do you expect the “& more” will contain?
I do not exactly know yet, but here is a bit of overview of the context I'm using freeze-dry right now https://github.com/gozala/artifacts
Readme does not quite talks about it yet, but streamingable web bundle formats are also on todo, meaning I'll need to start stream bundle as document is being parsed and bundled & ideally such that parallel requests to sub-resources could be streamed in parallel (not forcing them to queue up).
Another thing that I intend to change about current architecture is the source DOM tree mutation, that currently occurs during freeze-drying.
Finally I want to also add a way to plug things like CSS parsers which currently are baked in.
Hope this gives enough context of what's on my list.
Happy anniversary to this PR. :birthday: (just kidding, not proud of this..)
As discussed, I started with converting the project to typescript; took a while, but it’s pretty much done, see #52! (any comments there are welcome, it’s my first typescript project)
Next up is getting this functionality in, using this PR as reference but obviously writing everything anew.
Proposed changes introduce concept of
Resource
which on instantiation is just a pointer to the corresponding sub-resourcelink
(wherelink
is what it used to be). Each resource is a lazily freeze-dried on demand:resource.dowload():Promise<Response>
method that will useoptions.fetchResource
for IO and will cache response so next call will return same promise.resource.resources():Promise<Iterable<Resource>>
method that will internally usethis.download()
to get original resource and extract links wrapping them into newResource
instances (note that subresources returned will not have fetched anything yet).resource.text()
/resource.blob()
methods that would do all of the above and replace links with URLs returned byoptions.resolveURL(resource):Promise<string>
. Sinceoptions.resolveURL
is async it can recurse and inlineblobToDataURL(await resource.blob())
or return some URL without having to fetch actual resource.Primary API has also changed from
freezeDry(opts):Promise<string>
tofreezeDry(opts):Resource
there for to accomplish what old API did user will need toawait freezeDry(opts).text()
instead. This change also means user could doawait freezeDry(opts).blob()
or in more advanced use case choose to hand roll alternative serialization: