lowlighter opened this issue 1 month ago

Describe the bug

Given the following valid archive:

Steps to Reproduce

Using the new std/tar results in code that gets stuck (or, if it isn't stuck, it takes way longer than minutes):

Expected behavior

Using the old std/archive/untar works without issues:

References
The problem is that you're not handling the `entry.readable` within the for loop. If `.readable` exists on an entry, then you need to either consume it entirely or `.cancel()` it before the next entry in the for loop will resolve.
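For illustration, a minimal sketch of that pattern, assuming the `UntarStream` export from `@std/tar/untar-stream` and a hypothetical local `archive.tar`:

```ts
import { UntarStream } from "@std/tar/untar-stream";

const entries = (await Deno.open("archive.tar")).readable
  .pipeThrough(new UntarStream());

for await (const entry of entries) {
  console.log(entry.path, entry.header.size);
  // Either consume the body entirely, e.g.:
  //   await entry.readable?.pipeTo((await Deno.create(entry.path)).writable);
  // ...or explicitly discard it; otherwise the next iteration never resolves:
  await entry.readable?.cancel();
}
```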
Might be useful to give `TarStreamEntry` a `Symbol.dispose` method so you can do this (pending a fix for this Deno bug):

```ts
for await (using entry of entries) {
  console.log(entry.path, entry.header.size)
}
```
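For what it's worth, a hypothetical sketch of what such a method could look like; `TarStreamEntry` in `@std/tar` does not currently implement this:

```ts
class TarStreamEntry {
  constructor(
    readonly path: string,
    readonly header: { size: number },
    readonly readable?: ReadableStream<Uint8Array>,
  ) {}

  [Symbol.dispose]() {
    // Discard any unconsumed body so the parser can advance to the
    // next entry. Note that cancel() returns a promise, which a
    // synchronous dispose can only fire and forget (see the
    // Symbol.asyncDispose discussion further down).
    this.readable?.cancel();
  }
}
```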
> Might be useful to give `TarStreamEntry` a `Symbol.dispose` method so you can do this (pending a fix for this Deno bug): `for await (using entry of entries) { console.log(entry.path, entry.header.size) }`
Maybe. The user-error bug would then be somebody trying to consume the readable stream without awaiting its consumption in the scope, and getting thrown a bad resource ID.
Is blocking the next entry until the previous readable is either discarded or consumed a limitation, or is it by design?

If the latter, I feel like it is slightly inconvenient. Beyond the fact that the API is changed (so it's less of a drop-in replacement), it prevents some parallelization patterns like `Array.fromAsync()`. In any case it's easy to miss in the docs; especially if you're the one reviewing someone else's code, you're probably going to miss this specificity.
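To illustrate the `Array.fromAsync()` point with a minimal sketch (assuming `entries` is the async iterable of tar entries from the snippets above):

```ts
// Hangs on any archive with more than one file: entry #2 is not
// yielded until entry #1's readable is consumed or cancelled, and
// Array.fromAsync never touches the readables.
const all = await Array.fromAsync(entries);
```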
> Is blocking the next entry until the previous readable is either discarded or consumed a limitation, or is it by design?
It's a limitation of tar. Each file in a tar archive is appended one after the other; tar isn't designed to be extracted in parallel, but sequentially. A parallel version could be made specifically for Deno, but that would require the file to be saved to disk uncompressed, so it could then seek around the file. I don't know many people who would be writing non-compressed tar files to disk, though.

Streams are also designed to work linearly, meaning one must work with the data from left to right. Streams are great for working linearly over large amounts of data while keeping a small memory footprint.
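For context, a small sketch of why the format forces this: each tar entry is a 512-byte header followed by its data padded to a 512-byte boundary, so the only way to locate entry N+1 in a stream is to read (or skip) all of entry N.

```ts
// Offset of the next entry's header, given the current entry's
// header offset and its file size (both in bytes).
function nextEntryOffset(headerOffset: number, fileSize: number): number {
  const BLOCK = 512;
  const paddedData = Math.ceil(fileSize / BLOCK) * BLOCK;
  return headerOffset + BLOCK + paddedData;
}

// e.g. a 1300-byte file whose header starts at offset 0:
// header (512) + data padded to 1536 => next header at 2048.
console.log(nextEntryOffset(0, 1300)); // 2048
```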
> Might be useful to give `TarStreamEntry` a `Symbol.dispose` method so you can do this (pending a fix for this Deno bug): `for await (using entry of entries) { console.log(entry.path, entry.header.size) }`
Going back to this idea of implementing `Symbol.asyncDispose` for `TarStreamEntry`: we could implement it to solve the above issue of hanging indefinitely when `entry?.readable` isn't handled within the scope.

One consideration to take into account, though: if the user did the following, forgetting to await the piping, the code would then throw a bad resource ID as the scope ends and the readable tries to continue. This may be more desirable behaviour than hanging indefinitely.
```ts
for await (using entry of readable) {
  console.log(entry.path, entry.header.size)
  // Missing `await`: as the scope ends, the readable tries to
  // continue and throws a bad resource ID.
  entry?.readable.pipeTo((await Deno.create(entry.path)).writable)
}
```
Another consideration to take into account: if the user is properly handling the `entry?.readable`, then the `using` keyword is completely redundant, and using `const` would end up saving a few checks.
```ts
for await (const entry of readable) {
  console.log(entry.path, entry.header.size)
  if (condition) await entry?.readable.pipeTo((await Deno.create(path)).writable)
  else await entry?.readable.cancel()
}
```
When using `const` in this second snippet, forgetting the `await` keyword is perfectly acceptable.
Oh yeah, I guess it should be `Symbol.asyncDispose` as `readable.cancel` is async.
With `Symbol.asyncDispose`, the correct syntax is `for await (await using entry of readable)` rather than `for await (using entry of readable)`... however, in practice I don't think it will make a difference, as the loop iteration is only suspended until `cancel` resolves. Not sure if that could cause any footguns.

Edit: nvm, `using` won't call `Symbol.asyncDispose`, only `Symbol.dispose`. So `for await (await using` is necessary, unless overloading `Symbol.dispose` to return a promise (which seems like a bad idea).
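A quick sketch of the semantics described above, assuming a runtime with explicit resource management support (recent Deno versions): `using` only consults `Symbol.dispose`, while `await using` prefers `Symbol.asyncDispose` and awaits it at scope exit.

```ts
const resource = {
  async [Symbol.asyncDispose]() {
    console.log("async disposed");
  },
};

{
  await using r = resource; // awaited and logged at block exit
}

// `using r = resource` would instead throw a TypeError here, since
// the object has no Symbol.dispose method.
```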
> however, in practice I don't think it will make a difference, as the loop iteration is only suspended until `cancel` resolves. Not sure if that could cause any footguns.
Thinking about it, in the example where the user forgets to await the piping, I think it may be more likely that the stream becomes corrupt, as some of it is discarded through `.cancel()` while other parts are piped, resulting in the piped data missing chunks.