Closed — sntran closed this issue 6 months ago
Development on new features is largely stalled at the moment, so you're welcome to have a go if you want. There are some stubs to get you started: essentially you need to create a stream to pass to the uploader (as for regular files), and update the bits above to pass the size along (and probably other things I can't remember).
Otherwise, if you can find a streaming archive client and can get the exact file size upfront, the procjson feature may be good enough.
Thanks for the first steps! One thing that occurs to me is that by adding the input files into an archive stream, we essentially lose the size: the final size of the archive is unknown, which would not be accepted by the uploader. That is how the procjson feature works, isn't it? How would we get around that?
> How would we get around that?
You can't. yEnc requires knowing the total size upfront, so working around this isn't possible.
This means that you probably won't be able to use any compression, as you can't predict the final size beforehand.
Regardless of how you do it, you'll need to know the size of the resulting archive upfront, which likely means you'll have to disable compression.
```
zip_size = num_of_files * (30 + 16 + 46)
         + 2 * total_length_of_filenames
         + total_size_of_files
         + 22
```

for a copy-only (store, no compression) archive.
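As a minimal sketch, the formula above can be written in JavaScript. The constants come straight from the ZIP format: a 30-byte local file header and 16-byte data descriptor per file, a 46-byte central directory header per file (each of which repeats the file name), and one 22-byte end-of-central-directory record. The function name and sample files are made up for illustration:

```javascript
// Predict the size of a store-only (no compression) ZIP archive
// using the formula from the discussion above.
function predictStoredZipSize(files) {
  // 30-byte local header + 16-byte data descriptor
  // + 46-byte central directory header, per file
  const PER_FILE_OVERHEAD = 30 + 16 + 46;
  const EOCD = 22; // single end-of-central-directory record

  let size = EOCD;
  for (const { name, size: dataSize } of files) {
    // The file name is stored twice (local header + central directory).
    // Use byte length, not character count, for non-ASCII names.
    size += PER_FILE_OVERHEAD + 2 * Buffer.byteLength(name) + dataSize;
  }
  return size;
}

// Example: two small files
console.log(predictStoredZipSize([
  { name: 'a.txt', size: 100 },
  { name: 'b.txt', size: 200 },
])); // 526
```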
Sounds simple enough :) There are of course differences between archivers, but the good thing is that we control which archiver to use, so hopefully it will be a straightforward process.
Just to be upfront, I hate archiving, but for such a use case I need a container for all the files, instead of posting many 5KB files. Therefore 0-level compression is a great choice, both for zipping and for nzbget to handle faster later on.
I'll take a look to see whether it makes sense to add archiving to my tool (which streams remote files to nyuu) or directly to nyuu. Single-responsibility and such :)
Further manual testing shows that yazl follows that formula without compression, while p7zip returns a slightly bigger size with the -m0=Copy flag.
Are you open to adding yazl as another dependency? Or would you prefer handling the archiving directly?
From a quick glance, that looks like a nice library. Make sure to use the size it reports rather than trying to compute it yourself: even if you control the archiver, the output can change with a different version, so it's best to use their value.
For inclusion in Nyuu, I generally follow a principle of minimal dependencies (to make installation easy), but I don't mind it as an optional dependency. It would be nice if there was a way to use node-yencode's built-in CRC instead of their buffer-crc32 (computing CRC in JavaScript doesn't exactly scream performant), but that's just a nice-to-have.
The 7z format has some benefits over ZIP for this use case, mostly standardised encoding of filenames (where ZIP doesn't) and the ability to compress metadata (mostly useful if there's a lot of files).
So overall, sounds good to me.
> Make sure to use the size it reports rather than trying to compute it yourself.
Not sure I can do that. The input comes from a stream, and yazl doesn't report the archive size immediately, but we do need that size upfront; at least that's how I understand it works with procjson.
> The 7z format has some benefits over ZIP for this use case, mostly standardised encoding of filenames (where ZIP doesn't) and the ability to compress metadata (mostly useful if there's a lot of files).
I would also prefer 7z, but 7za only accepts a single input stream, on stdin, which is not enough.
I haven't tried using the library, but the README makes it sound like the size can be known upfront, as long as compression is disabled:
> If `finalSize` is -1, it means the final size is too hard to guess before processing the input file data. This will happen if and only if the `compress` option is `true` on any call to `addFile()`, `addReadStream()`, or `addBuffer()`, or if `addReadStream()` is called and the optional `size` option is not given. In other words, clients should know whether they're going to get a -1 or a real value by looking at how they are using this library.
>
> The call to `finalSizeCallback` might be delayed if yazl is still waiting for `fs.Stats` for an `addFile()` entry. If `addFile()` was never called, `finalSizeCallback` will be called during the call to `end()`. It is not required to start piping data from `outputStream` before `finalSizeCallback` is called. `finalSizeCallback` will be called only once, and only if this is the first call to `end()`.
If it doesn't work that way, perhaps file a bug report or ask the author about it.
Yeah, that finalSizeCallback is only called when you call end(), which needs to be done after adding the inputs.
For what we want to do, I believe we need to know the size before handling any inputs.
I have filed an issue on yazl requesting the functionality.
Hi there,
Sorry for the delay. After a bit of research, I think it's not easy to add this feature. I'll close this issue for now. Thanks.
Hi again,
I see that this feature is planned, but I'm not sure where it is on your roadmap. I actually have a need for this: streaming a folder with many files from a remote source into a zip file before uploading.
I could create the archive before handing it off to nyuu, but since the files are remote, that would require double the disk space. I can also try to contribute this feature, but I would like to hear your vision of how it should be implemented.