codedread / bitjs

Binary Tools for JavaScript
MIT License

Get archive content metadata #21

Open kefniark opened 4 years ago

kefniark commented 4 years ago

Description

I'm trying your RAR decoder as a replacement for libarchive. It works well, but I'm running into a performance issue.

Some of the books I'm dealing with (in CBR) are quite big: over 500MB and 300 pages. With this library, I don't see how to get file descriptions of the RAR content without extracting everything. I end up loading the whole book into memory (which takes >20s) when I just want to list the files and load the first 2 pages.

libarchive, for example, exposes a method .getFilesObject() to list files and access their metadata. The reading/decoding operation is a separate async operation.

I searched but couldn't find a way to do this with bitjs. Am I missing something?

codedread commented 4 years ago

1) Out of curiosity, what kind of files are they?

2) Are you sure that the files have been added to the archive in the order in which they need to be extracted? I have encountered RAR and ZIP files where this is not the case, which means the whole archive needs to be extracted to put files in the right order.

I'm afraid I haven't dug into the RAR format too much lately, so I've forgotten if what you're asking is possible (like is the metadata at the beginning of the file? the end of the file? interspersed throughout the file?). But it seems possible we could introduce a mode that lets the client code extract a file at a time. I want something like this for my comic book reader anyway: https://github.com/codedread/kthoom/issues/31
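On where the metadata lives: as far as I understand the RAR 4.x container layout (from the technical note that ships with RAR), each file's header sits immediately before its packed data, so the metadata is interspersed through the archive and a listing pass can hop from header to header without decompressing anything. Here is a minimal sketch of such a pass. It is not bitjs code: `listRar4Entries` and the entry field names are made up for illustration, and it assumes a single-volume RAR 4.x archive with unencrypted headers (RAR 5 uses a different header layout and is not handled).

```javascript
// Sketch only: walk RAR 4.x block headers and collect file metadata.
// Assumption: RAR 4.x layout, single volume, unencrypted headers.
const RAR4_SIGNATURE = [0x52, 0x61, 0x72, 0x21, 0x1a, 0x07, 0x00]; // "Rar!" 1A 07 00
const FILE_HEAD = 0x74;       // block type of a file header
const END_OF_ARCHIVE = 0x7b;  // block type of the terminator
const FLAG_LARGE = 0x0100;    // file header carries the high 32 bits of both sizes
const FLAG_ADD_SIZE = 0x8000; // a 4-byte ADD_SIZE field follows HEAD_SIZE

// Hypothetical helper: `bytes` is a Uint8Array holding the archive.
function listRar4Entries(bytes) {
  if (!RAR4_SIGNATURE.every((b, i) => bytes[i] === b)) {
    throw new Error('Not a RAR 4.x archive');
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const decoder = new TextDecoder(); // non-ASCII names may come out mangled
  const entries = [];
  let pos = RAR4_SIGNATURE.length;

  while (pos + 7 <= bytes.length) {
    // Common prefix: HEAD_CRC(2) HEAD_TYPE(1) HEAD_FLAGS(2) HEAD_SIZE(2)
    const type = view.getUint8(pos + 2);
    const flags = view.getUint16(pos + 3, true);
    const headSize = view.getUint16(pos + 5, true);
    if (type === END_OF_ARCHIVE || headSize < 7 || pos + headSize > bytes.length) break;

    let dataSize = 0;
    if (type === FILE_HEAD) {
      const large = (flags & FLAG_LARGE) !== 0;
      if (headSize < (large ? 40 : 32)) break; // malformed file header
      let packedSize = view.getUint32(pos + 7, true);
      let unpackedSize = view.getUint32(pos + 11, true);
      const nameSize = view.getUint16(pos + 26, true);
      let nameStart = pos + 32;
      if (large) {
        packedSize += view.getUint32(pos + 32, true) * 2 ** 32;
        unpackedSize += view.getUint32(pos + 36, true) * 2 ** 32;
        nameStart += 8;
      }
      // With the Unicode flag the name may be "ascii\0encoded"; keep the first part.
      const rawName = bytes.subarray(nameStart, nameStart + nameSize);
      const nul = rawName.indexOf(0);
      const name = decoder.decode(nul === -1 ? rawName : rawName.subarray(0, nul));
      entries.push({ name, packedSize, unpackedSize, dataOffset: pos + headSize });
      dataSize = packedSize; // packed data follows the header; skip it unread
    } else if ((flags & FLAG_ADD_SIZE) !== 0 && headSize >= 11) {
      dataSize = view.getUint32(pos + 7, true);
    }
    pos += headSize + dataSize;
  }
  return entries;
}
```

When the archive is read from a local file, the same walk could issue a small read at each offset instead of holding the whole buffer in memory. Note that in a solid archive the listing still works, but extracting a single entry would require decompressing the entries before it.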

Out of curiosity, why move from libarchive to this library? Since this is a bespoke JavaScript library that does unarchiving of files, I would guess that any library compiled from a C library like unrar to WebAssembly would have more complete support for the format.

kefniark commented 4 years ago
  1. Nothing special, just big CBR books full of illustrations.
  2. No, indeed they are not sorted, and that's the problem: in my list of sample books, a certain % of them are not sorted. But because I access those files locally, I can read any part of them without streaming.
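Once the entry names are available without extraction, picking the first pages of an unsorted archive comes down to sorting the names rather than the data. A minimal sketch, assuming pages are image files whose names sort naturally (`page2` before `page10`); `firstPages` and the extension list are hypothetical and not part of bitjs or libarchive:

```javascript
// Sketch only: choose the first `count` pages from an unordered list of entry names.
// Assumption: pages are image files and their names encode the reading order.
const IMAGE_EXT = /\.(jpe?g|png|gif|webp)$/i;

// Numeric collation compares digit runs as numbers, so "page2" sorts before "page10".
const pageCollator = new Intl.Collator('en', { numeric: true, sensitivity: 'base' });

function firstPages(entryNames, count) {
  return entryNames
    .filter((name) => IMAGE_EXT.test(name)) // drop non-image entries such as info files
    .sort(pageCollator.compare)             // filter() returned a copy, so the input is untouched
    .slice(0, count);
}
```

Only the names returned by `firstPages(names, 2)` would then need to be decoded, regardless of where they sit in the archive.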

> Out of curiosity, why move from libarchive to this library

libarchive works really well for CBZ and CBT, but its unrar library is quite buggy and not updated:

From my tests, I get errors with a lot of books in CBR format, which is why I was trying other unrar libraries to find a workaround.

For the moment, my best workaround is to use libarchive to get the RAR file descriptions and fall back to node-unrar-js to access the RAR content. It works well and solves my problem, but it's quite an ugly hack 😄
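The shape of that workaround can be sketched as a small wrapper that lists with one backend and extracts with another. This is only an illustration: `createHybridArchive`, `lister`, and `extractor` are hypothetical names, and the two adapters do not reflect the real libarchive or node-unrar-js APIs. The caller would have to write thin adapters around whichever libraries it uses.

```javascript
// Sketch only: list entries with one backend, extract with another.
// Assumption: `lister.list()` resolves to [{ name, ... }] and
// `extractor.extract(names)` resolves to the decoded entries. Both are
// hypothetical adapters supplied by the caller.
function createHybridArchive(lister, extractor) {
  let entriesPromise = null;

  // Cheap path: metadata only, fetched once and then reused.
  function list() {
    if (entriesPromise === null) {
      entriesPromise = Promise.resolve().then(() => lister.list());
    }
    return entriesPromise;
  }

  // Expensive path: decode only the entries that were asked for.
  async function extract(names) {
    const known = new Set((await list()).map((entry) => entry.name));
    const missing = names.filter((name) => !known.has(name));
    if (missing.length > 0) {
      throw new Error(`Not in archive: ${missing.join(', ')}`);
    }
    return extractor.extract(names);
  }

  return { list, extract };
}
```

Keeping listing and extraction behind one object means the listing backend could later be swapped out without touching the code that renders pages.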

I'm quite surprised there is no good, up-to-date unrar library in JS without crazy dependencies.

codedread commented 4 years ago

bitjs is an attempt to avoid any dependencies for this functionality, but its support is not complete (but good enough, imo).

TBH, I'm not that surprised, since RAR is an undocumented, proprietary format, so unarchivers have to either compile the unrar source or reverse-engineer it.

I'll think about this some more - I can't promise anything though!