advplyr / audiobookshelf

Self-hosted audiobook and podcast server
https://audiobookshelf.org
GNU General Public License v3.0

[Bug]: Big comics are basically unusable at the moment #3505

Open kuldan5853 opened 1 month ago

kuldan5853 commented 1 month ago

What happened?

Since the server was fixed to import cbr/cbz comic books more reliably (no crashes anymore, thanks!), I have been trying to set up a comic library.

The built-in reader works okay for smaller books (20-30 MB file size), but as soon as files get into the multiple-hundred-MB range, they either load forever (never opening) or, if they do open (I tested this with a 200 MB file), a single page turn can take 30-60 seconds while the whole system becomes basically unresponsive.

This makes the comic book feature essentially unusable at the moment.

The behavior was the same across the app and the web client, so I assume it is a backend issue.

What did you expect to happen?

Smoother performance, or working at all, for larger comic book files.

Steps to reproduce the issue

  1. Add a big comic book (300+ MB) to your library and try to open it.
  2. When it (eventually) opens, try to scroll through more than a few pages.

Audiobookshelf version

v2.14.0

How are you running audiobookshelf?

Windows Tray App

What OS is your Audiobookshelf server hosted from?

Windows

If the issue is being seen in the UI, what browsers are you seeing the problem on?

None

Logs

No response

Additional Notes

No response

mikiher commented 1 month ago

Up until now we had issues with scanning some of these files, and now that those are mostly fixed on the server, we're starting to see some client issues (the client and the server used to use similarly buggy versions of the un-archiving libraries; the server is now fixed, but the client may still have issues).

We're aware of (at least some of) those issues, and we'll try to focus on solving them soon.

advplyr commented 1 month ago

I think performance issues are bound to occur on larger comics since we extract the cbr/cbz on the client side, and comic images tend to be large files. The reason extracting client-side is useful is that it makes offline comic reading on mobile easier. The downside is what you're seeing here.

kuldan5853 commented 1 month ago

Thanks guys - since comic files can approach 500 MB or more, this can become a real issue. At least in the browser, I can pretty easily put the server into an endless "please wait..." simply by flipping through pages.

Would it be possible to make this a toggleable option: server-side rendering vs. client-side rendering?

I assume some people would be happy with the trade-off of investing cache RAM / storage space on the server side if it means the server only has to serve out single JPGs one at a time, for performance's sake.
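For illustration, a rough sketch of what such a server-side page endpoint could look like for CBZ files (hypothetical code, not audiobookshelf's actual API; `lookupComicPath` is an invented helper, and `unzipper` stands in for whatever streaming unzip library the server would use):

```js
const express = require('express');
const unzipper = require('unzipper'); // assumption: any streaming unzip library would do

const app = express();

// Hypothetical route (not audiobookshelf's actual API): serve one page of a
// CBZ at a time so the client never downloads or unpacks the whole archive.
app.get('/comics/:id/pages/:index', async (req, res) => {
  const cbzPath = lookupComicPath(req.params.id); // hypothetical helper
  // Open.file() reads only the zip central directory, not the entries themselves
  const archive = await unzipper.Open.file(cbzPath);
  const pages = archive.files
    .filter((f) => /\.(jpe?g|png|webp)$/i.test(f.path))
    .sort((a, b) => a.path.localeCompare(b.path));
  const page = pages[Number(req.params.index)];
  if (!page) return res.sendStatus(404);
  res.type(page.path.split('.').pop()); // content type from the file extension
  page.stream().pipe(res); // stream just this one entry to the client
});
```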

At any rate, comics are a side note to the whole server, so I can totally understand if it's not a priority. When the import issues got fixed, I just got a bit too excited ;)

advplyr commented 1 month ago

I built that comic reader without looking at other projects that have one. I would be curious to know whether other comic readers extract the images server-side and reduce the file size.

I don't have any test comics that big, so I haven't experienced the issue, but I can see how it would be slow. If it is getting stuck completely, maybe you are hitting a bug and not just a performance issue.

mikiher commented 1 month ago

@advplyr we know for a fact that libarchive (which still runs on the client) has issues with some cbr files, just as it did when scanning them on the server before we replaced it. Those files would appear to be stuck forever while opening on the client (I saw this when testing with some of these files).

The first order of business is to replace libarchive on the client as well; at least some of the issues should go away with that.

In terms of performance, my gut feeling is that we should be able to deal with large files like these. After all, we handle gigabyte-sized audio files, don't we? Perhaps we're doing something very inefficient with libarchive (we know it was loading files fully into memory before we switched the server to the new un-archivers that use streaming - perhaps the same thing is causing the performance issues on the client).
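To make the difference concrete, a schematic sketch (not the actual client code):

```js
const fs = require('fs');

// Schematic only -- not the actual client code.
// Buffering approach: read the entire archive before extracting anything.
// A 500 MB comic means a 500 MB allocation before the first page renders.
const wholeArchive = fs.readFileSync('comic.cbr');

// Streaming / random-access approach: open the file and read only the bytes
// needed for the entry currently being extracted.
const fd = fs.openSync('comic.cbr', 'r');
const chunk = Buffer.alloc(64 * 1024);
fs.readSync(fd, chunk, 0, chunk.length, 0); // read just the first 64 KB
```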

advplyr commented 1 month ago

Ah yeah, I forgot that libarchive loads the full archive into memory rather than streaming. Replacing that would be the first step before looking at serving the images server-side.

Looking at node-unrar-js, it doesn't look like that library is able to stream either. The Archiver library would be able to stream CBZ files, but I'm not sure that is supported in the browser.

mikiher commented 1 month ago

node-unrar-js does use streaming (it has two modes), and we use the streaming mode.

The only peculiarity of node-unrar-js is that in streaming mode it can only write the extracted output to a file, so you then have to read it back into memory if you need it there. I don't think this is a limitation of the underlying unrar logic, only of the API. Perhaps when I have time I will add unzipToBuffer functionality, but that's not a high priority.
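To make that peculiarity concrete, a rough sketch of the extract-to-disk-then-read-back pattern with node-unrar-js's file-based API (the function and paths are illustrative; check the library's docs for exact call shapes):

```js
const path = require('path');
const fs = require('fs/promises');
const { createExtractorFromFile } = require('node-unrar-js');

// Illustrative only: extract a single entry from a CBR to disk, then read it back.
async function extractEntryToBuffer(cbrPath, entryName, workDir) {
  // File-based extractor: reads from the archive as needed instead of
  // buffering the whole file in memory.
  const extractor = await createExtractorFromFile({ filepath: cbrPath, targetPath: workDir });
  const { files } = extractor.extract({ files: [entryName] });
  // Extraction is lazy: the generator must be consumed for the file to be written.
  [...files];
  // The streaming API only writes to disk, so read the result back into memory.
  return fs.readFile(path.join(workDir, entryName));
}
```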

advplyr commented 1 month ago

I was going off of the issue opened here https://github.com/YuJianrong/node-unrar.js/issues/164

kuldan5853 commented 1 month ago

If I can help with anything (e.g. sharing some of the big comics I used to produce the issue) I can gladly share them.

advplyr commented 1 month ago

Thanks. It's always helpful to have more data to test with. Feel free to send me an email or a Discord DM.

mikiher commented 1 month ago

> I was going off of the issue opened here YuJianrong/node-unrar.js#164

I think the author was trying to explain that the unrar wasm is not amenable to accepting a standard JavaScript ReadStream. However, unrar does accept an Extractor abstract class that defines abstract read(), seek(), and tell() methods. The Extractor class has two implementations: ExtractorData (which implements those methods using a memory buffer) and ExtractorFile (which implements them using actual file operations).

When I say it supports streaming, what I actually mean is that when it uses ExtractorFile, (I think) it does not read the whole archive into memory at once, but rather uses the read(size) and seek() methods to read what it needs directly from the input file. In other words, it uses random access into the file (in the case of ExtractorFile) as needed.

I haven't read the code too carefully, but I think this is how it works.
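As a rough illustration of that random-access pattern (the class and method names below paraphrase the description above; they are not node-unrar-js's actual internals):

```js
const fs = require('fs');

// Illustrative sketch only -- this paraphrases the read()/seek()/tell()
// contract described above; it is not the library's real internal API.
class FileBackedSource {
  constructor(filepath) {
    this.fd = fs.openSync(filepath, 'r');
    this.pos = 0;
  }
  // Read only `size` bytes starting at the current position, so the whole
  // archive never has to sit in memory at once.
  read(size) {
    const buf = Buffer.alloc(size);
    const bytesRead = fs.readSync(this.fd, buf, 0, size, this.pos);
    this.pos += bytesRead;
    return buf.subarray(0, bytesRead);
  }
  seek(offset) { this.pos = offset; }
  tell() { return this.pos; }
}
```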

mikiher commented 1 month ago

> If I can help with anything (e.g. sharing some of the big comics I used to produce the issue) I can gladly share them.

Please share with me as well.