avluis / Hentoid

Doujinshi Android App
https://discord.gg/QEZ3qk9
Apache License 2.0
Support for ACBF archives #848

Closed RobbWatershed closed 3 years ago

RobbWatershed commented 3 years ago

Original post in #847

In addition, I'd also like to request that compatibility with the .acbf (Advanced Comic Book Format) be implemented. Unfortunately, ACBF has not seen wide adoption as a standard yet, due in part to the lack of updates to the specification and to the applications made by the original devs (the editor that they made was still using GTK2). However, I'm currently working on updating some stuff in the editor myself, and I definitely think that the format specification could use some changes. Given that the original devs seem to have all but abandoned it, I might as well see what I can do with it.

LiftedStarfish commented 3 years ago

[I forgot to create my repo. I'll do that in a moment.] [This was added after the stuff below. Just a quick clarification: unless I've misunderstood what an archive is, ACBF is not an archive. Rather, it is an XML specification intended to store the comic in its entirety, encoding both text and graphic data and storing the latter as base64-encoded strings.]

I'll just "briefly" explain what the Advanced Comic Book Format can do that makes it a significant improvement over other standards that I've seen. If you don't care about that info, skip ahead to the next paragraph. Note that ACBF is an extension/implementation specification of XML, so if there are any significant downsides to XML that I don't know about, please let me know. The most significant improvement (to me) that ACBF offers (aside from the content all being stored as one file, and not the hack that is archives) is that it can store graphic and textual data separately. This is encouraged, because it allows for a lot of streamlining, such as deduplication of translated works, while simultaneously making the work of translators much easier. As somebody who's translated a comic from English to Esperanto, I can confidently say that translations are a pain when you have to work directly with the image itself, and that was still euro->pseudoeuro. I can't imagine what a nightmare it is to map horizontal text into space that was originally intended for vertical text. When all you need to worry about is what the text says, because the program itself takes care of everything else, a translator's job becomes much easier. Not to mention, it allows for font changing.
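To make the "text stored separately from artwork" idea concrete, here is a minimal Python sketch. The XML below is a simplified, made-up illustration: the element and attribute names (`page`, `image`, `text-layer`, `text-area`, `lang`, `points`) and the helper `texts_for_language` are assumptions for this example and should not be read as the actual ACBF schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page: one text-free image plus one text layer
# per language. Not taken from the real ACBF specification.
PAGE_XML = """
<page>
  <image href="page01.png"/>
  <text-layer lang="en">
    <text-area points="10,10 200,10 200,60 10,60">Hello!</text-area>
  </text-layer>
  <text-layer lang="eo">
    <text-area points="10,10 200,10 200,60 10,60">Saluton!</text-area>
  </text-layer>
</page>
"""


def texts_for_language(page_xml: str, lang: str) -> list[str]:
    """Collect the text of every text area in the layer(s) for one language."""
    page = ET.fromstring(page_xml)
    texts = []
    for layer in page.findall("text-layer"):
        if layer.get("lang") == lang:
            texts.extend(area.text or "" for area in layer.findall("text-area"))
    return texts


print(texts_for_language(PAGE_XML, "eo"))
```

In a layout like this, a translation only adds another text layer; the image is stored once and shared by every language, which is where the deduplication would come from.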

The current implementations by the original team (I think; there's not a lot of content on it) and the source code are over on Launchpad (warning, might not be HTTPS). The specification is still kind of bare-bones, and unfortunately the webpage that I just gave you is an actual nightmare to navigate (it literally took me hours to figure out where to download the source code for the editor or the viewer, though that could just be my newbie status). Fortunately, there's a much easier to understand Fandom/Wikia page for it, so here's a link to that here. I'll also host a mirror of the original viewer and editor desktop applications on my GitHub, but they'll be updated with my own modifications as soon as I have them functional on my own machine (Manjaro KDE).

The guy made it back in 2011, and unfortunately hasn't done a whole lot with it. Again, the editor hasn't even been updated for modern operating systems, and combing through the viewer indicates to me that, while the app does open, there are a lot of bugs and errors that aren't immediately evident. (I manually made a meta-info file for a .cbz archive that the viewer would not read. That could be down to improper formatting, but I'm pretty sure that I followed the specification.) Their priorities are also confusing. The viewer is on version 2.10, whereas the editor is only on version 1.17 and seemingly hasn't been touched for many years, even though it shares a significant portion of its code with the viewer, which (along with the imported libraries) suggests the two were developed concurrently.

I'm well aware that implementing such a format would be a much greater undertaking than something like implementing the ability to read or even package CBAs, and I don't think it would be a stretch to classify ACBF readability as a 'major feature' in terms of scope. I'd say that it's more accurate to say that my request isn't as much "Can you guys implement this thing" as it is "Hey, if I managed to get this thing working the way that it's supposed to, would you actually be willing to accept a pull request?" or even "Would you be willing to familiarize yourself with this (arguably) greatly improved specification for digitized comic books?"

Sorry for being rambly. I just kind of stumbled across the ACBF specification while searching for a standardized way of representing comic-book metadata, thought that it was really cool and got excited. It actually feels like a proper digitized comic-book format, not just "I put a book in a computer", because the current systems are functionally no different than if we were to digitize physical books by photocopying their pages. There are actually other features that I didn't mention above, such as the ability to define frames/panels and view them one at a time (could be useful for something like webtoons). Please tell me what you think.

LiftedStarfish commented 3 years ago

Sorry, Launchpad and Bazaar are so archaic in appearance, and I had problems understanding the UI, but here are the actual XSD files, which I think are used to define the specification (which is apparently on v1.1).

LiftedStarfish commented 3 years ago

I've just learned about one of the significant downsides of ACBF: from what I've heard, encoding images as base64, as the specification intends, bloats their actual on-disk size by about 33%, which is not very economical. I can see now that, if nothing else, I need to work out a better way of encoding the images that won't fuck up storage.
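The roughly 33% figure follows from how base64 works: every 3 input bytes become 4 output characters. A small Python sketch to check it, using random bytes as a stand-in for already-compressed image data (the helper name `base64_overhead` is just for this example):

```python
import base64
import os


def base64_overhead(raw: bytes) -> float:
    """Return the base64-encoded size divided by the raw size."""
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw)


# Random bytes stand in for a JPEG/PNG payload; the ratio depends only on
# the length of the input, not on its contents.
sample = os.urandom(300_000)
print(f"size ratio: {base64_overhead(sample):.3f}")
```

For inputs whose length is a multiple of 3 the ratio is exactly 4/3; for other lengths, padding adds a few bytes on top. Line breaks or surrounding XML markup would add a little more, but the 4/3 factor dominates.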

RobbWatershed commented 3 years ago

As I understand what it does, Hentoid will have to handle the display and formatting of text over text-free pictures. That clearly falls into the category of major updates, considering the amount of work needed to implement that.

What's more, correct me if I'm wrong, but you're asking for support for a format developed by one single guy, in one single app, that was abandoned in 2011?

LiftedStarfish commented 3 years ago

I was actually mistaken about that. He is still developing it, but it appears to be sporadic. Also, according to the webpage, there is an ACBF-Development-Team, but it only has two 'active' members. The last time Kubik (the team owner/lead) made a revision to the repo was in January of this year, and the latest merge proposal (pull request) was in April, appearing to have. I think it's generous to say that it's dying.

I realize now that I'm asking something that's arguably impossible until the format itself has more active development. I think I'll just make a hard fork, work on improving it, and when I've got a format that actually works, I'll come back and open a new issue. A brief look at file-extension.info (which doesn't even list ACBF as a format) indicates to me that there still isn't a format or specification that has the kind of behavior that ACBF is supposed to. Until I do that though, I think this particular issue should be closed. If I do get my own fork up and running, and working the way that it should, it's gonna be a long time, when I've got more experience, and I'll likely have called it something else by then. I just hope I'll be able to have a bigger impact than Kubik did, and that the specification will be able to pass the bus test.

RobbWatershed commented 3 years ago

Agreed. Thanks for confirming 😄

And good luck on your project of improving ACBF!

LiftedStarfish commented 3 years ago

In the process of doing a significant amount of research, I've discovered that my understanding of file formats is severely lacking, and that the ACBF specification (and whatever derivative I had intended to create and maintain), and specifically its "put all of the data, including image data, into an XML file" approach, might actually be a terrible idea.

You see, up until now, my knowledge of container files (.mkv, .mp4, etc.) was very limited, and I thought that most files were either raw binary data or something like ACBF (one file written in one language, with all data contained within said file). This was horribly naive of me, and it is not correct. It turns out that a lot of file formats are containerized to some degree. All of the OpenDocument files (.odt, .ods, etc.) are actually zip files containing a bunch of XML files and even other folders. Even the Microsoft Office formats (.docx, etc.) are built the same way, as zip archives of XML files. Moreover, while the OpenDocument specification does allow for storing everything in a single XML file, that's exceedingly rare because it isn't compressed.
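The "document is really a zip of XML files" point is easy to check with Python's standard `zipfile` module. The sketch below builds a toy, in-memory package laid out like an OpenDocument file and lists what's inside. The XML bodies are placeholders rather than valid ODF content, and both helper names are made up for this example.

```python
import io
import zipfile


def build_toy_document() -> bytes:
    """Build a tiny zip laid out like an OpenDocument package (toy content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # ODF packages put an uncompressed "mimetype" entry first.
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        # Placeholder XML, not a valid ODF document.
        zf.writestr("content.xml", "<placeholder-content/>")
        zf.writestr("META-INF/manifest.xml", "<placeholder-manifest/>")
    return buf.getvalue()


def list_container_entries(data: bytes) -> list[str]:
    """Return the names of the files stored inside a zip-based container."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


print(list_container_entries(build_toy_document()))
```

Pointing `zipfile.ZipFile` at a real .odt or .docx file on disk should show the same kind of listing, with XML parts and subfolders.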

HOWEVER, this is just one flaw that ACBF (I'll be calling my fork "true digital comic book", or TDCB) had. The guy who made it still had some really good ideas, and there is still little to no standardization for comic book meta info, so I'll still be working on expanding Kubik's ideas and standardizing metadata, while also keeping the data compressed. Also, while this does essentially make this just another kind of comic book archive, I won't be trying to modify the existing way that the .cbz format is used. Every reader capable of reading a .cbz file reads it a particular way, a way that my specification is not compatible with, so I'll instead be using my .tdcb extension for the sake of keeping .cbz files compatible with older or unmaintained readers. Also for vanity's sake.