FicHub / fichub.net

web frontend for generating ebooks from fanfic
https://fichub.net
GNU Affero General Public License v3.0

Random Spaces in Epub Text #6

Open oberon0470 opened 3 years ago

oberon0470 commented 3 years ago

Not a big issue, mostly cosmetic, but FicHub randomly inserts a space inside words throughout the epub. I compared against epubs downloaded with another app, and the spaces were not there. Example from an epub ("Convicted by Transwarp") I recently downloaded from 'FanFiction.net': "C ontinued in Chapter Four teen". It appears to happen where the text has some special formatting (bold or italics, for example). I checked, and the same thing happens with epubs from 'Archive of Our Own'.

iridescent-beacon commented 3 years ago

Thank you for the report; this is almost certainly a side effect of the legacy HTML cleanup code (which originally didn't output HTML at all and was retrofitted to do so). Replacing it is already on the TODO list. I'll leave this issue open and confirm it's fixed once that work is completed.
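For illustration only, here is one way this class of bug can arise. This is a hypothetical sketch in Python, not FicHub's actual cleanup code: the class `NaiveTextExtractor` and the helper `extract` are made up for the example, and the sample markup is a guess at what the reported line might have looked like. The idea is that a text-oriented cleanup pass which joins text fragments with a separator will split a word wherever an inline tag such as bold or italic starts or ends inside it.

```python
from html.parser import HTMLParser


class NaiveTextExtractor(HTMLParser):
    """Hypothetical cleanup pass: collects text nodes and joins them."""

    def __init__(self, separator):
        super().__init__()
        self.separator = separator
        self.fragments = []

    def handle_data(self, data):
        # Each run of text between two tags arrives as its own fragment.
        self.fragments.append(data)

    def text(self):
        return self.separator.join(self.fragments)


def extract(html, separator):
    parser = NaiveTextExtractor(separator)
    parser.feed(html)
    parser.close()
    return parser.text()


# Assumed markup: inline formatting that starts or ends in the middle of a word.
sample = "<p><b>C</b>ontinued in Chapter Four<i>teen</i></p>"

buggy = extract(sample, " ")  # joins fragments with a space
fixed = extract(sample, "")   # inline boundaries add no whitespace

print(buggy)
print(fixed)
```

Joining with a space reproduces the reported "C ontinued in Chapter Four teen", while joining with an empty string keeps the words intact. A real fix would still need to add whitespace at block-level boundaries such as paragraphs.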

andreas-kupries commented 3 years ago

A question here. At AO3 I can directly download the epub for a story. Could fichub make use of that instead of creating its own? Not sure how big a special case that would be in the backend. (The unsupported tthfanfic (#4) can deliver epubs directly as well.)

iridescent-beacon commented 3 years ago

> At AO3 I can directly download the epub for a story. Could fichub make use of that ?

It probably could as long as all the metadata is also in their epubs, or failing that at least use it to grab content from.

The content would probably still need to be run through the same sanitizer though, so it wouldn't really help the spaces issue.

> Not sure how big a special case that would be in the backend.

Currently there's no epub reading code in the backend so it'd be a new codepath. For AO3, assuming things keep working, it doesn't seem worth the effort. For new sites like tthfanfic it may be worth investigating.

andreas-kupries commented 3 years ago

> > At AO3 I can directly download the epub for a story. Could fichub make use of that ?

> It probably could as long as all the metadata is also in their epubs, or failing that at least use it to grab content from.

Oh, you understood my proposal as: get the epub and pull the metadata out of it. I was thinking more of getting the metadata as usual, and pulling the epub directly only when it comes to the story content itself.

Checking an AO3 epub I have at hand ... The Atril document viewer shows me the author(s) and title through the Properties menu entry, so that seems to be stored somewhere in the epub.
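For reference, an epub is a zip archive: `META-INF/container.xml` points at a package (OPF) document, and that document carries Dublin Core fields such as `dc:title` and `dc:creator`. Below is a minimal sketch of reading them with the Python standard library. The function names and the in-memory sample book ("Example Story" by "Example Author") are made up for the example, and it has not been checked against what AO3's epubs actually contain.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_epub_metadata(epub_file):
    """Return title and creators from an epub's package document."""
    with zipfile.ZipFile(epub_file) as zf:
        # container.xml names the location of the package document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        package = ET.fromstring(zf.read(rootfile.attrib["full-path"]))
    metadata = package.find("opf:metadata", NS)
    title = metadata.findtext("dc:title", default="", namespaces=NS)
    creators = [el.text for el in metadata.findall("dc:creator", NS)]
    return {"title": title, "creators": creators}


# Made-up sample data so the sketch runs without a real download.
CONTAINER_XML = (
    '<container version="1.0" '
    'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
    '<rootfiles><rootfile full-path="content.opf" '
    'media-type="application/oebps-package+xml"/></rootfiles>'
    "</container>"
)
OPF_XML = (
    '<package xmlns="http://www.idpf.org/2007/opf" version="3.0">'
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Example Story</dc:title>"
    "<dc:creator>Example Author</dc:creator>"
    "</metadata></package>"
)


def build_sample_epub():
    """Build a tiny, incomplete epub in memory (metadata only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("content.opf", OPF_XML)
    buf.seek(0)
    return buf


meta = read_epub_metadata(build_sample_epub())
print(meta)
```

Site-specific fields such as tags, chapter counts, or update dates may or may not be present in a given site's epubs, so this only covers the basic fields a viewer like Atril displays.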

> The content would probably still need to be run through the same sanitizer though, so it wouldn't really help the spaces issue.

Hm. Ok, I guess I am more trusting of the output of AO3 here.

> > Not sure how big a special case that would be in the backend.

> Currently there's no epub reading code in the backend so it'd be a new codepath. For AO3, assuming things keep working, it doesn't seem worth the effort. For new sites like tthfanfic it may be worth investigating.

Ok.

iridescent-beacon commented 3 years ago

> > The content would probably still need to be run through the same sanitizer though, so it wouldn't really help the spaces issue.

> Hm. Ok, I guess I am more trusting of the output of AO3 here.

It's possible I'll whitelist certain sites eventually, but it's a much simpler mental model for me if I know that content I'm sending to users is heavily restricted in certain ways. Something to think about, anyway.
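To make "heavily restricted" concrete, here is a minimal sketch of an allowlist-style pass in Python. It is an assumption about the general approach, not FicHub's sanitizer: the tag sets and the names `AllowlistSanitizer` and `sanitize` are made up for the example, and it is not a security-grade implementation.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlist; a real one would be chosen per output format.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "br", "hr", "blockquote"}
VOID_TAGS = {"br", "hr"}               # tags that never get a closing tag
DROP_CONTENT_TAGS = {"script", "style"}  # drop these tags and their contents


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            # All attributes are discarded, including event handlers.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif (self.skip_depth == 0 and tag in ALLOWED_TAGS
              and tag not in VOID_TAGS):
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Text is re-escaped and emitted without any added whitespace.
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


dirty = '<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'
clean = sanitize(dirty)
print(clean)
```

Anything not on the list is dropped rather than repaired, which is what keeps the mental model simple: the output can only ever contain the listed tags with no attributes.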

andreas-kupries commented 3 years ago

> > > The content would probably still need to be run through the same sanitizer though, so it wouldn't really help the spaces issue.

> > Hm. Ok, I guess I am more trusting of the output of AO3 here.

> It's possible I'll whitelist certain sites eventually, but it's a much simpler mental model for me if I know that content I'm sending to users is heavily restricted in certain ways. Something to think about, anyway.

Good point, yes. Security, another layer of it. The difference between something used for and by oneself, versus something used by many people. Yes, I am retracting this idea. Thank you for convincing me that this was not really thought out from my side.

iridescent-beacon commented 3 years ago

> > It's possible I'll whitelist certain sites eventually, but it's a much simpler mental model for me if I know that content I'm sending to users is heavily restricted in certain ways. Something to think about, anyway.

> Good point, yes. Security, another layer of it. The difference between something used for and by oneself, versus something used by many people.

Precisely :) I can certainly see how access to a less filtered view of the origin content is useful for people who know what they're doing (devs), but there are other avenues those users can explore.

As this issue shows, there will certainly be bugs, and it can be a pain to rely on someone else to fix them, but I believe it's the better option for most users.

> Yes, I am retracting this idea. Thank you for convincing me that this was not really thought out from my side.

Thank you for bringing it up! I'm sure there are lots of things I have missed that could be done better or more efficiently and the only way to be certain is by communicating.