dteviot / WebToEpub

A simple Chrome (and Firefox) Extension that converts Web Novels (and other web pages) into an EPUB.

Automatically change Novelupdates Censored Tags #1392

Closed: X-Xadro closed this issue 2 months ago

X-Xadro commented 3 months ago

I often use WebToEpub to download novels and use the 'Load Additional Metadata' function to add the description and tags to the EPUB.

But for a couple of years now, Novelupdates has censored a whole bunch of tags with asterisks, e.g. `S*aves`, and many more.

I wonder if there is a way for WebToEpub to automatically swap the censored tags with the correct ones.

Another small issue is with the 'Epub Description'. When the metadata is loaded from Novelupdates with the 'Load Additional Metadata' function, the breaks between paragraphs are not shown, but they are shown when you manually copy the text from Novelupdates and paste it into the 'Epub Description' box, as seen in the screenshots provided.

Screenshots: 2024-07-24 15_00_18-WebToEpub - Brave, 2024-07-24 15_00_41-WebToEpub - Brave

Neither is a huge issue, but it would still be nice if they could be fixed/changed.

dteviot commented 3 months ago

@X-Xadro

  1. De-censoring is possible. As the censored text is what's on the page, you'll need to give me a list of what the censored text needs to be replaced with.
    I think "Saves" is "Slaves", but I have no idea what "Gu" or "view*" are supposed to be.
  2. Paragraphs in the description are trickier. What you're seeing here is the conversion from HTML paragraphs to plain UTF-8 text. There are several problems here. (a) How line breaks are encoded in plain text depends on the OS, e.g. Unix, Windows and Mac all use different conventions.

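
The lookup-table approach in point 1 could look roughly like the sketch below. It is a hypothetical illustration, not WebToEpub's code: `UNCENSORED_TAGS` maps the censored text exactly as shown on the page to the real tag, and its two entries are the ones reported later in this thread rather than the extension's actual list. `uncensorTag` is likewise an illustrative name.

```javascript
// Hypothetical lookup table: censored tag text exactly as it appears on the
// page -> the real tag. Entries are examples from this thread only.
const UNCENSORED_TAGS = new Map([
    ["Prostit**es", "Prostitutes"],
    ["S*x Slaves", "Sex Slaves"],
]);

// Return the uncensored tag if it is in the table, else the trimmed original.
function uncensorTag(tag) {
    const trimmed = tag.trim();
    return UNCENSORED_TAGS.get(trimmed) ?? trimmed;
}

// Example: tags as scraped from a series page.
const tags = ["Reincarnation", "Prostit**es", "S*x Slaves"];
console.log(tags.map(uncensorTag));
// logs: [ 'Reincarnation', 'Prostitutes', 'Sex Slaves' ]
```

Matching whole tags keeps the table explicit, but every censored variant of a tag needs its own entry.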
X-Xadro commented 3 months ago

I've hunted down the words that are censored on Novelupdates, and ugh, I feel like I need to wash my eyes and brain. I've attached them in a text file.

These are the full words that are censored. If you need each individual censored tag in full, let me know and I'll make a new list.

novelupdates censored words.txt

I just want to apologize in advance. I kinda regret asking for this, because I wasn't aware of the extent of degeneracy of the stories listed on Novelupdates lol

Pity about the summary, but it's not a huge issue. Fixing the tags is a bigger hassle, especially if you forget one.

gamebeaker commented 3 months ago

Another solution could be to extract the href and delete the useless part of the URL, because the links are not censored.
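A rough sketch of this idea, assuming each tag link ends in a slug such as `.../stag/sex-slaves/` (that URL shape is an assumption for illustration, not checked against the site). The hypothetical `tagFromHref` takes the last path segment of the link and title-cases it:

```javascript
// Hypothetical sketch: rebuild a tag name from the slug at the end of its
// link. The ".../stag/<slug>/" URL shape is an assumption.
function tagFromHref(href) {
    // Last non-empty path segment, e.g. "sex-slaves".
    const segments = new URL(href).pathname.split("/").filter(s => s.length > 0);
    const slug = segments[segments.length - 1] ?? "";
    // "sex-slaves" -> "Sex Slaves"
    return slug
        .split("-")
        .filter(word => word.length > 0)
        .map(word => word[0].toUpperCase() + word.slice(1))
        .join(" ");
}

console.log(tagFromHref("https://www.novelupdates.com/stag/sex-slaves/"));
// logs: Sex Slaves
```

Title-casing a slug cannot restore punctuation or unusual capitalisation, and a slug that is itself censored comes back censored, so a lookup table would probably still be needed as a fallback.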

X-Xadro commented 3 months ago

A few are censored in the links too. Here's the full tag list, btw: https://www.novelupdates.com/list-tags/

For example, A*al is aal in the link.

How weird and inconsistent that site is.

dteviot commented 3 months ago

@X-Xadro

I believe I have de-censoring of tags working. I have not done the spacing in the description, and I have no plans to implement it. Note: looking at the actual HTML of the NovelUpdates site, I think the space between paragraphs in the description comes from the style sheet. There is no break between paragraphs in the actual HTML, and WebToEpub has limited access to style sheet information.

Test versions for Firefox and Chrome have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes. Tested with:

For my notes: 89 minutes work

gamebeaker commented 3 months ago

@dteviot One solution for the breaks in the description would be to change line 189 in `EpubMetaInfo.js` from `return dom.querySelector("#editdescription").textContent;` to `return dom.querySelector("#editdescription").textContent.replace(/\n/g, "\n\n");`

Between `<p></p>` and `<p></p>`, a single `\n` is inserted. If you want more than a simple line break, you just double the `\n`. Example screenshots in Calibre (https://www.novelupdates.com/series/ancient-godly-monarch/): [image: current version] [image: updated view]

dteviot commented 2 months ago

@gamebeaker I'm not sure how well that will work across different OSes.

Windows, and DOS before it, uses a pair of CR and LF characters to terminate lines. UNIX (Including Linux and FreeBSD) uses an LF character only. OS X also uses a single LF character, but the classic Mac operating system used a single CR character for line breaks.
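One common way to sidestep the differing conventions is to normalise every CRLF pair or lone CR to a single LF before any other processing, so later steps only ever have to handle `\n`. A minimal sketch; `normalizeLineEndings` is a hypothetical helper, not part of WebToEpub:

```javascript
// Sketch: convert Windows (CRLF) and classic Mac (CR) line endings to LF.
// /\r\n?/ matches either "\r\n" or a lone "\r".
function normalizeLineEndings(text) {
    return text.replace(/\r\n?/g, "\n");
}

console.log(JSON.stringify(normalizeLineEndings("a\r\nb\rc\nd")));
// logs: "a\nb\nc\nd"
```

This only makes the extension's own string handling consistent; how each e-reader then displays the resulting text is outside its control.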

Especially when the text may be viewed on devices on different OSes.

X-Xadro commented 2 months ago

Heya, I just tested the build you linked to and it works great, except for 2 more tags:

`Prostit**es` -> Prostitutes
`S*x Slaves` -> Sex Slaves

Thank you for your hard work!

gamebeaker commented 2 months ago

@dteviot Here is a Stack Overflow thread that says it is `\n` here. I created a simple test site here: http://webtoepub.rf.gd/test.html. I changed `return dom.querySelector("#editdescription").textContent;` to `return dom.querySelector("#editdescription").textContent.replace(/\n+/g, "\n").replace(/\n/g, "\n\n");`. The first replace is there to prevent a description that already contains two `\n` from ending up with too many. I think I saw a description like that, but I can't find an example at the moment.
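The two chained `replace` calls, wrapped in a hypothetical helper so they can be tried outside the extension (in WebToEpub the input would be the `textContent` of `#editdescription`). The second helper is an equivalent single-pass variant, shown only for comparison:

```javascript
// Collapse runs of newlines to one, then double every newline, so each
// paragraph ends up separated by exactly one blank line.
function spaceParagraphs(text) {
    return text.replace(/\n+/g, "\n").replace(/\n/g, "\n\n");
}

// Equivalent single pass: any run of one or more newlines becomes "\n\n".
function spaceParagraphsOnePass(text) {
    return text.replace(/\n+/g, "\n\n");
}

const sample = "First paragraph.\nSecond paragraph.\n\n\nThird paragraph.";
console.log(JSON.stringify(spaceParagraphs(sample)));
// logs: "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
```

Because the collapse step runs first, a description that already had blank lines between paragraphs is not pushed to three or more newlines.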

dteviot commented 2 months ago

@X-Xadro

Note: if this causes problems getting approval from Google or Mozilla, it gets the chop.

@gamebeaker This isn't important enough to argue over. Code added. If anyone complains, I will blame you and remove.


Test versions for Firefox and Chrome have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes. Tested with:

For my notes: 23 minutes work

X-Xadro commented 2 months ago

URL: https://www.novelupdates.com/series/im-a-bastard-but-youre-worse/

2024-07-30 11_31_23-WebToEpub — Mozilla Firefox

I think it didn't take because both words are normally censored: `S*x S*aves`.
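Since the site may censor any combination of a tag's words, a word-level table is more robust than matching whole tags. A minimal sketch, assuming a hypothetical `UNCENSORED_WORDS` map built from the censored forms reported in this thread (not WebToEpub's actual implementation):

```javascript
// Hypothetical word-level table: each censored word -> real word.
const UNCENSORED_WORDS = new Map([
    ["S*x", "Sex"],
    ["S*aves", "Slaves"],
    ["Prostit**es", "Prostitutes"],
]);

// Fix each word independently, so "S*x Slaves" and "S*x S*aves" both work.
function uncensorTagByWord(tag) {
    return tag
        .trim()
        .split(/\s+/)
        .map(word => UNCENSORED_WORDS.get(word) ?? word)
        .join(" ");
}

for (const reported of ["S*x Slaves", "S*x S*aves", "Prostit**es"]) {
    console.log(uncensorTagByWord(reported));
}
// logs: Sex Slaves, Sex Slaves, Prostitutes (one per line)
```

Splitting on whitespace keeps the sketch simple; a censored word joined to others by a hyphen or other punctuation would need a finer-grained split.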

The line breaks in the Summary/Description work great (on my machines at least), so thank you for that!

If Google or Mozilla do disapprove it, would it be easy to manually add it in by just adding the part in EpubMetaInfo.js?

dteviot commented 2 months ago

@X-Xadro

You didn't tell me it was "S*x S*aves". You said "S*x Slaves", which is what I tested.

> If Google or Mozilla do disapprove it, would it be easy to manually add it in by just adding the part in EpubMetaInfo.js?

Yes.

Test versions for Firefox and Chrome have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes. Tested with:

For my notes: 11 minutes work

dteviot commented 2 months ago

The updated version (0.0.0.167) has been submitted to the Firefox and Chrome stores. The Firefox version is available now. The Chrome version might be available in anywhere from a few hours to 21 days.