dipu-bd / lightnovel-crawler

Generate and download e-books from online sources.
https://pypi.org/project/lightnovel-crawler/
GNU General Public License v3.0
1.44k stars 282 forks

Cleaner blacklist_patterns #1646

idMysteries closed 4 weeks ago

idMysteries commented 1 year ago
self.cleaner.blacklist_patterns.update([
    "Prev", "ToC", "Next"
])

Do I understand correctly that if "Next" appears anywhere in the text, that text will be deleted? That would be terrible.
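
Whether that happens depends on how the cleaner applies each pattern, which this thread does not show. As a rough sketch using only Python's `re` module (the `matches_anywhere` and `matches_whole` helpers are hypothetical and not part of lightnovel-crawler), an unanchored search hits any text that contains the word, while a full match only hits text that consists of the word alone:

```python
import re


def matches_anywhere(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None


def matches_whole(pattern: str, text: str) -> bool:
    """Hypothetical helper: True only if the whole stripped text matches."""
    return re.fullmatch(pattern, text.strip()) is not None


sentence = "Next morning, she left the village."  # ordinary story text
nav_link = " Next "                               # text of a navigation link

print(matches_anywhere("Next", sentence))  # True: an unanchored search would flag story text
print(matches_whole("Next", sentence))     # False: a full match leaves story text alone
print(matches_whole("Next", nav_link))     # True: the bare navigation label still matches
```

If the patterns turn out to be applied as unanchored searches, anchoring them (for example `^Next$`) or restricting them to a specific tag would be the safer choice.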

dipu-bd commented 1 year ago
self.cleaner.bad_tag_text_pairs.update(
    {
        'a': r"(PREVIOUS CHAPTER)|(CHAPTER LIST)|(NEXT CHAPTER)",
        'p': r"(FOLLOW / LIKE / SUBSCRIBE)|(FOLLOW AND LIKE THIS BLOG)"
             + "|(SUBSCRIBE and LIKE)|(SUBSCRIBE AND LIKE)|(Please donate any "
             + "amount to support our group!)|(Please donate to support our group!)",
    }
)

idMysteries commented 1 year ago

Oh wait... I'm stupid

idMysteries commented 1 year ago

bad_tag_text_pairs

idMysteries commented 1 year ago
self.cleaner.bad_tag_text_pairs.update(
            {
                "a": r"""(PREVIOUS CHAPTER)
                |(CHAPTER LIST)
                |(NEXT CHAPTER)""",
                "p": r"""(FOLLOW / LIKE / SUBSCRIBE)
                |(FOLLOW AND LIKE THIS BLOG)
                |(SUBSCRIBE and LIKE)
                |(SUBSCRIBE AND LIKE)
                |(Please donate any amount to support our group!)
                |(Please donate to support our group!)""",
            }
        )
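
One likely reason the triple-quoted version misbehaves, assuming the cleaner hands the string to `re` unchanged: inside a triple-quoted string the line breaks and indentation become part of the pattern, so every alternative except the last one requires a trailing newline plus spaces. The sketch below uses only the standard `re` module; the variable names are illustrative.

```python
import re

# Triple-quoted raw string: the newline and the leading spaces on each
# continuation line are literal characters inside the pattern.
multiline = r"""(PREVIOUS CHAPTER)
    |(CHAPTER LIST)
    |(NEXT CHAPTER)"""

# Adjacent string literals are concatenated by Python with nothing in between,
# so the pattern can span several source lines without extra whitespace.
joined = (
    r"(PREVIOUS CHAPTER)"
    r"|(CHAPTER LIST)"
    r"|(NEXT CHAPTER)"
)

for label in ("PREVIOUS CHAPTER", "CHAPTER LIST", "NEXT CHAPTER"):
    # Columns: label, matched by `multiline`, matched by `joined`
    print(label, bool(re.search(multiline, label)), bool(re.search(joined, label)))
```

Only the last alternative of `multiline` matches a plain label, because it is the only one with no whitespace after it.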
idMysteries commented 1 year ago

ahahahahah i'm sooooo stupid...

idMysteries commented 1 year ago

Umm, your code also doesn't work.

budikesuma commented 1 year ago

@idMysteries

Let's try this regex:

((PREVIOUS|NEXT) CHAPTER|CHAPTER LIST)

(FOLLOW / LIKE / SUBSCRIBE|FOLLOW AND LIKE THIS BLOG|SUBSCRIBE [Aa][Nn][Dd] LIKE|Please donate( any amount)? to support our group\!)

For the last one, if it doesn't work, add a backslash (\) before each slash (/):

(FOLLOW \/ LIKE \/ SUBSCRIBE|FOLLOW AND LIKE THIS BLOG|SUBSCRIBE [Aa][Nn][Dd] LIKE|Please donate( any amount)? to support our group\!)
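
A quick way to sanity-check these patterns outside the crawler is to run them through Python's `re` module directly. This is only a sketch: the constant names and sample strings are made up for illustration, and it says nothing about how the cleaner itself applies the patterns. In Python's `re`, `/` has no special meaning, so the escaped and unescaped variants should behave the same.

```python
import re

NAV_PATTERN = r"((PREVIOUS|NEXT) CHAPTER|CHAPTER LIST)"
PROMO_PATTERN = (
    r"(FOLLOW / LIKE / SUBSCRIBE|FOLLOW AND LIKE THIS BLOG"
    r"|SUBSCRIBE [Aa][Nn][Dd] LIKE"
    r"|Please donate( any amount)? to support our group\!)"
)
# Variant with every slash escaped, as suggested for the fallback.
ESCAPED_PROMO_PATTERN = PROMO_PATTERN.replace("/", r"\/")

nav_samples = ["PREVIOUS CHAPTER", "NEXT CHAPTER", "CHAPTER LIST"]
promo_samples = [
    "FOLLOW / LIKE / SUBSCRIBE",
    "FOLLOW AND LIKE THIS BLOG",
    "SUBSCRIBE and LIKE",
    "SUBSCRIBE AND LIKE",
    "Please donate any amount to support our group!",
    "Please donate to support our group!",
]

# Every sample should be matched in full by its pattern.
print(all(re.fullmatch(NAV_PATTERN, s) for s in nav_samples))
print(all(re.fullmatch(PROMO_PATTERN, s) for s in promo_samples))
print(all(re.fullmatch(ESCAPED_PROMO_PATTERN, s) for s in promo_samples))
```

The patterns are case-sensitive apart from the `[Aa][Nn][Dd]` part, so lowercase story text such as "the next chapter" is not matched.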
dipu-bd commented 1 year ago

Umm, your code also doesn't work.

fixed