JimmXinu / FanFicFare

FanFicFare is a tool for making eBooks from stories on fanfiction and other web sites.

Add a fanfictions.fr connector #1061

yvesmotteux closed this 5 months ago

yvesmotteux commented 5 months ago

This PR adds a new connector for the fanfictions.fr website.

It's a French fanfic website. Related topic on their forum: https://forum.fanfictions.fr/t/integration-fanficfare/5936/3

If this PR has already been merged by the time you read these lines, feel free to raise any issues there!

JimmXinu commented 5 months ago

I've put in a few nitpick comments, but for the most part this looks good.

JimmXinu commented 5 months ago

I tried downloading a few stories at random (after removing the setCoverImage() call) with very mixed results.

The first story I tried (https://fanfictions.fr/fanfictions/naruto/232_crise/chapters.html) fails to download chapters. But then, opening its chapters in the browser also tries to download zip files for me, and the zip files apparently contain each chapter as a txt file.

The second one I tried (https://fanfictions.fr/fanfictions/neon-genesis-evangelion/22_actes-de-bont-eacute/61_une-fanfiction-de-random1377/lire.html) gives a 404 error.

The third one I tried did work. (https://fanfictions.fr/fanfictions/mai-hime/400_guren/1257_chapitre-1/lire.html)

Is this representative of the site?

JimmXinu commented 5 months ago

Zipped txt chapters

The story with zipped txt chapters is quite old, perhaps those are a rare minority?

The only way I see offhand to detect these chapters is to look for the redirect to the zip file with something like:

    def getChapterText(self, url):
        logger.debug('Getting chapter text from: %s' % url)

        (data,rurl) = self.get_request_redirected(url)
        ## telecharger_pdf.html seems to indicate redirect to download
        ## zipped txt file instead of HTML
        if 'telecharger_pdf.html' in rurl:
            raise exceptions.FailedToDownload("Error downloading Chapter: %s! FanFicFare requires HTML chapters" % url)
        soup = self.make_soup(data)

        div_content = soup.find('div', id='readarea')
        if div_content is None:
            raise exceptions.FailedToDownload("Error downloading Chapter: %s!  Missing required element!" % url)

        return self.utf8FromSoup(url, div_content)

404 'suspended' stories

For the 404 error I saw above, that story page also reports "Cette fanfiction est suspendue!" ("This fanfiction is suspended!"). Ideally, the adapter will detect that and raise a FailedToDownload exception.
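A minimal sketch of that check, assuming the French marker text appears verbatim in the page HTML. `FailedToDownload` below is a local stand-in for FanFicFare's exception class, and `check_not_suspended` is a hypothetical helper name. Since the site answers with a 404, the adapter may also need to catch the HTTP error before it can inspect the body.

```python
class FailedToDownload(Exception):
    """Hypothetical stand-in for fanficfare.exceptions.FailedToDownload."""


# Marker text quoted above; assumed to appear verbatim in the page HTML.
SUSPENDED_MARKER = "Cette fanfiction est suspendue"


def check_not_suspended(html: str, url: str) -> None:
    """Raise FailedToDownload if the page says the story is suspended."""
    if SUSPENDED_MARKER in html:
        raise FailedToDownload(f"Story is suspended on the site: {url}")


# Usage: a normal page passes silently, a suspended one raises.
check_not_suspended("<div id='readarea'>Chapitre 1</div>", "https://example.invalid/ok")
try:
    check_not_suspended("<p>Cette fanfiction est suspendue!</p>", "https://example.invalid/x")
    suspended = False
except FailedToDownload:
    suspended = True
```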

Unrelated to prior comments:

datePublished: It looks like it should be possible to get the datePublished metadata from a hidden data-date attribute: <span class="date-distance" data-date="2024-04-12 19:21:54">il y a 21 minutes</span> (French for "21 minutes ago").
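A stdlib sketch of reading that attribute and parsing it with `datetime.strptime`. The class name is hypothetical; the timestamp is assumed to be in the site's local time (the attribute carries no timezone), and which `date-distance` span is the publication date rather than an update date is an assumption to check on the page. Inside the adapter, BeautifulSoup's `tag['data-date']` would give the same string.

```python
from datetime import datetime
from html.parser import HTMLParser


class DateDistanceParser(HTMLParser):
    """Collect data-date values from <span class="date-distance"> tags."""

    def __init__(self):
        super().__init__()
        self.dates = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "date-distance" in classes and attrs.get("data-date"):
            # Format matches the sample value "2024-04-12 19:21:54".
            self.dates.append(datetime.strptime(attrs["data-date"], "%Y-%m-%d %H:%M:%S"))


parser = DateDistanceParser()
parser.feed('<span class="date-distance" data-date="2024-04-12 19:21:54">il y a 21 minutes</span>')
parser.close()
published = parser.dates[0]
```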

basic cache: It would also be nice to include a [fanfictions.fr] section in defaults.ini and plugin-defaults.ini. I don't see any reason this adapter couldn't use use_basic_cache:true:

    [fanfictions.fr]
    use_basic_cache:true

yvesmotteux commented 5 months ago

Okay, everything you mentioned should be addressed now. Thanks a lot for pointing out the fanfics that can only be downloaded as a zip rather than read as web pages (I didn't know they existed), and also for telling me about the suspended fics, which I didn't know about either.

There's one thing I didn't manage to implement properly: the summary of a fanfic like https://fanfictions.fr/fanfictions/naruto/232_crise/chapters.html has no line breaks at all when I download it.

I think it comes from the setDescription() function. What should I do to make it cleaner?

JimmXinu commented 5 months ago

zip chapters: Unzipping the text chapters and using them is ambitious, but I won't object. I would suggest using <br>\r\n for line breaks; we've seen some book readers object to extremely long lines without line breaks.
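A minimal stdlib sketch of that conversion, assuming the archive holds one .txt chapter and that it decodes as UTF-8 (older files may well use another encoding such as cp1252, which the `encoding` parameter allows for). `zipped_txt_to_html` is a hypothetical helper name. Each line is HTML-escaped so stray `<` or `&` characters can't break the chapter markup, then followed by `<br>\r\n`.

```python
import html
import io
import zipfile


def zipped_txt_to_html(zip_bytes: bytes, encoding: str = "utf-8") -> str:
    """Convert the first .txt member of a zip archive to simple HTML.

    Each line is escaped and followed by a <br> tag plus CRLF, so the
    output keeps visible line breaks and has no very long physical lines.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = [n for n in archive.namelist() if n.lower().endswith(".txt")]
        if not names:
            raise ValueError("no .txt file in archive")
        text = archive.read(names[0]).decode(encoding, errors="replace")
    return "".join(html.escape(line) + "<br>\r\n" for line in text.splitlines())


# Usage: build an in-memory zip standing in for the site's download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("chapitre.txt", "Ligne 1\nLigne <2>\n")
chapter_html = zipped_txt_to_html(buf.getvalue())
```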

description formatting

I think it comes from the setDescription() function. What should I do to make it cleaner?

Pass setDescription() the tag without calling stripHTML() on it first. There's a setting (keep_summary_html) implemented in setDescription() that lets users choose to keep/strip HTML from the description if they want. It defaults to true for all formats except txt.
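To illustrate why stripping the tag first loses the breaks: a stdlib sketch contrasting a naive tag strip (which reproduces the flattened summary reported above) with a hypothetical `summary_for_output()` that mimics the keep_summary_html choice, either keeping the HTML or converting it to text with line breaks preserved. This is not FanFicFare's setDescription() implementation; all names here are illustrative.

```python
import re
from html.parser import HTMLParser


class _TextWithBreaks(HTMLParser):
    """Extract text, turning <br> and block-level closing tags into newlines."""

    BLOCK_TAGS = {"p", "div", "li"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def naive_strip(summary_html: str) -> str:
    """Drop every tag: the line structure is lost."""
    return re.sub(r"<[^>]+>", "", summary_html)


def summary_for_output(summary_html: str, keep_summary_html: bool) -> str:
    """Hypothetical mimic of the setting: keep the HTML, or convert it
    to plain text while keeping line breaks."""
    if keep_summary_html:
        return summary_html
    parser = _TextWithBreaks()
    parser.feed(summary_html)
    parser.close()
    return "".join(parser.parts).strip()


summary = "<p>Première ligne.</p><p>Deuxième ligne.<br>Suite.</p>"
flat = naive_strip(summary)
kept = summary_for_output(summary, keep_summary_html=True)
text = summary_for_output(summary, keep_summary_html=False)
```

The point of passing the tag itself is that the decision between the last two outcomes stays with the user's setting instead of being made early by the adapter.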

More metadata: It looks like there's some more metadata that could be collected. Sorry to bring up more at the last minute. It's not strictly necessary, but in my experience, users will ask for it quickly.

yvesmotteux commented 5 months ago

I've broken the description test and I'm struggling to fix it; hopefully it'll be okay tomorrow. I'm also adding the missing metadata. What do you mean by the category one? What did you have in mind?

JimmXinu commented 5 months ago

FanFicFare's category metadata entry is where a story's fandom(s) are recorded: 'Harry Potter', 'Naruto', 'Star Trek', 'Good Omens', etc. FFF doesn't generally record 'Books', 'Manga', 'TV Series', etc.

The site does have crossover stories with more than one category, such as: https://fanfictions.fr/fanfictions/harry-potter/15224_crime-frigorifique-au-4-privet-drive/chapters.html

The <nav aria-label="breadcrumb"> tag set contains the category, a.k.a. the fandom. I'd probably pull the text from each <a> tag in the fourth <li> tag. Use self.story.addToList('category',value) for each one (or extendList() for a list) rather than concatenating them into one string. Same with genre.
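A stdlib sketch of that extraction. The adapter itself would more likely use BeautifulSoup via make_soup; here `BreadcrumbCategoryParser`, `StoryStub` (standing in for `self.story`), and the sample markup are all hypothetical, and the "fourth <li>" position comes from the comment above and should be checked against the live page.

```python
from html.parser import HTMLParser


class BreadcrumbCategoryParser(HTMLParser):
    """Collect the text of every <a> in the Nth <li> of
    <nav aria-label="breadcrumb">. Assumes flat (non-nested) <li> items."""

    def __init__(self, li_index: int = 4):
        super().__init__()
        self.li_index = li_index  # 1-based: "the fourth <li>"
        self.in_nav = False
        self.li_count = 0
        self.in_target_li = False
        self.in_a = False
        self.current = []
        self.categories = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "nav" and attrs.get("aria-label") == "breadcrumb":
            self.in_nav = True
        elif self.in_nav and tag == "li":
            self.li_count += 1
            self.in_target_li = self.li_count == self.li_index
        elif self.in_target_li and tag == "a":
            self.in_a = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "a" and self.in_a:
            self.in_a = False
            text = " ".join("".join(self.current).split())
            if text:
                self.categories.append(text)
        elif tag == "li":
            self.in_target_li = False
        elif tag == "nav":
            self.in_nav = False

    def handle_data(self, data):
        if self.in_a:
            self.current.append(data)


class StoryStub:
    """Hypothetical stand-in for the adapter's self.story."""

    def __init__(self):
        self.lists = {}

    def addToList(self, listname, value):
        self.lists.setdefault(listname, []).append(value)


# Hypothetical markup for a crossover story; the real structure may differ.
page = """
<nav aria-label="breadcrumb"><ol>
  <li><a href="/">Accueil</a></li>
  <li><a href="/fanfictions/">Fanfictions</a></li>
  <li><a href="/livres/">Livres</a></li>
  <li><a href="/fanfictions/harry-potter/">Harry Potter</a> / <a href="/fanfictions/fandom-b/">Fandom B</a></li>
  <li>Titre de l'histoire</li>
</ol></nav>
"""

parser = BreadcrumbCategoryParser()
parser.feed(page)
parser.close()
story = StoryStub()
for category in parser.categories:
    story.addToList("category", category)  # one entry per fandom, not one joined string
```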

adapter_fanfiktionde.py might be a good example to look at.

yvesmotteux commented 5 months ago

Here we go! My fix for the description test is rather clumsy, but I didn't find an easier way (I had problems comparing the BeautifulSoup object passed to setDescription() with a string of HTML).
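Calling str() on a BeautifulSoup tag returns its HTML serialization, so one option for such a test is to compare that string with the expected HTML after normalizing both sides. A stdlib sketch follows; `html_tokens` and `assert_same_html` are hypothetical helper names, not part of FanFicFare's test suite. It ignores attribute order, collapses whitespace in text, and treats `<br/>` (as BeautifulSoup serializes it) the same as `<br>`.

```python
from html.parser import HTMLParser


class _Tokenizer(HTMLParser):
    """Turn HTML into a comparable list of tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Sort attributes so their order in the markup doesn't matter.
        self.tokens.append(("start", tag, tuple(sorted((k, v or "") for k, v in attrs))))

    def handle_startendtag(self, tag, attrs):
        # Record <br/> exactly like <br>: a start token only.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.tokens.append(("text", text))


def html_tokens(markup: str) -> list:
    parser = _Tokenizer()
    parser.feed(markup)
    parser.close()
    return parser.tokens


def assert_same_html(actual: str, expected: str) -> None:
    assert html_tokens(actual) == html_tokens(expected), (actual, expected)


# In a test, `actual` would be str(tag) for the BeautifulSoup tag.
assert_same_html('<p class="a" id="s">Un  résumé<br/>\n</p>',
                 '<p id="s" class="a">Un résumé<br></p>')
```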

JimmXinu commented 5 months ago

Great, thanks!

Test versions up in the usual places.