yvesmotteux closed this 5 months ago.
I've put in a few nitpick comments, but for the most part this looks good.
I tried downloading a few stories at random (after removing the `setCoverImage()` call), with very mixed results.
The first story I tried (https://fanfictions.fr/fanfictions/naruto/232_crise/chapters.html) fails to download chapters. But then, my browser also tries to download zip files for its chapters, and the zip files apparently contain each chapter as a txt file.
The second one I tried (https://fanfictions.fr/fanfictions/neon-genesis-evangelion/22_actes-de-bont-eacute/61_une-fanfiction-de-random1377/lire.html) gives a 404 error.
The third one I tried did work (https://fanfictions.fr/fanfictions/mai-hime/400_guren/1257_chapitre-1/lire.html).
Is this representative of the site?
Zipped txt chapters
The story with zipped txt chapters is quite old; perhaps those are a rare minority?
The only way I see offhand to detect these chapters is to look for the redirect to the zip file, with something like:
```python
# Method on the fanfictions.fr adapter class; `exceptions` is FanFicFare's
# exceptions module and `logger` the module-level logger of the adapter file.
def getChapterText(self, url):
    logger.debug('Getting chapter text from: %s' % url)
    (data, rurl) = self.get_request_redirected(url)

    ## telecharger_pdf.html seems to indicate redirect to download
    ## zipped txt file instead of HTML
    if 'telecharger_pdf.html' in rurl:
        raise exceptions.FailedToDownload("Error downloading Chapter: %s! FanFicFare requires HTML chapters" % url)

    soup = self.make_soup(data)
    div_content = soup.find('div', id='readarea')
    if div_content is None:
        raise exceptions.FailedToDownload("Error downloading Chapter: %s! Missing required element!" % url)

    return self.utf8FromSoup(url, div_content)
```
404 'suspended' stories
The story that gave the 404 error above also reports "Cette fanfiction est suspendue!" ("This fanfiction is suspended!").
Ideally, the adapter will detect that and raise a `FailedToDownload` exception.
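A minimal sketch of that check, assuming the notice appears literally as `Cette fanfiction est suspendue` in the fetched HTML (if the page spells the accent as an entity, match against parsed text instead). `FailedToDownload` below is a local stand-in for `exceptions.FailedToDownload`, and `check_not_suspended` is a hypothetical helper name. Whether the page body is even available after a 404 depends on how the adapter fetches it.

```python
# Local stand-in for FanFicFare's exceptions.FailedToDownload (assumption).
class FailedToDownload(Exception):
    pass


# Notice shown on suspended stories ("This fanfiction is suspended!").
# The trailing "!" is left out so a space or &nbsp; before it doesn't matter.
SUSPENDED_MARKER = "Cette fanfiction est suspendue"


def check_not_suspended(page_html, url):
    """Raise FailedToDownload if the page carries the suspension notice."""
    if SUSPENDED_MARKER in page_html:
        raise FailedToDownload("Story is suspended on the site: %s" % url)
    return page_html
```

Running this on the story page before parsing any metadata turns a confusing failure into an explicit error message.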
Unrelated to prior comments:
datePublished
It looks like it should be possible to get the `datePublished` metadata from a hidden `data-date` attribute:
```html
<span class="date-distance" data-date="2024-04-12 19:21:54">il y a 21 minutes</span>
```
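A stdlib-only sketch of reading that attribute (the adapter itself would use its BeautifulSoup soup). It assumes every `data-date` value follows the `YYYY-MM-DD HH:MM:SS` shape of the sample above and treats it as a naive local time; which `date-distance` span holds the publication date, as opposed to an update date, still has to be checked against the page. `DataDateCollector` and `published_dates` are hypothetical names.

```python
from datetime import datetime
from html.parser import HTMLParser

# Format inferred from the single sample value above (assumption).
DATA_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


class DataDateCollector(HTMLParser):
    """Collect data-date values from <span class="date-distance"> tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.dates = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "date-distance" in classes and attrs.get("data-date"):
            # The visible text ("21 minutes ago") is ignored; only the
            # machine-readable attribute is parsed.
            self.dates.append(datetime.strptime(attrs["data-date"], DATA_DATE_FORMAT))


def published_dates(page_html):
    parser = DataDateCollector()
    parser.feed(page_html)
    parser.close()
    return parser.dates
```

The chosen value would then be stored with something like `self.story.setMetadata('datePublished', value)`.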
basic cache
It would also be nice to include a `[fanfictions.fr]` section in `defaults.ini` and `plugin-defaults.ini`. I don't see any reason this adapter couldn't use `use_basic_cache:true`:
```ini
[fanfictions.fr]
use_basic_cache:true
```
Okay, everything you mentioned should be fixed now.
Thanks a lot for pointing out the fics that aren't available as web pages and can only be downloaded as a zip (I didn't know they existed), and also for telling me about the suspended fics; I didn't know about those either.
There's one thing I didn't manage to implement properly: when I download a fanfic like https://fanfictions.fr/fanfictions/naruto/232_crise/chapters.html, the summary has no line breaks at all.
I think it comes from the `setDescription()` function. What should I do to make it cleaner?
zip chapters
Unzipping the text chapters and using them is ambitious, but I won't object. I would suggest using `<br>\r\n` for line breaks; we've seen some book readers object to extremely long lines without line breaks.
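A sketch of that conversion, assuming each chapter archive holds one `.txt` file encoded as UTF-8 or, failing that, Windows-1252 (the real encoding of these old files is a guess). `zipped_txt_to_html` is a hypothetical helper; the adapter would pass it the raw bytes of the downloaded zip. Every source line ends with `<br>\r\n`, so no output line grows unusually long.

```python
import html
import io
import zipfile


def zipped_txt_to_html(zip_bytes):
    """Convert a zip archive holding a plain-text chapter into simple HTML."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = [n for n in archive.namelist() if n.lower().endswith(".txt")]
        if not names:
            raise ValueError("no .txt file in chapter archive")
        raw = archive.read(names[0])
    try:
        # utf-8-sig also drops a leading byte-order mark if present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumed fallback for older French text files.
        text = raw.decode("cp1252", errors="replace")
    # Normalise Windows/old-Mac line endings before splitting.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    # Escape each line so stray < or & in the text can't break the HTML.
    body = "<br>\r\n".join(html.escape(line) for line in lines)
    return "<div>\r\n" + body + "\r\n</div>"
```

Escaping before joining keeps the inserted `<br>` tags as the only markup in the chapter body.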
description formatting
> I think it comes from the setDescription() function. What should I do to make it cleaner?
Pass the tag to `setDescription()` without calling `stripHTML()` on it first. There's a setting (`keep_summary_html`) implemented in `setDescription()` that lets users choose whether to keep or strip HTML in the description. It defaults to `true` for all formats except txt.
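A toy illustration of why stripping first loses the line breaks, assuming the summary arrives as separate `<p>`/`<br>` elements: once the tags are gone, nothing marks where one paragraph ended and the next began. `naive_strip` is a hypothetical stand-in written for this example, not FanFicFare's `stripHTML()`, which may handle whitespace differently.

```python
from html.parser import HTMLParser


class NaiveTextExtractor(HTMLParser):
    """Keep only text nodes and drop every tag (a stand-in for illustration)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def naive_strip(summary_html):
    parser = NaiveTextExtractor()
    parser.feed(summary_html)
    parser.close()
    return "".join(parser.parts)


# Hypothetical summary markup with two paragraphs and a line break.
SUMMARY = "<p>Premier paragraphe.</p><p>Second paragraphe.<br>Suite.</p>"
```

Here `naive_strip(SUMMARY)` runs the sentences together, whereas handing the tag itself to `setDescription()` keeps the markup and leaves the keep/strip decision to `keep_summary_html`.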
More metadata
It looks like there's some more metadata that could be collected. Sorry to bring up more at the last minute; it's not strictly necessary, but in my experience, users will ask for it quickly.
- Set `language`, not `langcode`. Either `Français` or `French` will work. Otherwise, the code will be set, but Language will not appear on the title page.
- Read `En cours` or `Terminée` (or `One-shot`) from the `<p class="card-text" title="Statut de la fanfiction">` tag to set FFF's `status` metadata entry: `In-Progress` for `En cours`, and `Completed` for `Terminée` and `One-shot`. Some adapters set other `status` values, but the Calibre plugin version gives special processing to `Completed` and `In-Progress` for boolean custom columns.
- `genre` could be collected from the `<div class="col-auto text-right text-truncate" title="Format et genres">` tag.
- `category` could be collected from the breadcrumb tags.

I've broken the test on the description and I'm struggling to fix it; hopefully it'll be okay tomorrow.
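The `status` mapping suggested in the review list can be captured in a small lookup table. A sketch, assuming the tag's text is exactly one of the three labels apart from whitespace and Unicode normalisation; `STATUS_MAP`, `map_status`, and `status_text` are hypothetical names.

```python
import unicodedata

# Site status label -> FanFicFare `status` value, as suggested in the review.
STATUS_MAP = {
    "En cours": "In-Progress",  # "in progress"
    "Terminée": "Completed",    # "finished"
    "One-shot": "Completed",
}


def map_status(label):
    """Return the FFF status for a site label, or None if it is unrecognised."""
    # NFC so a decomposed "e" + combining accent still matches; collapse whitespace.
    key = " ".join(unicodedata.normalize("NFC", label).split())
    return STATUS_MAP.get(key)

# In the adapter (sketch only; status_text would hold the text of
# <p class="card-text" title="Statut de la fanfiction">):
#   self.story.setMetadata('language', 'Français')
#   status = map_status(status_text)
#   if status:
#       self.story.setMetadata('status', status)
```

Returning `None` for unknown labels lets the adapter skip setting `status` rather than store an unexpected value.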
I'm also adding the missing metadata. What do you mean by the `category` one? What did you have in mind?
FanFicFare's `category` metadata entry is where a story's fandom(s) are recorded: 'Harry Potter', 'Naruto', 'Star Trek', 'Good Omens', etc. FFF doesn't generally record 'Books', 'Manga', 'TV Series', etc.
The site does have crossover stories with more than one category, such as:
https://fanfictions.fr/fanfictions/harry-potter/15224_crime-frigorifique-au-4-privet-drive/chapters.html
The `<nav aria-label="breadcrumb">` tag set contains the category, aka fandom. I'd probably pull the text from each `<a>` tag in the fourth `<li>` tag. Use `self.story.addToList('category',value)` for each (or `extendList()` for a list) rather than concatenating them into one string. Same with `genre`.
`adapter_fanfiktionde.py` might be a good example to look at.
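A stdlib-only sketch of that breadcrumb extraction (the adapter itself would do this with BeautifulSoup via `make_soup()`). The sample markup is hypothetical, modelled only on the description above: fandom links sit in the fourth `<li>` of the breadcrumb `<nav>`. Check it against a real page such as the crossover story linked above. `BreadcrumbCategories` and `breadcrumb_categories` are illustrative names.

```python
from html.parser import HTMLParser


class BreadcrumbCategories(HTMLParser):
    """Collect the text of each <a> inside the Nth <li> of the breadcrumb <nav>."""

    def __init__(self, li_index=4):
        super().__init__(convert_charrefs=True)
        self.li_index = li_index  # 1-based position of the fandom <li> (assumed 4)
        self.in_nav = False
        self.li_count = 0
        self.in_target_li = False
        self.in_link = False
        self.categories = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "nav" and attrs.get("aria-label") == "breadcrumb":
            self.in_nav = True
        elif self.in_nav and tag == "li":
            self.li_count += 1
            self.in_target_li = self.li_count == self.li_index
        elif self.in_target_li and tag == "a":
            self.in_link = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link = False
            text = " ".join("".join(self._buf).split())
            if text:
                self.categories.append(text)  # one entry per fandom link
        elif tag == "li" and self.in_nav:
            self.in_target_li = False
        elif tag == "nav" and self.in_nav:
            self.in_nav = False

    def handle_data(self, data):
        if self.in_link:
            self._buf.append(data)


def breadcrumb_categories(page_html):
    parser = BreadcrumbCategories()
    parser.feed(page_html)
    parser.close()
    return parser.categories


# Hypothetical crossover breadcrumb, for illustration only.
SAMPLE = """
<nav aria-label="breadcrumb">
  <ol class="breadcrumb">
    <li class="breadcrumb-item"><a href="/">Home</a></li>
    <li class="breadcrumb-item"><a href="/fanfictions">Fanfictions</a></li>
    <li class="breadcrumb-item"><a href="/fanfictions/books">Books</a></li>
    <li class="breadcrumb-item"><a href="/fanfictions/harry-potter">Harry Potter</a>,
        <a href="/fanfictions/other">Other Fandom</a></li>
    <li class="breadcrumb-item active">Story title</li>
  </ol>
</nav>
"""
```

Each returned string would then go through `self.story.addToList('category',value)`, so crossover fandoms stay separate entries instead of one concatenated string.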
Here we go!
My fix for the description test is rather bad, but I didn't find any easier way (I had problems comparing the BeautifulSoup object passed to `setDescription` with a string of HTML).
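One less brittle option (a sketch, not what the PR actually does) is to turn the tag into a string with `str()` and compare token streams rather than raw strings, so whitespace, attribute quoting and order, and `<br>` vs `<br/>` don't cause false failures. `html_tokens` is a hypothetical helper; note that collapsing whitespace also hides genuine whitespace differences, which is usually acceptable for a summary test.

```python
from html.parser import HTMLParser


class _HTMLTokens(HTMLParser):
    """Flatten HTML into tokens that ignore cosmetic differences."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Sort attributes so their order doesn't matter; a bare attribute (None) -> "".
        normalized = tuple(sorted((k, v or "") for k, v in attrs))
        self.tokens.append(("start", tag, normalized))

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> exactly like <br>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        # Collapse whitespace runs; drop whitespace-only text between tags.
        text = " ".join(data.split())
        if text:
            self.tokens.append(("text", text))


def html_tokens(markup):
    parser = _HTMLTokens()
    parser.feed(str(markup))  # str() of a BeautifulSoup Tag yields its HTML
    parser.close()
    return parser.tokens
```

In a test this becomes something like `assert html_tokens(passed_tag) == html_tokens(expected_html)`.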
Great, thanks!
Test versions up in the usual places.
This PR adds a new connector for the fanfictions.fr website.
It's a French fanfic website. Related topic on their forum: https://forum.fanfictions.fr/t/integration-fanficfare/5936/3
If this PR has already been merged by the time you read this, feel free to raise any issues there!