Flameish / Novel-Grabber

Novel-Grabber can download novels from pretty much any webnovel and lightnovel site.
MIT License
487 stars 65 forks

add support for scraping metadata from www.wlnupdates.com #82

Closed Emasoft closed 3 years ago

Emasoft commented 3 years ago

Hi! Can you add support for scraping metadata from www.wlnupdates.com? It has much better metadata than novelupdates.com. Thx!

Flameish commented 3 years ago

NG doesn't actually just scrape the metadata from NovelUpdates; it treats NU as a supported host and follows the chapter links to the real hosts. So adding only "metadata" scraping is not possible. I could add it as a new supported host, though.

The same drawbacks as with NU apply here as well: 1) Sometimes the chapter links point to a "pre-chapter" page on the real host, which is basically just a short notice with a link to the full chapter. Duplicate chapter entries from different groups can also happen. 2) The automatic chapter text detection is not perfect, and unwanted copy-prevention text leaks through.
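
To make the duplicate-entry drawback concrete, here is a minimal Python sketch of how double entries could be filtered out of a release list. It is an illustration only, not NG's actual code: the `ChapterLink` record, its fields, and the rule of keeping the first entry per chapter name are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChapterLink:
    # Hypothetical record for one row of an aggregator's release list.
    name: str   # chapter label as listed, e.g. "c12"
    group: str  # translation group that posted the release
    url: str    # link to the real host


def drop_duplicate_chapters(links):
    """Keep only the first link seen for each chapter name (case-insensitive)."""
    seen = set()
    unique = []
    for link in links:
        key = link.name.strip().lower()
        if key in seen:
            continue  # same chapter listed again, e.g. by another group
        seen.add(key)
        unique.append(link)
    return unique
```

This only catches entries whose labels match after trimming and lowercasing; it does nothing about "pre-chapter" pages, which would still need to be detected on the real host.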

Emasoft commented 3 years ago

You just need to get all the EPUB metadata from the novel's web page on novelupdates.com or wlnupdates.com. Here is a real-world example of EPUB output:

<dc:identifier id="book-id" opf:scheme="ISBN">1234567890X</dc:identifier> 
    <dc:title id="english">Battle Through the Heavens</dc:title>
    <meta refines="#english" property="title-type">english title</meta>
    <dc:title id="original">斗破苍穹</dc:title>
    <meta refines="#original" property="title-type">original title</meta>
    <dc:title id="alternative">Fights Break Sphere</dc:title>
    <meta refines="#alternative" property="title-type">alternative title</meta>
    <dc:language id="text-language">en</dc:language>
    <meta refines="#text-language" property="identifier-type" scheme="onix:codelist22">01</meta>
    <dc:language id="original-language">cn</dc:language>
    <meta refines="#original-language" property="identifier-type" scheme="onix:codelist22">02</meta>
    <dc:creator opf:role="aut" >Heavenly Silkworm Potato</dc:creator>
    <dc:creator opf:role="aut" >Tian Can Tu Dou</dc:creator>
    <dc:creator opf:role="aut" >天蚕土豆</dc:creator>
    <dc:creator opf:role="trl" >GravityTales</dc:creator>
    <dc:contributor opf:role="ill" >Hongbin Zhou</dc:contributor>
    <dc:publisher>Qidian</dc:publisher>
    <dc:subject>Action</dc:subject>
    <dc:subject>Adventure</dc:subject>
    <dc:subject>Fantasy</dc:subject>
    <dc:subject>Harem</dc:subject>
    <dc:subject>Martial Arts</dc:subject>
    <dc:subject>Xuanhuan</dc:subject>
    <dc:date opf:event="publication">2018-01-01T00:00:00Z</dc:date>
    <dc:source>urn:isbn:1234567890X</dc:source>
    <dc:description>In a land where no magic is present. A land where the strong make the rules and the weak have to obey. A land filled with alluring treasures and beauty, yet also filled with unforeseen danger. Three years ago, Xiao Yan, who had shown talents none had seen in decades, suddenly lost everything. His powers, his reputation, and his promise to his mother. What sorcery has caused him to lose all of his powers? And why has his fiancee suddenly shown up?</dc:description>
    <link href="https://www.novelupdates.com/series/battle-through-the-heavens/" />
  </metadata>

Flameish commented 3 years ago

NG only has rudimentary metadata support (Title, Author, Description, Tags). Neither WLNupdates nor NovelUpdates provides more extensive data than that, or the data is severely lacking in most cases. The same goes for the novels themselves: NU only hosts translated novels, and while WLNupdates also hosts Western ones, it's just not detailed enough to implement as an automatic metadata provider.
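
For illustration, those four fields map onto Dublin Core elements roughly as in the Python sketch below. This is not NG's actual code; the function name `build_metadata` and its parameters are hypothetical, and the output is a bare `metadata` element rather than a complete OPF package document.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OPF_NS = "http://www.idpf.org/2007/opf"

# Use readable prefixes instead of the auto-generated ns0/ns1.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("opf", OPF_NS)


def build_metadata(title, author, description, tags):
    """Return a <metadata> element (as a string) holding the four basic fields."""
    metadata = ET.Element(f"{{{OPF_NS}}}metadata")
    ET.SubElement(metadata, f"{{{DC_NS}}}title").text = title
    ET.SubElement(metadata, f"{{{DC_NS}}}creator").text = author
    ET.SubElement(metadata, f"{{{DC_NS}}}description").text = description
    for tag in tags:
        # One dc:subject element per tag, as in the example above.
        ET.SubElement(metadata, f"{{{DC_NS}}}subject").text = tag
    return ET.tostring(metadata, encoding="unicode")
```

Fields such as alternative titles, translator, or original language would each need a reliable source before they could be added the same way.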

Emasoft commented 3 years ago

All the metadata above is taken from NovelUpdates. Why do you say it is not available? An API is provided on both websites. You can check the wlnupdates.com API here:

https://github.com/fake-name/wlnupdates/blob/master/app/templates/api-docs.md

Flameish commented 3 years ago

I don't think most people really care about metadata beyond the cover. The most important parts are in. Publication date and ISBN only exist for published novels, which 99% of "supported novels" are not. Also, like I said, for most novels (except popular ones like Battle Through the Heavens, which often have official EPUBs) the available data is largely incomplete or the same as what you can get from the host sites themselves. That is, if they are even on WLNupdates/NU.

Flameish commented 3 years ago

Closed until wlnupdates is mature enough.