dteviot / WebToEpub

A simple Chrome (and Firefox) Extension that converts Web Novels (and other web pages) into an EPUB.

Add the option to retrieve metadata from www.wlnupdates.com or www.novelupdates.com #444

Closed Emasoft closed 3 months ago

Emasoft commented 3 years ago

Can you add the option to retrieve the novel metadata (title, original title, author, original language, genres, summary, first published date, etc.) from www.wlnupdates.com or www.novelupdates.com and save/update it in the epub?

dteviot commented 3 years ago

You are most welcome to have a go at implementing something like this. I'd suggest

  1. Under the Advanced tab, add two input fields, to take the URLs for wlnupdates and novelupdates.
  2. When one of these is supplied, fetch the page, extract the elements and construct an information page. The place to hook this in would be where the information page is normally created: https://github.com/dteviot/WebToEpub/blob/acef9ba572c05ca4bfc3478e86e52316491dc54b/plugin/js/Parser.js#L291-L295
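Before anything can be extracted in step 2, the code has to know which site a supplied URL points at. A minimal sketch of that check follows, comparing the parsed hostname rather than searching the whole URL string. `metadataSiteFor` and the site keys it returns are hypothetical names for illustration, not part of WebToEpub.

```javascript
// Hypothetical helper (not part of WebToEpub): decide which metadata
// site a user-supplied URL belongs to. The returned keys are
// illustrative names only.
function metadataSiteFor(urlText) {
    let hostname;
    try {
        hostname = new URL(urlText).hostname.toLowerCase();
    } catch (e) {
        return null; // not a parseable absolute URL
    }
    const sites = {
        "novelupdates.com": "novelupdates",
        "wlnupdates.com": "wlnupdates"
    };
    for (const [domain, site] of Object.entries(sites)) {
        // accept the bare domain and any subdomain such as "www."
        if (hostname === domain || hostname.endsWith("." + domain)) {
            return site;
        }
    }
    return null; // unsupported site
}
```

Matching on the hostname means a URL that merely mentions one of the site names in its path or query string is not mistaken for a supported site.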

Emasoft commented 3 years ago

The parser seems missing some important pieces of information, like the original title, original author name, genre keywords, etc. You just need to get the following epub metadata from the novel's web page on novelupdates.com or wlnupdates.com. Here is a real-world example of the epub output:

    <dc:identifier id="book-id" opf:scheme="ISBN">1234567890X</dc:identifier>
    <dc:title id="english">Battle Through the Heavens</dc:title>
    <meta refines="#english" property="title-type">english title</meta>
    <dc:title id="original">斗破苍穹</dc:title>
    <meta refines="#original" property="title-type">original title</meta>
    <dc:title id="alternative">Fights Break Sphere</dc:title>
    <meta refines="#alternative" property="title-type">alternative title</meta>
    <dc:language id="text-language">en</dc:language>
    <meta refines="#text-language" property="identifier-type" scheme="onix:codelist22">01</meta>
    <dc:language id="original-language">cn</dc:language>
    <meta refines="#original-language" property="identifier-type" scheme="onix:codelist22">02</meta>
    <dc:creator opf:role="aut">Heavenly Silkworm Potato</dc:creator>
    <dc:creator opf:role="aut">Tian Can Tu Dou</dc:creator>
    <dc:creator opf:role="aut">天蚕土豆</dc:creator>
    <dc:creator opf:role="trl">GravityTales</dc:creator>
    <dc:contributor opf:role="ill">Hongbin Zhou</dc:contributor>
    <dc:publisher>Qidian</dc:publisher>
    <dc:subject>Action</dc:subject>
    <dc:subject>Adventure</dc:subject>
    <dc:subject>Fantasy</dc:subject>
    <dc:subject>Harem</dc:subject>
    <dc:subject>Martial Arts</dc:subject>
    <dc:subject>Xuanhuan</dc:subject>
    <dc:date opf:event="publication">2018-01-01T00:00:00Z</dc:date>
    <dc:source>urn:isbn:1234567890X</dc:source>
    <dc:description>"In a land where no magic is present. A land where the strong make the rules and the weak have to obey. A land filled with alluring treasures and beauty, yet also filled with unforeseen danger. Three years ago, Xiao Yan, who had shown talents none had seen in decades, suddenly lost everything. His powers, his reputation, and his promise to his mother. What sorcery has caused him to lose all of his powers? And why has his fiancee suddenly shown up? </dc:description>
    <link href="https://www.novelupdates.com/series/battle-through-the-heavens/" />
  </metadata>
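The example lists each genre in its own dc:subject element. A minimal sketch of producing those lines from a comma-separated genre string might look like the following. It uses plain string building purely for illustration (real code may well build the OPF through DOM calls instead), and `escapeXml` and `subjectElements` are hypothetical names, not part of WebToEpub.

```javascript
// Hypothetical helpers, not part of WebToEpub.

// Escape the characters that are unsafe inside XML text or attributes.
function escapeXml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Turn a comma-separated genre string into one <dc:subject> line per
// genre, dropping blank entries.
function subjectElements(commaSeparatedGenres) {
    return commaSeparatedGenres
        .split(",")
        .map(genre => genre.trim())
        .filter(genre => genre !== "")
        .map(genre => "<dc:subject>" + escapeXml(genre) + "</dc:subject>");
}
```

Escaping matters here because genre and tag names come from a scraped page and may contain characters such as `&`.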

All the metadata in the example above is taken from Novelupdates. An extensive API is provided on both websites. You can check wlnupdates.com API here:

https://github.com/fake-name/wlnupdates/blob/master/app/templates/api-docs.md

dteviot commented 3 years ago

@Emasoft Thank you for the wlnupdates API. I'm not sure how I would explain to someone that if they supply the wlnupdates series-id, more metadata can be populated. I'd also need to explain to them how to get the series-id.
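One way to avoid asking users for a bare series-id would be to accept the series page URL and pull the id out of it. The sketch below assumes that wlnupdates series pages have URLs of the form https://www.wlnupdates.com/series-id/<number>/ (an assumption, not something confirmed in this thread), and `wlnSeriesIdFromUrl` is a hypothetical name.

```javascript
// Hypothetical helper, not part of WebToEpub.
// ASSUMPTION: wlnupdates series pages look like
//   https://www.wlnupdates.com/series-id/<number>/...
// Check this against the real site before relying on it.
function wlnSeriesIdFromUrl(urlText) {
    let url;
    try {
        url = new URL(urlText);
    } catch (e) {
        return null; // not a parseable absolute URL
    }
    // only accept wlnupdates.com and its subdomains
    if (!/(^|\.)wlnupdates\.com$/i.test(url.hostname)) {
        return null;
    }
    // the id is the run of digits right after "/series-id/"
    const match = url.pathname.match(/^\/series-id\/(\d+)(\/|$)/);
    return match ? Number(match[1]) : null;
}
```

With a helper like this, the user only has to paste the address of the series page, and the id never needs explaining.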

Can you provide similar documentation for NovelUpdates? Or was your example created by scraping https://www.novelupdates.com/series/battle-through-the-heavens/ ?

Note, a quick Google for NovelUpdates API turns up threads like this: https://forum.novelupdates.com/threads/public-rest-api.47162/

The parser seems missing some important pieces of information,

Respectfully, I disagree with you. This information isn't missing. It was intentionally left out, because:

  1. Most people don't care about it
  2. Trying to figure out how to get it all for each site would make each parser take several hours more to write, and I simply don't have the time. Plus it makes it more difficult for anyone else to write one, hence they're even less likely to do it. (At this point in time, I think only 2 other people have contributed a parser, and I'm not sure there was a second.)

Now, I fully agree that fetching the information from NovelUpdates and putting it into the metadata is a good idea. (I've even considered doing this for the "information" page that WebToEpub tries to generate.) But I simply don't have the time to do every good/nice idea. Well, not if I also want to READ the epubs I'm collecting. So, if you'd like to do this, or even just supply the code to scrape NovelUpdates, that would be a big help.

gamebeaker commented 3 years ago

@dteviot Hi, here is my solution for the additional metadata. I only handled tags and description. I have made changes to popup.html, messages.json, main.js and EpubPacker.js. The changes are listed below, and I will also try to upload the changed files: webtoepub.zip

popup.html line 145 //////////////////////////////////////////////////

                <table id="AdditionalMetadatatable">
                    <tr>
                        <td>__MSG_label_Metadata_URL__</td>
                        <td><input id="metadataUrlInput" type="text" name="metadataUrlInput" />
                        <button id="loadMetadataButton">__MSG_button_load_Metadata__</button></td>
                    </tr>
                    <tr>
                        <td><input id="lesstags" type="checkbox" name="lesstags" checked="true"></td>
                        <td>__MSG_label_less_tags__</td>
                    </tr>
                    <tr>
                        <td>__MSG_label_Metadata_subject__</td>                        
                        <td><textarea rows="2" cols="60" id="subjectInput" type="text" name="subjectInput"></textarea></td>
                    </tr>
                    <tr>
                        <td>__MSG_label_Metadata_description__</td>
                        <td><textarea rows="2" cols="60" id="descriptionInput" type="text" name="descriptionInput"></textarea></td>
                    </tr>
                </table>

//////////////////////////////////////////////

_locales -> en -> messages.json line 2 /////////////////////////////////////////////

"MSG_label_Metadata_URL": {
    "message": "Additional Metadata URL",
    "description": "Additional Metadata URL"
},
"MSG_label_Metadata_subject": {
    "message": "Tags",
    "description": "Tags for epub"
},
"MSG_label_Metadata_description": {
    "message": "Epub description",
    "description": "Preview description from the epub in Calibre"
},
"MSG_button_load_Metadata": {
    "message": "Load Additional Metadata",
    "description": "Label on button to toggle Load and Analyse Additional Metadata"
},
"MSG_button_show_Additional_Metadata": {
    "message": "Additional Metadata",
    "description": "Label on button to toggle Load and Analyse Additional Metadata"
},
"__MSG_label_less_tags__": {
    "message": "less tags",
    "description": "Only extract genre"
},

//////////////////////////////////////////////

main.js line 97 (function metaInfoFromControls() ) /////////////////////////////////////////////

metaInfo.subject = getValueFromUiField("subjectInput");
metaInfo.description = getValueFromUiField("descriptionInput");

/////////////////////////////////////////////

main.js line 412 ( function addEventHandlers() ) ////////////////////////////////////////////

document.getElementById("loadMetadataButton").onclick = onLoadMetadataButtonClick;

///////////////////////////////////////////

main.js, anywhere you like, but inside the main function

/////////////////////////////////////////////////////////////////////////////////////////////////
//Additional Metadata
function onLoadMetadataButtonClick(){
    // load page via XmlHTTPRequest
    let url = getValueFromUiField("metadataUrlInput");
    return HttpClient.wrapFetch(url).then(function (xhr) {
        populateMetadataAddWithDom(url, xhr.responseXML);
    }).catch(function (error) {
        getLoadAndAnalyseButton().disabled = false;
        ErrorLog.showErrorMessage(error);
    });
}

function populateMetadataAddWithDom(url, dom) {
    // set the base tag, in case server did not supply it 
    util.setBaseTag(url, dom);
    try {
        let metaAddInfo = getEpubMetaAddInfo(dom, userPreferences.useFullTitle.value, url);
        setUiFieldToValue("subjectInput", metaAddInfo.subject);
        setUiFieldToValue("descriptionInput", metaAddInfo.description);
    } catch (error) {
        ErrorLog.showErrorMessage(error);
    }
}

function getEpubMetaAddInfo(dom, useFullTitle, url){
    let that = this;
    let metaAddInfo = new EpubAddMetaInfo();
    if (url.includes("novelupdates.com") == true){
        //novelupdates
        metaAddInfo.subject = Addsubjectnovelupdate(dom);
        metaAddInfo.description = Adddescriptionnovelupdate(dom);
    } else if (url.includes("wlnupdates.com") == true){
        //wlnupdates
        metaAddInfo.subject = Addsubjectwlnupdates(dom);
        metaAddInfo.description = Adddescriptionwlnupdates(dom);
    } else {
        var test = "Error: Fetch of URL '" + url + "' failed to fetch please check if website is novelupdates.com or wlnupdates.com.";
        ErrorLog.showErrorMessage(test);
    }
    return metaAddInfo;
}

class EpubAddMetaInfo {
    constructor () {
        this.subject = chrome.i18n.getMessage("defaultsubject");
        this.description = chrome.i18n.getMessage("defaultdescription");
    }
}
//novelupdate
//Analyse subject from novelupdate
function Addsubjectnovelupdate(dom){
    //fetch
    //Genre
    var x = dom.getElementById("seriesgenre").getElementsByClassName("genre")[0].innerHTML;
    for ( var i = 1; dom.getElementById("seriesgenre").getElementsByClassName("genre")[i] !== undefined; i++ ){
        x +=", ";
        x += dom.getElementById("seriesgenre").getElementsByClassName("genre")[i].innerHTML;
    }
    //Tags
    //Test if less tags
    if (document.getElementById("lesstags").checked == false){
        for ( var i = 0; dom.getElementById("showtags").getElementsByClassName("genre")[i] !== undefined; i++ ){
            x +=", ";
            x += dom.getElementById("showtags").getElementsByClassName("genre")[i].innerHTML;
        }
    }
    return x;
}
//Analyse description from novelupdate
function Adddescriptionnovelupdate(dom){
    var x = dom.getElementById("editdescription").getElementsByTagName('p')[0].innerHTML;
    return x;
}
//wlnupdates
//Analyse subject from wlnupdates
function Addsubjectwlnupdates(dom){
    //fetch
    //Genre 
    var x = dom.getElementById("genre-container").getElementsByClassName("multiitem")[0].getElementsByTagName('a')[0].innerHTML.trim();

    for ( var i = 1; dom.getElementById("genre-container").getElementsByClassName("multiitem")[i] !== undefined; i++ ){
        x +=", ";
        x += dom.getElementById("genre-container").getElementsByClassName("multiitem")[i].getElementsByTagName('a')[0].innerHTML.trim();
    }
    //Tags
    //Test if less tags
    if (document.getElementById("lesstags").checked == false){
        for ( var i = 0; dom.getElementById("tag").getElementsByClassName("multiitem")[i] !== undefined; i++ ){
            x +=", ";
            x += dom.getElementById("tag").getElementsByClassName("multiitem")[i].getElementsByTagName('a')[0].innerHTML;
        }
    }
    return x;
}
//Analyse description from wlnupdates
function Adddescriptionwlnupdates(dom){
    var x = dom.getElementById("description").getElementsByClassName("description")[0].getElementsByTagName('p')[0].innerHTML;
    return x;
}
//////////////////////////////////////////////////////////////////////////////////////////

EpubPacker.js line 98 (buildMetaData(opf, epubItemSupplier) ) /////////////////////////////////////////////////////////////////////////////////

that.createAndAppendChildNS(metadata, dc_ns, "dc:subject", that.metaInfo.subject);
that.createAndAppendChildNS(metadata, dc_ns, "dc:description", that.metaInfo.description);

//////////////////////////////////////////////////////////////////////////////////////

dteviot commented 3 years ago

@gamebeaker In future, please submit as a pull request. https://opensource.com/article/19/7/create-pull-request-github

A pull request:

  1. Is less error prone than me manually merging files.
  2. Takes a lot less time for me (and I'm always time constrained).
  3. Shows your work in the history, so it gives you credit for your effort.

That said, I've added your commit, but did notice the following:

  1. MSG_button_show_Additional_Metadata is not used
  2. defaultsubject and defaultdescription have no entry in messages
  3. You're using tabs, not spaces, for indentation. (That's purely a style thing, but it gives problems with indentation.)
  4. For Novel Updates, you're using the first <p> element as the description. The description may span multiple <p> elements.
  5. Using querySelector() and querySelectorAll() simplifies the logic to extract metadata.
  6. Use textContent, not innerHTML. (innerHTML upsets the Google and Firefox security scanners.)
  7. Don't add subject or description when they're empty
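Points 4 to 7 can be sketched roughly as follows for Novel Updates. The selectors are derived from the element ids and class names used in the code posted above. The function names are hypothetical and this is not the code that was actually committed to WebToEpub.

```javascript
// Hypothetical helpers illustrating review points 4 to 7.
// "dom" is any object offering querySelectorAll(), e.g. a Document.

// Collect trimmed, non-empty textContent of every matching element.
function textOfAll(dom, selector) {
    return [...dom.querySelectorAll(selector)]
        .map(element => element.textContent.trim())
        .filter(text => text !== "");
}

// Genres always; tags only when "less tags" is not ticked.
function novelUpdatesSubject(dom, genresOnly) {
    let subjects = textOfAll(dom, "#seriesgenre .genre");
    if (!genresOnly) {
        subjects = subjects.concat(textOfAll(dom, "#showtags .genre"));
    }
    return subjects.join(", ");
}

// Join every <p> of the description, not just the first one.
function novelUpdatesDescription(dom) {
    return textOfAll(dom, "#editdescription p").join("\n\n");
}

// Keep only the metadata fields that actually have content, so empty
// dc:subject or dc:description elements are never written.
function nonEmptyFields(fields) {
    return Object.fromEntries(
        Object.entries(fields).filter(([, value]) => value.trim() !== "")
    );
}
```

Because `querySelectorAll()` returns an empty list when nothing matches, these helpers return an empty string on a page with no genres or description instead of throwing.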

dteviot commented 3 years ago

@gamebeaker I just realized that when I added your code to WebToEpub I forgot to add your name to the credits. Please accept my most sincere apologies. (I note you rectified this in your pull request.)

gamebeaker commented 3 months ago

This is implemented as the "Additional Metadata URL" option.