exp0nge / light-novel-scraper

A small scraper tool that downloads the contents of light novel chapters and saves them as HTML or ePub (without the clutter). It can be used locally or through a web API.

light-novel-scraper

LIVE DEMO

About

A scraper tool that grabs the contents of a light novel's chapters and stores them as HTML files to read later. The script uses Readability to extract the relevant text from a website, falling back to grabbing the content div when that fails, and uses ebooklib to generate an ePub if necessary.
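The extract-then-fall-back idea described above can be sketched as follows. This is a minimal, standard-library-only illustration, not the project's actual code: `extract_chapter_text`, `ContentDivParser`, the `primary_extractor` callable (standing in for a Readability-style extractor), and the `entry-content` class name are all assumptions made for the example.

```python
from html.parser import HTMLParser


class ContentDivParser(HTMLParser):
    """Collect text inside the first <div> whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0       # <div> nesting depth inside the target; 0 = outside
        self.done = False    # set once the target <div> has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth:
            self.depth += 1  # nested <div> inside the target
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entered the target <div>

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_chapter_text(html, primary_extractor=None, css_class="entry-content"):
    """Try the primary extractor first; fall back to the content div."""
    if primary_extractor is not None:
        try:
            text = primary_extractor(html)
            if text and text.strip():
                return text.strip()
        except Exception:
            pass  # any failure in the primary extractor triggers the fallback
    parser = ContentDivParser(css_class)
    parser.feed(html)
    return "\n".join(parser.chunks)
```

Passing the primary extractor in as a callable keeps the sketch independent of any particular library; a real setup would pass a function that wraps whichever Readability implementation is installed.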

The web API uses Celery (with Redis), with a Flask backend and an AngularJS frontend.

Usage

The following grabs chapters 31 through 53:

ls = LightScrap(title='Smartphone',
                start_chapter_number=31,
                end_chapter_number=53,
                url='http://raisingthedead.ninja/2015/10/06/smartphone-chapter-31/')
ls.chapters_walk()  # Grab all the HTML files, starting from the first chapter's URL
# Or grab the chapters from the table of contents URL instead
ls.toc_walk('http://raisingthedead.ninja/current-projects/in-a-different-world-with-a-smartphone/')
ls.make_html_toc()  # Make an HTML table of contents file to use with Calibre
ls.generate_epub()  # Optionally generate an ePub as well
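
To illustrate what an HTML table of contents file for Calibre might look like, here is a small sketch. It is not the project's `make_html_toc` implementation: `build_html_toc`, the chapter labels, and the file names are hypothetical, and the only idea taken from the README is that one HTML file links to the saved chapter files in reading order.

```python
from html import escape


def build_html_toc(title, chapters):
    """Return an HTML page linking to chapter files in order.

    chapters is a list of (label, filename) pairs, e.g. ("Chapter 31", "31.html").
    """
    items = "\n".join(
        f'    <li><a href="{escape(href, quote=True)}">{escape(label)}</a></li>'
        for label, href in chapters
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '  <meta charset="utf-8">\n'
        f"  <title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        f"  <ol>\n{items}\n  </ol>\n"
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical file names; the real tool may name its output differently.
    chapters = [(f"Chapter {n}", f"{n}.html") for n in range(31, 54)]
    with open("toc.html", "w", encoding="utf-8") as fh:
        fh.write(build_html_toc("Smartphone", chapters))
```

Calibre can import an HTML file and follow its links to other local files, which is why a single ordered list of links is enough to define the chapter order.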

Use foreman start to start the web server, Redis, and Celery.

GIF example:

[Image: web API demo GIF]

Requirements

Dependencies are listed in the included requirements.txt.

Not Supported