clach04 / whatabagacack

❓👜💩❗ Experimental (incomplete) Python Wallabag API Server
GNU Affero General Public License v3.0

❓👜💩❗ What A Bag A Cack

Home page https://github.com/clach04/whatabagacack

If you stumbled on this project in relation to Wallabag, don't use this! Use Wallabag instead. See alternatives section.

Experimental (incomplete) Wallabag API Server that runs under Python 3 and 2. Runs under Microsoft Windows and Linux (expected to run under Mac, but untested).

Uses https://github.com/clach04/w2d for epub generation


Overview

What a 👜 of 💩.

Remotely similar to https://github.com/clach04/fake-shaarli-server: it makes something look like something else, acting as a bridge/gateway/proxy.

Aim to support a subset of the Wallabag REST API used by:

Status

resources

Test / Sample URLs

https://en.wikipedia.org/wiki/EPUB
https://en.wikipedia.org/wiki/Fb2
https://en.wikipedia.org/wiki/FBReader
https://en.wikipedia.org/wiki/Web_scrape
https://en.wikipedia.org/wiki/Archive.org
https://en.wikipedia.org/wiki/Swamp_wallaby
https://en.wikipedia.org/wiki/Laser_Chess
https://en.wikipedia.org/wiki/Lazer_Maze
https://en.wikipedia.org/wiki/Deflektor
https://en.wikipedia.org/wiki/Atomic_chess
https://en.wikipedia.org/wiki/Stratomic

https://en.wikipedia.org/wiki/Chess_piece
https://en.wikipedia.org/wiki/Chess_symbols_in_Unicode
  1. NOTE - the latter 2 cause problems for pypub3 - https://github.com/imgurbot12/pypub (which is why pypub3 is not used).
  2. NOTE - pypub as of 2023-07-30 can't handle Wikipedia (style) href links correctly.

What is this good for?

Usage

Quick Start Server

Also see:

Install bare minimum / recommended dependencies:

python -m pip install -e git+https://github.com/clach04/w2d.git#egg=w2d
# manually install pandoc https://pandoc.org/installing.html
sudo apt-get install
# install / run Postlight (nee Mercury) Parser web API (locally) from https://github.com/HenryQW/mercury-parser-api
docker run -p 3000:3000 -d wangqiru/mercury-parser-api

Scrape and launch server

mkdir archived_sites
cd archived_sites
python ../web2epub.py https://en.wikipedia.org/wiki/EPUB
python ../whatabagacack.py

# demo: read metadata
curl http://localhost:8000/api/entries
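The curl demo above can also be done from Python using only the standard library. This is a minimal sketch: the helper names `entries_url` and `fetch_entries` are hypothetical, and it only assumes that the server answers on http://localhost:8000 and returns JSON, as the curl example suggests.

```python
import json
from urllib.request import urlopen


def entries_url(base_url="http://localhost:8000"):
    """Build the entries URL used by the curl demo (hypothetical helper)."""
    return base_url.rstrip("/") + "/api/entries"


def fetch_entries(base_url="http://localhost:8000", timeout=10):
    """GET /api/entries and decode the body as JSON.

    Assumption: the server is already running and replies with JSON.
    """
    with urlopen(entries_url(base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the server from the Quick Start running:
# print(json.dumps(fetch_entries(), indent=2))
```

The usage line is left commented out so that the snippet can be imported without a server running.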

Usage Server

set WEB_SITE_DATABASE=C:\code\py\w2d\web2epub.sqlite3
set WEB_EPUB_DIRECTORY=C:\code\py\w2d\

export WEB_SITE_DATABASE=/code/py/w2d/web2epub.sqlite3
export WEB_EPUB_DIRECTORY=/code/py/w2d

python whatabagacack.py

NOTE the server no longer expects entries.json to exist and be in the correct format; it defaults to the sqlite3 database. The code for json entries is still present. To use a json file instead, set the operating system environment variable WEB_SITE_METADATA_FILENAME to the pathname of the json file.

DEBUG note: set the operating system environment variable OVERRIDE_EPUB_FILENAME to the full pathname of an epub to always return that one file.
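To summarize how the environment variables above relate to each other, here is a rough sketch. It is not the code in whatabagacack.py: the function `load_settings` and the dictionary keys are hypothetical, and the rule "use json when WEB_SITE_METADATA_FILENAME is set, otherwise sqlite3" is an assumption read from the notes above.

```python
import os


def load_settings(environ=None):
    """Collect the environment variables named in this README into a dict.

    Hypothetical helper for illustration; unset variables become None.
    """
    if environ is None:
        environ = os.environ
    settings = {
        "database": environ.get("WEB_SITE_DATABASE"),
        "epub_directory": environ.get("WEB_EPUB_DIRECTORY"),
        "json_metadata": environ.get("WEB_SITE_METADATA_FILENAME"),
        "override_epub": environ.get("OVERRIDE_EPUB_FILENAME"),
    }
    # Assumption: a json metadata file, when given, takes the place of
    # the default sqlite3 database.
    settings["backend"] = "json" if settings["json_metadata"] else "sqlite3"
    return settings


# Example with an explicit mapping instead of the real environment.
example = load_settings({"WEB_SITE_DATABASE": "/code/py/w2d/web2epub.sqlite3"})
print(example["backend"])
```

Passing a plain dict rather than reading `os.environ` directly makes it easy to try different combinations without changing the shell environment.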

Usage Dumb Scraper

set WEB_SITE_DATABASE=C:\code\py\w2d\web2epub.sqlite3
export WEB_SITE_DATABASE=/code/py/w2d/web2epub.sqlite3

python web2epub.py [list of urls]

Example:

python web2epub.py https://en.wikipedia.org/wiki/EPUB
# NOTE requires manually copy/pasting json into entries.json (or some other name)
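After a scrape it can be handy to check what ended up in the sqlite3 database. The sketch below makes no assumption about the schema that web2epub.py creates; it only lists table names through SQLite's built-in `sqlite_master` catalog. The function name `list_tables` is hypothetical.

```python
import sqlite3


def list_tables(database_path):
    """Return the names of all tables in a SQLite database file.

    Note: sqlite3.connect() creates an empty file if the path does not
    exist, so check the pathname before pointing this at a new location.
    """
    connection = sqlite3.connect(database_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


# Usage, assuming WEB_SITE_DATABASE points at an existing database:
# import os
# print(list_tables(os.environ["WEB_SITE_DATABASE"]))
```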

Alternatives

Content Saving

URL / Bookmarking

Client Tools and Browser Extensions