devnoname120 / google-play-book-downloader

Download and decrypt books purchased on Google Play Books without text reflowing
GNU Affero General Public License v3.0

Google Play Books downloader for interoperability purposes. Each page is downloaded as an image; it's up to you to build a PDF from the images, run OCR, and add metadata.

Why:

Note: this script only works for books that have the “Original Pages” viewing option.

Prerequisites

Usage (PDF download)

    poetry run python google-play-book-downloader-pdf.py

You will find the downloaded book pages in the books/[BOOK_ID] folder.
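Before post-processing, it can help to check that all pages are present and in reading order. The following is a minimal sketch, not part of the project: `natural_key` and `list_pages` are hypothetical helpers, and it assumes the pages are PNG files whose names contain the page number (the exact naming scheme is not documented here).

```python
import re
from pathlib import Path


def natural_key(name):
    """Split a file name into text and integer chunks so that 'page10' sorts after 'page2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def list_pages(book_dir):
    """Return the PNG files in book_dir, sorted in natural (numeric-aware) order."""
    return sorted(Path(book_dir).glob("*.png"), key=lambda p: natural_key(p.name))


if __name__ == "__main__":
    # "books/BOOK_ID" is a placeholder: substitute the real book ID.
    pages = list_pages("books/BOOK_ID")
    print(f"{len(pages)} page image(s) found")
```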

Recommended next steps:

1) Optimize the resulting images:

a) Run pngquant. It performs high-quality lossy compression (40-70% size reduction) on the PNG images by optimizing the color palette:

    pngquant -fv --ext=.png --skip-if-larger --speed=1 --quality=95-100 *.png

b) (Optional) Run oxipng. It performs additional lossless compression (3-5% size reduction) on the PNG images produced by pngquant:

    oxipng --dir . --strip safe --interlace 0 -o 4 *.png
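If you prefer to run both optimizers from a script, the two commands above can be wrapped as follows. This is a rough sketch rather than part of the project: `build_commands` and `optimize_pages` are hypothetical names, the flags are copied verbatim from the commands above, and both tools are assumed to be installed and on your `PATH`. Exit codes are reported rather than raised, since a non-zero code from either tool does not necessarily mean the whole batch failed.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(png_files):
    """Return the pngquant and oxipng argument lists for the given PNG file names."""
    files = [str(p) for p in png_files]
    pngquant = ["pngquant", "-fv", "--ext=.png", "--skip-if-larger",
                "--speed=1", "--quality=95-100", *files]
    oxipng = ["oxipng", "--dir", ".", "--strip", "safe",
              "--interlace", "0", "-o", "4", *files]
    return [pngquant, oxipng]


def optimize_pages(folder):
    """Run pngquant, then oxipng, on every PNG in folder; return (tool, exit code) pairs."""
    folder = Path(folder)
    # Python does not expand "*.png" the way a shell does, so glob explicitly.
    names = sorted(p.name for p in folder.glob("*.png"))
    if not names:
        return []
    results = []
    for cmd in build_commands(names):
        if shutil.which(cmd[0]) is None:
            print(f"{cmd[0]} not found on PATH, skipping")
            continue
        # Run inside the folder so relative file names and "--dir ." resolve there.
        completed = subprocess.run(cmd, cwd=folder)
        results.append((cmd[0], completed.returncode))
    return results


if __name__ == "__main__":
    # "books/BOOK_ID" is a placeholder: substitute the real book ID.
    print(optimize_pages("books/BOOK_ID"))
```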

2) Build a PDF:

This step merges all the pages into a single PDF and adds metadata (book title, date, authors, etc.) and a table of contents.

Run the following command (replace [BOOK_ID] with the ID of the book):

```shell
poetry run play-book-pdf-build books/[BOOK_ID]
```

3) OCR and optimize the PDF using Adobe Acrobat Pro:

a) Open the PDF.

b) [Menu] Tools → Scan & OCR.

c) [Scan & OCR toolbar] Settings:

Usage (EPUB download)

The project also includes a highly experimental EPUB downloader. For now it only downloads every page of a given book as HTML and embeds all the resources (images, fonts, etc.) directly in the HTML files as base64. The EPUB itself is not reconstructed yet.
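To illustrate what embedding a resource as base64 means, here is a minimal sketch of building a `data:` URI with the Python standard library. It is not the downloader's actual code: `to_data_uri` and `file_to_data_uri` are hypothetical helpers shown only to explain the technique.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI usable in src or href attributes."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(path) -> str:
    """Read a file and return it as a data URI, guessing the MIME type from its name."""
    path = Path(path)
    mime_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    return to_data_uri(path.read_bytes(), mime_type)


if __name__ == "__main__":
    uri = to_data_uri(b"hi", "text/plain")
    print(uri)
    # An HTML file can then reference the resource inline instead of by path:
    print(f'<img src="{uri}">')
```

Because the encoded resource lives inside the HTML file itself, each downloaded page can be opened without any companion files, at the cost of roughly a one-third size increase from base64 encoding.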

The EPUB downloader is experimental and unsupported for now. Please don't open issues on GitHub about it.

    poetry run python google-play-book-downloader-epub.py

You will find the downloaded book pages as HTML in the books/[BOOK_ID]/segments folder. The output is very crude and EPUBs are not reconstructed.