RichardJRL / pocketmagstopdf

Download pocketmags magazines in PDF format from the HTML5 reader
Creative Commons Zero v1.0 Universal

PLEASE USE THIS SCRIPT RESPONSIBLY. THE MAGAZINE PUBLISHING INDUSTRY RELIES HEAVILY ON INCOME FROM SALES WITH VERY SLIM PROFIT MARGINS.

Acknowledgements:

This is a modified version of the GitHub Gist called pmdown.py written by the GitHub user rjw57. I would have contributed my changes to the original, but alas it is only a Gist, not a GitHub repository.

With thanks to:

NB: I have only been able to test this on the small number of magazines I have purchased on pocketmags.com.

Feature Additions:

14/07/2022

Usage:

pocketmagstopdf.py (-h | --help)
pocketmagstopdf.py [options] <pdf> <url>

Options:

-h, --help                  Print brief usage summary.

--quality=QUALITY           Set magazine download quality.
                            Choose from extralow, low, mid, high, extrahigh or original. (Optional)
                            [default: mid]

--dpi=DPI                   Set image resolution in dots per inch. (Optional)
                            Not used with '--quality=original'.
                            [default: 150]

--title=TITLE               Set magazine title in the PDF metadata. (Optional)
                            Not used with '--quality=original'.
                            Default value is the filename with:
                                - underscores replaced with spaces
                                - the file extension removed

--range-from=PAGE-FROM      Define a portion of the magazine to download, starting from this page number. (Optional)
                            Downloads from the beginning of the magazine - page 1 - if absent.
                            [default: 1]

--range-to=PAGE-TO          Define a portion of the magazine to download, ending on this page number. (Optional)
                            Downloads to the end of the magazine if absent.
                            [default: 999]

--delay=DELAY               Set the time in seconds to wait between downloading each page of the magazine. (Optional)
                            There is no delay if absent. The value of the delay may be integer or decimal.
                            Used both whenever probing for the last valid page number of the magazine and
                            between downloading each individual page for all quality settings except 'original'.
                            [default: 0]

--save-images               Save the downloaded JPEG images of the magazine pages to a subdirectory with the same
                            name as the magazine in addition to generating the PDF of the magazine.
                            Not used with '--quality=original'.
                            [default: False]

--image-subdir-prefix=PFX   If '--save-images' is used, prefix the name of the subdirectory the images are saved to with
                            this string. Blank by default. (Optional)
                            Not used with '--quality=original'.
                            [default: ]

--image-subdir-suffix=SFX   If '--save-images' is used, suffix the name of the subdirectory the images are saved to with
                            this string. Blank by default. (Optional)
                            Not used with '--quality=original'.
                            [default: ]

--uuid=UUID                 Specifies the User UUID to use to download the PDF when '--quality=original' is used
                            and --uuid-randomise is not used.
                            Read the 'Notes' section below for details of how to find it. (Optional/Required)
                            Only used with '--quality=original'.
                            [default: None]

--uuid-randomise            Uses a random UUID to download the PDF when '--quality=original' is specified. (Optional)
                            [default: False]

--uuid-hide                 Hides the User UUID watermark on each page of the PDF by making it transparent.
                            This option is overridden by '--uuid-destroy'.
                            Only used with '--quality=original', as the watermark is not present on lower quality downloads.
                            [default: False]

--uuid-destroy              Completely wipes the User UUID watermark from each page of the PDF. (Experimental)
                            This option overrides '--uuid-hide'.
                            Only used with '--quality=original', as the watermark is not present on lower quality downloads.
                            [default: False]

--timestamp-change          Alters the timestamp within the downloaded PDF.
                            Only used with '--quality=original'.
                            [default: False]

--quiet                     Suppress printing of all output except warning and error messages.
                            [default: False]

--debug                     Print extra output to aid debugging of the program.
                            Setting both '--quiet' and '--debug' is contradictory.
                            If this happens, a warning is issued and the debug setting overrides the quiet setting.
                            [default: False]

<pdf>                       Save output to this file. (Required)
<url>                       A URL to one image from the magazine. (Required)
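Several of the option defaults above are simple transformations that can be sketched in Python. The helper names below (`default_title`, `image_subdir`, `pages_to_fetch`, `resolve_log_level`) are hypothetical and are not taken from pocketmagstopdf.py; this is a minimal sketch of the documented behaviour only. It assumes that the "name of the magazine" used for the image subdirectory is the PDF filename without its extension, and it uses a `last_page` parameter to stand in for the last valid page found by probing.

```python
import logging
import warnings
from pathlib import Path
from typing import Optional


def default_title(pdf_path: str) -> str:
    """--title default: the PDF filename with the extension removed
    and underscores replaced with spaces."""
    return Path(pdf_path).stem.replace("_", " ")


def image_subdir(pdf_path: str, prefix: str = "", suffix: str = "") -> Path:
    """--save-images target directory.
    ASSUMPTION: the magazine's "name" is the PDF filename without its
    extension, and the subdirectory sits next to the PDF."""
    pdf = Path(pdf_path)
    return pdf.parent / f"{prefix}{pdf.stem}{suffix}"


def pages_to_fetch(range_from: int = 1, range_to: int = 999,
                   last_page: Optional[int] = None) -> range:
    """Pages selected by --range-from/--range-to (both inclusive).
    ASSUMPTION: `last_page` is the last valid page found by probing;
    when known, it clamps the upper bound."""
    if range_from < 1:
        raise ValueError("--range-from must be at least 1")
    end = range_to if last_page is None else min(range_to, last_page)
    return range(range_from, end + 1)


def resolve_log_level(quiet: bool, debug: bool) -> int:
    """Map --quiet/--debug to a logging level; --debug wins if both are set."""
    if quiet and debug:
        warnings.warn("'--quiet' and '--debug' are contradictory; "
                      "'--debug' overrides '--quiet'.")
    if debug:
        return logging.DEBUG
    if quiet:
        return logging.WARNING  # only warnings and errors are shown
    return logging.INFO


if __name__ == "__main__":
    print(default_title("my_magazine.pdf"))                 # my magazine
    print(image_subdir("my_magazine.pdf", "img_", "_raw"))   # img_my_magazine_raw
    print(list(pages_to_fetch(3, 999, last_page=6)))        # [3, 4, 5, 6]
```

The `--range-to` default of 999 acts as an "until the end" sentinel here: clamping it to the probed last page means an absent `--range-to` simply downloads every remaining page.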

Examples:

pocketmagstopdf.py --quality=extrahigh --delay=2 --title="My Magazine, Issue 73, October 2022" my_magazine.pdf https://mcdatastore.blob.core.windows.net/mcmags/<STORAGE_BUCKET_UUID>/<ISSUE_UUID>/extralow/0000.jpg

pocketmagstopdf.py --quality=original --delay=0.5 --uuid-hide --uuid=<USER_UUID> my_magazine.pdf https://mcdatastore.blob.core.windows.net/mcmags/<STORAGE_BUCKET_UUID>/<ISSUE_UUID>/extralow/0000.jpg

Notes:

PLEASE USE THIS SCRIPT RESPONSIBLY. THE MAGAZINE PUBLISHING INDUSTRY RELIES HEAVILY ON INCOME FROM SALES WITH VERY SLIM PROFIT MARGINS.

URLs for pocketmags images and User UUIDs can be found by using the HTML5 reader and right-clicking on a page and selecting "inspect element". Look for URLs of the form:

https://mcdatastore.blob.core.windows.net/mcmags/<uuid1>/<uuid2>/extralow/<num>.jpg

where <uuid{1,2}> are strings of letters and numbers with dashes separating them and <num> is some 4-digit number.

The User UUID required for downloading the magazine when '--quality=original' is used can be found by searching the HTML for the text "userGuid:" and copying the hexadecimal value that follows it without the surrounding single quote characters.

Support this Project:

Buy Me a Coffee at ko-fi.com