TravisHunting / ComicDownloader

Downloads Comics from readcomiconline.li

High Quality downloads? #19

Closed ObiSandwich closed 2 years ago

ObiSandwich commented 2 years ago

TODO: add and option for downloading high or low quality versions of the images

Just noticed there is a High quality version of each comic! It appears the downloads are defaulting to the Low quality setting :( Now that I know this, I'll stop any more of my downloads.

I see you already have a To-Do for this... So :) when do you think this To-Do will be looked at? And is there a quick fix that I can use now?

TravisHunting commented 2 years ago

You could try something like this maybe (added the 4th line)

    def extractImageUrlFromText(text):
        urlEnd = text.find("s1600")
        urlStart = text.find("https")
        text.replace("1600", "3200")
        return text[urlStart:urlEnd+5]



ObiSandwich commented 2 years ago

Thanks... So I added the extra line;

text.replace("1600", "3200")

But couldn't get that to work, it just downloaded the same low res as last time: 1041x1600. I saved a single hi res comic page direct from my browser, and noticed the dimensions were different to your code edit: 1988x3056. So I also tried this as well;

text.replace("1988", "3056")

But no luck with that either :(
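A likely reason neither attempt changed anything: Python strings are immutable, so `text.replace(...)` returns a new string and leaves `text` untouched. In the snippet above the result is thrown away, so the function still returns the original `s1600` URL. A minimal sketch of a version that keeps the result is below. The sample line and its host are made up for illustration, and whether the image host serves a larger image for `s3200` is not confirmed anywhere in this thread.

```python
def extractImageUrlFromText(text):
    # Slice the URL out of the line: from "https" up to and including "s1600".
    urlStart = text.find("https")
    urlEnd = text.find("s1600")
    url = text[urlStart:urlEnd + 5]
    # str.replace returns a NEW string, so the result has to be kept.
    return url.replace("s1600", "s3200")


# Hypothetical line of page source; the host and path are invented.
sample = 'lstImages.push("https://example.invalid/img/abc123=s1600");'

unchanged = "abc=s1600"
unchanged.replace("1600", "3200")   # result discarded, nothing changes
print(unchanged)                    # still ends in s1600
print(extractImageUrlFromText(sample))
```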

Just spotted that when the hi res option is selected on the website, the following is added to the end of the URL. Does this help, and could it be used anywhere in your code to achieve the hi res download?

&quality=hq

TravisHunting commented 2 years ago

Updated - Good spot on the &quality=hq bit. When that is added to the URL, the javascript that grabs the images from blogspot is updated so that instead of ending with =s1600, the image URL ends with =s0.

HQ downloads should work now, I haven't been able to test it yet since I'm supposed to be working right now... lol
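For anyone following along, the change described here could be sketched roughly as below. This is an illustration only, not the code in comicScraper.py: the function names and the example URLs are hypothetical, and only `&quality=hq`, `=s1600` and `=s0` come from this thread.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_hq(page_url):
    """Return page_url with quality=hq set in its query string."""
    parts = urlparse(page_url)
    query = dict(parse_qsl(parts.query))
    query["quality"] = "hq"
    return urlunparse(parts._replace(query=urlencode(query)))


def to_quality(image_url, lowres=False):
    """Swap a trailing =s<number> size suffix for =s1600 (low) or =s0 (high)."""
    suffix = "=s1600" if lowres else "=s0"
    return re.sub(r"=s\d+$", suffix, image_url)


# Hypothetical URLs, for illustration only.
print(with_hq("https://example.invalid/Comic/Some-Title/Issue-1?id=1"))
print(to_quality("https://example.invalid/img/abc123=s1600"))
```

A URL that does not end in a size suffix is returned unchanged by `to_quality`, which keeps the helper harmless on unexpected input.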

ObiSandwich commented 2 years ago

Hey, Golden Rule - Left monitor for work, Right monitor for Github :)

So I downloaded and tested the updated comicScraper.py, and sadly it didn't download hi res. Hopefully your test goes better than mine.

TravisHunting commented 2 years ago

I'm wondering if some of the comics don't actually have high res images, or rather the 'high res' version is actually the same as the low res version. Which comic are you testing on?

My next step is using selenium to bypass the bot detection, since readcomiconline.li has now identified me as using a bot apparently. I turned the sleep timer off to test something and got snapped
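The "sleep timer" mentioned here is a pause between requests that keeps the request rate low. A minimal sketch of such a pause is below, assuming a 10 second base delay (the value that appears in a log line later in this thread). The function name and the random jitter are assumptions for illustration, not something the script is known to do.

```python
import random
import time


def polite_sleep(base_seconds=10.0, jitter=0.3, sleep=time.sleep):
    """Pause for roughly base_seconds, varied by +/- jitter (a fraction)."""
    delay = base_seconds * random.uniform(1 - jitter, 1 + jitter)
    print(f"Sleeping for {delay:.1f} seconds")
    # The sleep function is injectable so the helper can be tried without waiting.
    sleep(delay)
    return delay
```

Calling `polite_sleep()` between page fetches would wait somewhere between 7 and 13 seconds with the defaults shown.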

ObiSandwich commented 2 years ago

I guess there's always Tor right.

So far most of the comics I've checked have hi res alternatives, like: https://readcomiconline.li/Comic/Darth-Vader

Been testing on small runs with hi res versions like: https://readcomiconline.li/Comic/Lumberjanes-Gotham-Academy

TravisHunting commented 2 years ago

I figured out how to reset the bot protection, new update will come soon with hq downloads and captcha identification

TravisHunting commented 2 years ago

PR is being reviewed, in the meantime you can use this branch if you're desperate https://github.com/TravisHunting/ComicDownloader/blob/highquality/comicScraper.py

I've added a new flag -l for low quality downloads, high quality downloads are now the default
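A flag like this is typically wired up with `argparse`. The sketch below is a guess at the shape, not the branch's actual code: only the `-l` flag and the `lowres` name (visible in a traceback later in this thread) come from the source, and the rest is assumed.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Download comics (sketch).")
    # store_true makes the flag default to False, so high quality is the default.
    parser.add_argument(
        "-l",
        dest="lowres",
        action="store_true",
        help="download low quality images (high quality is the default)",
    )
    return parser


print(build_parser().parse_args([]).lowres)      # False: high quality
print(build_parser().parse_args(["-l"]).lowres)  # True: low quality
```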

ObiSandwich commented 2 years ago

BOOM! That branch worked beautifully :) Nice work, thank you.

I did have to jump through some Python hoops to get this running on Linux, so to document this - if anyone else gets the following error;

Python-ModuleNotFoundError: No module named 'selenium'

...this is what needs to be installed first on Linux in the following order (tested on LMDE4);

sudo apt install python3-pip

pip3 install selenium

pip3 install webdriver_manager
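To check up front whether those two modules are available, something like the following can be run before starting a download. It is a small sketch using only the standard library, and the helper name is made up for illustration.

```python
import importlib.util

# Module names taken from the install commands above.
REQUIRED = ("selenium", "webdriver_manager")


def missing_modules(names=REQUIRED):
    """Return the names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules: " + ", ".join(missing))
    print("Try: pip3 install " + " ".join(missing))
else:
    print("All required modules are installed")
```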

ObiSandwich commented 2 years ago

Using your High Quality Branch above - After a few successful average sized comic run downloads, a larger comic run gave me a failed download and new error below.

So this is the CAPTCHA thing, right? Would this feature be better off triggering the user's default browser rather than just Chrome? As a general rule, I don't use anything with Google in the title :)

`Sleeping for 10 seconds
Captcha Detected
Installing chromedriver so that you can solve the captcha
====== WebDriver manager ======
Could not get version for google-chrome with the any command: google-chrome --version || google-chrome-stable --version
Current google-chrome version is UNKNOWN
Get LATEST chromedriver version for UNKNOWN google-chrome
There is no [linux64] chromedriver for browser in cache
Trying to download new driver from https://chromedriver.storage.googleapis.com/97.0.4692.71/chromedriver_linux64.zip
Driver has been saved in cache [/home/aaa/.wdm/drivers/chromedriver/linux64/97.0.4692.71]
Traceback (most recent call last):
  File "./comicScraper.py", line 270, in <module>
    main(downloadFull, singleIssue, comicTitle, lowres)
  File "./comicScraper.py", line 186, in main
    issueImageLinks = scrapeImageLinksFromIssue(issueLink, lowres)
  File "./comicScraper.py", line 118, in scrapeImageLinksFromIssue
    solveCaptcha(url)
  File "./comicScraper.py", line 55, in solveCaptcha
    driver = webdriver.Chrome(service=s, options=options)
  File "/home/aaa/.local/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    service_log_path, service, keep_alive)
  File "/home/aaa/.local/lib/python3.7/site-packages/selenium/webdriver/chromium/webdriver.py", line 99, in __init__
    options=options)
  File "/home/aaa/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 268, in __init__
    self.start_session(capabilities, browser_profile)
  File "/home/aaa/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 359, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/aaa/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "/home/aaa/.local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
Stacktrace:
0 0x560357c40a23
1 0x56035770be18
2 0x56035772d14b
3 0x56035772a91a
4 0x56035776574a
5 0x56035775f883
6 0x5603577353fa
7 0x5603577364c5
8 0x560357c7016d
9 0x560357c865bb
10 0x560357c71e75
11 0x560357c86e85
12 0x560357c6586f
13 0x560357ca1ae8
14 0x560357ca1c68
15 0x560357cbcaad
16 0x7f515ca3efa3
`
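The last line of the traceback, "cannot find Chrome binary", suggests chromedriver was downloaded fine but no Chrome installation exists on the machine. A quick way to see which browsers are actually on the PATH is sketched below. The Chrome executable names are the ones the log itself probes; the `firefox` name and the helper are assumptions for illustration.

```python
import shutil

# Executable names to probe. The Chrome names come from the log above;
# "firefox" is an assumption about a typical Linux install.
CANDIDATES = {
    "chrome": ("google-chrome", "google-chrome-stable"),
    "firefox": ("firefox",),
}


def installed_browsers(which=shutil.which):
    """Map each browser found on PATH to the path of its executable."""
    found = {}
    for browser, names in CANDIDATES.items():
        for name in names:
            path = which(name)
            if path:
                found[browser] = path
                break
    return found


print(installed_browsers())
```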

TravisHunting commented 2 years ago

Hahaha, I see... you don't trust our Google overlords. I've just pushed to the 'highquality' branch; you can now choose between Firefox and Chrome. Give that another try, hopefully it will work. I am on Windows but the other collaborator is on Linux, so once he gets back into it he can hopefully iron out any wrinkles on the Linux side if it still doesn't work.
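For readers curious how a Firefox/Chrome choice can look with Selenium 4 and webdriver_manager, here is a rough sketch. It is not the branch's actual code: the function names and the accepted answers are assumptions, and the Selenium imports are kept inside `make_driver` so the prompt handling can be tried without Selenium installed.

```python
def normalize_browser_choice(answer):
    """Turn a prompt answer such as "f" or "Chrome" into "firefox" or "chrome"."""
    answer = answer.strip().lower()
    if answer in ("f", "firefox"):
        return "firefox"
    if answer in ("c", "chrome"):
        return "chrome"
    raise ValueError(f"Unknown browser choice: {answer!r}")


def make_driver(browser):
    """Start a Selenium-controlled browser so the captcha can be solved by hand."""
    # Imported lazily: these need `pip3 install selenium webdriver_manager`.
    from selenium import webdriver

    if browser == "firefox":
        from selenium.webdriver.firefox.service import Service
        from webdriver_manager.firefox import GeckoDriverManager
        return webdriver.Firefox(service=Service(GeckoDriverManager().install()))

    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager
    return webdriver.Chrome(service=Service(ChromeDriverManager().install()))


print(normalize_browser_choice(" F "))
```

Usage would be along the lines of `driver = make_driver(normalize_browser_choice(input("Firefox or Chrome? [f/c] ")))`, with the matching browser already installed on the system.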

TravisHunting commented 2 years ago

And yes that's the captcha bit, unfortunately now that you've triggered it, you will have to successfully kick open a browser via the script in order to solve the captcha, but you can try opening one of the links in your normal browser and see if it gives you the captcha.

ObiSandwich commented 2 years ago

Hey thanks again for the update. So the Firefox captcha kinda worked - but sorry to complicate things, it did something weird on my system...

Once I chose "f" in the script when prompted, my Firefox opened, but in a new/unused/bare Firefox profile! I knew something was up when readcomiconline.li was covered in previously unseen adverts! ...not sure why your script is doing this, but is there a way to only open the default Firefox profile?

Once I was able to close the adverts and solve the captcha the download continued.

TravisHunting commented 2 years ago

So that's actually the intended behavior. It does the same thing for me. The script has its own lightweight version of Firefox that you need to access the link through in order to pass the captcha. I'm not sure about you, but the other developer and I weren't able to successfully access or pass the captcha by launching the user's default browser installation. This was the only way I could get it to work.

I'm really glad to hear that it DID work!!!

TravisHunting commented 2 years ago

The first thing we tried was kicking open the default browser that's installed on the system, but somehow readcomiconline.li was able to tell the difference between that browser and the script, so it wouldn't actually give us the captcha. AKA, we could browse the site normally using our browser, but the script would still be blocked every time.

TravisHunting commented 2 years ago

Pretty ridiculous the number of ads that website has when you don't have an adblocker installed lol

On the plus side, you should only have to pass the captcha once, or maybe very rarely

ObiSandwich commented 2 years ago

No worries, I'm happy it works! You gotta do what you gotta do... I have been wondering about why they have such a huge collection of free comics on their site, but after seeing all those bloody adverts, I now get it! Just glad they don't block adblockers!

Thanks again

AustinHunting commented 2 years ago

Please note, this project has been archived and all future updates will be posted to team-hunting/ComicDownloader.

If you feel this issue has not been addressed to your satisfaction please open an issue or pull request in the new repository. As always we appreciate your support.