Closed ObiSandwich closed 2 years ago
You could try something like this maybe (added the 4th line)
def extractImageUrlFromText(text):
    urlEnd = text.find("s1600")
    urlStart = text.find("https")
    text.replace("1600", "3200")
    return text[urlStart:urlEnd+5]
On Thu, Jan 6, 2022, 1:39 AM Obi Sandwich @.***> wrote:
TODO: add an option for downloading high or low quality versions of the images
Just noticed there is a High quality version of each comic! It appears the downloads are defaulting at the Low quality setting :( Now I know this, I'll stop any more of my downloads.
I see you already have a To-Do for this... So :) when do you think this To-Do will be looked at? And is there a quick fix that I can use now?
Thanks... So I added the extra line;
text.replace("1600", "3200")
But couldn't get that to work, it just downloaded the same low res as last time: 1041x1600. I saved a single hi res comic page direct from my browser, and noticed the dimensions were different to your code edit: 1988x3056. So I also tried this as well;
text.replace("1988", "3056")
But no luck with that either :(
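A likely reason the extra line had no effect: Python strings are immutable, so `text.replace(...)` returns a new string and leaves `text` itself untouched unless the result is assigned back. The snippet below is only a sketch with a made-up sample string, not the code from comicScraper.py, and it says nothing about whether the image host actually serves a "3200" size.

```python
# Made-up sample input; the real page text will look different.
text = 'src="https://example.invalid/img/page1=s1600"'

# The return value is discarded here, so text still contains "1600".
text.replace("1600", "3200")
still_original = "1600" in text

# Keeping the returned string is what actually changes the value in use.
updated = text.replace("1600", "3200")
```

The same applies inside `extractImageUrlFromText`: the replaced string would have to be stored and then sliced for the change to show up in the returned URL.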
Just spotted that when the hi res option is selected on the website, the following is added to the end of the URL. Does this help, and could it be used anywhere in your code to get the hi res download?
&quality=hq
Updated. Good spot on the &quality=hq bit. When that is added to the URL, the JavaScript that grabs the images from blogspot is updated so that the image URL ends with =s0 instead of =s1600.
HQ downloads should work now, I haven't been able to test it yet since I'm supposed to be working right now... lol
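For anyone following along, the suffix swap described above could be sketched roughly like this. The helper name `to_high_quality` and the sample URLs are made up for illustration, and this is not necessarily how the change in comicScraper.py was written.

```python
def to_high_quality(image_url: str) -> str:
    """Swap a trailing =s1600 size suffix for =s0 (hypothetical helper)."""
    low_suffix = "=s1600"
    if image_url.endswith(low_suffix):
        # Drop the low quality suffix and append the one seen with &quality=hq.
        return image_url[: -len(low_suffix)] + "=s0"
    # Leave any URL without the expected suffix unchanged.
    return image_url
```

Only a suffix match is rewritten, so a "1600" appearing elsewhere in the URL is left alone.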
Hey, Golden Rule - Left monitor for work, Right monitor for Github :)
So I downloaded and tested the updated comicScraper.py, and sadly it didn't download hi res. Hopefully your test goes better than mine.
I'm wondering if some of the comics don't actually have high res images, or rather the 'high res' version is actually the same as the low res version. Which comic are you testing on?
My next step is using selenium to bypass the bot detection, since readcomiconline.li has now identified me as using a bot apparently. I turned the sleep timer off to test something and got snapped
I guess there's always Tor right.
So far most of the comics I've checked have hi res alternatives, like: https://readcomiconline.li/Comic/Darth-Vader
Been testing on small runs with hi res versions like: https://readcomiconline.li/Comic/Lumberjanes-Gotham-Academy
I figured out how to reset the bot protection, new update will come soon with hq downloads and captcha identification
PR is being reviewed, in the meantime you can use this branch if you're desperate https://github.com/TravisHunting/ComicDownloader/blob/highquality/comicScraper.py
I've added a new flag -l for low quality downloads, high quality downloads are now the default
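A flag like this is commonly wired up with `argparse`. The sketch below is only an illustration of "high quality by default, `-l` for low quality"; the long option name `--low-quality` and the function name `build_parser` are assumptions, and the real script may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where high quality is the default (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Comic download options (sketch)")
    # Only -l comes from the thread; --low-quality is an assumed long name.
    parser.add_argument(
        "-l",
        "--low-quality",
        action="store_true",
        help="download low quality images instead of the high quality default",
    )
    return parser


default_args = build_parser().parse_args([])
low_args = build_parser().parse_args(["-l"])
```

With `store_true`, leaving the flag off gives `low_quality == False`, which matches high quality being the default.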
BOOM! That branch worked beautifully :) Nice work, thank you.
I did have to jump through some Python hoops to get this running on Linux, so to document this: if anyone else gets the following error;
Python-ModuleNotFoundError: No module named 'selenium'
...this is what needs to be installed first on Linux in the following order (tested on LMDE4);
sudo apt install python3-pip
pip3 install selenium
pip3 install webdriver_manager
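To check up front whether those two packages can be imported, before running the scraper, something like the following could be used. This is just a sketch; the helper name `missing_modules` is made up, and the module names are the import names matching the pip packages above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["selenium", "webdriver_manager"])
if missing:
    print("Missing modules, install with pip3:", ", ".join(missing))
```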
Using your High Quality Branch above - After a few successful average sized comic run downloads, a larger comic run gave me a failed download and new error below.
So this is the CAPTCHA thing, right? Would this feature be better off launching the user's default browser rather than just Chrome? As a general rule, I don't use anything with Google in the title :)
`Sleeping for 10 seconds
Captcha Detected
Installing chromedriver so that you can solve the captcha
====== WebDriver manager ======
Could not get version for google-chrome with the any command: google-chrome --version || google-chrome-stable --version
Current google-chrome version is UNKNOWN
Get LATEST chromedriver version for UNKNOWN google-chrome
There is no [linux64] chromedriver for browser in cache
Trying to download new driver from https://chromedriver.storage.googleapis.com/97.0.4692.71/chromedriver_linux64.zip
Driver has been saved in cache [/home/aaa/.wdm/drivers/chromedriver/linux64/97.0.4692.71]
Traceback (most recent call last):
File "./comicScraper.py", line 270, in
`
Hahaha, I see.... you don't trust our google overlords... I've just pushed to the 'highquality' branch, and you can now choose between Firefox and Chrome. Give that another try, hopefully it will work. I am on Windows but the other collaborator is on Linux. Once he gets back into it, hopefully he can iron out the wrinkles on the Linux side if it still doesn't work.
And yes that's the captcha bit, unfortunately now that you've triggered it, you will have to successfully kick open a browser via the script in order to solve the captcha, but you can try opening one of the links in your normal browser and see if it gives you the captcha.
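For reference, a browser choice like this can be sketched with Selenium 4 and webdriver_manager roughly as follows. The function names and the accepted answers ("f", "c" and the full names) are assumptions for illustration, and this is not the code on the 'highquality' branch. The imports are done inside `launch_browser` so the first function works even without Selenium installed.

```python
def normalize_browser_choice(answer: str) -> str:
    """Map a typed answer to 'firefox' or 'chrome' (hypothetical helper)."""
    choice = answer.strip().lower()
    if choice in ("f", "firefox"):
        return "firefox"
    if choice in ("c", "chrome"):
        return "chrome"
    raise ValueError(f"Unsupported browser choice: {answer!r}")


def launch_browser(name: str):
    """Start a Selenium-driven browser; needs selenium and webdriver_manager."""
    from selenium import webdriver

    if name == "firefox":
        from selenium.webdriver.firefox.service import Service
        from webdriver_manager.firefox import GeckoDriverManager

        # webdriver_manager downloads geckodriver and returns its path.
        return webdriver.Firefox(service=Service(GeckoDriverManager().install()))

    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    # Same idea for Chrome: fetch chromedriver, then start the browser.
    return webdriver.Chrome(service=Service(ChromeDriverManager().install()))
```

A call such as `launch_browser(normalize_browser_choice("f"))` would open a Selenium-controlled Firefox window, provided Firefox itself is installed.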
Hey thanks again for the update. So the Firefox captcha kinda worked - but sorry to complicate things, it did something weird on my system...
Once I chose "f" in the script when prompted, my Firefox opened, but in a new/unused/bare Firefox profile! I knew something was up when readcomiconline.li was covered in previously unseen adverts! ...not sure why your script is doing this, but is there a way to only open the default Firefox profile?
Once I was able to close the adverts and solve the captcha the download continued.
So that's actually the intended behavior. It does the same thing for me. The script has its own lightweight version of Firefox that you need to access the link through in order to pass the captcha. I'm not sure about you, but the other developer and I weren't able to successfully access or pass the captcha by launching the user's default browser installation. This was the only way I could get it to work.
I'm really glad to hear that it DID work!!!
The first thing we tried was kicking open the default browser that's installed on the system, but somehow readcomiconline.li was able to tell the difference between that browser and the script, so it wouldn't actually give us the captcha. AKA, we could browse the site normally using our browser, but the script would still be blocked every time.
Pretty ridiculous the number of ads that website has when you don't have an adblocker installed lol
On the plus side, you should only have to pass the captcha once, or maybe very rarely
No worries, I'm happy it works! You gotta do what you gotta do... I have been wondering about why they have such a huge collection of free comics on their site, but after seeing all those bloody adverts, I now get it! Just glad they don't block adblockers!
Thanks again
Please note, this project has been archived and all future updates will be posted to team-hunting/ComicDownloader
If you feel this issue has not been addressed to your satisfaction please open an issue or pull request in the new repository. As always we appreciate your support.