Xonshiz / comic-dl

Comic-dl is a command line tool to download manga and comics from various comic and manga sites. Supported sites : readcomiconline.to, mangafox.me, comic naver and many more.
https://github.com/Xonshiz/comic-dl
MIT License
545 stars 68 forks

Refusing to download from readcomiconline.li #299

Open aaronbkm opened 2 years ago

aaronbkm commented 2 years ago

Error_Log.log — I have been using this tool for many months without issue, but today, whenever I try to download something from readcomiconline.li, it gives an error. I have attached the error log and a screenshot of the error.

grecusg commented 2 years ago

I was previously getting the same error on a Mac (Monterey 12.2.1), with an identical stack trace. I then updated the Python cloudscraper package from 1.2.34 (iirc) to 1.2.60 and now get a shorter error message:

Fooling CloudFlare...Please Wait... cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.

Xonshiz commented 2 years ago

That shorter message describes the issue properly. That's nothing I can fix/change, since it depends on another library. Time to look for an alternative then :(

topotech commented 2 years ago

Maybe allowing users to inject cookies (particularly a valid "cf_clearance" copied from any browser) would help with this problem. Or is it more complex than that?

Xonshiz commented 2 years ago

That's all dependent on the external library in question. I didn't get time this weekend, but I'm hopeful I'll look at this over the next weekend and probably look for some alternatives. I'll also try passing that cf_clearance token, thanks for the suggestion @topotech :)

Mycah commented 2 years ago

By the way, they've added additional security to their downloads, using JS obfuscation on the Blogspot URLs. I've been messing around with it for a few hours. A snippet of their new HTML:

        var lstImages = new Array();

        lstImages.push("VVEZbmlmaWFvZW9uVFlEdVWatRmWVNha1NXUW5FOGdTMDctSUxWdnVMRWJuTUg1dEtmREc3bTBoYzJzeWVDWWpVWXU4dGdncTY1Vm5QZGlFZ3VMZ3Y0N0VabHpjQ215dWFPLWNKcjZjNG5xN1VLc250eEI2dlhOTVBNVjliSFRiSFVoTGxieFoxM0k1QmtPU295UWlDUetR=s0");

        lstImages.push("bVxMUFVXOHFGMzBfbnVxRWzSE5XdVRwSVRKMGxUY0ItdkNER3YwY19ERUJFZEhkR3pESjBvc0x5UVZ4dEZRLTB4MklGa3FLQmktdzFKVjJjNVg2UGZXRFE2WGdaTFpGNDgxeGlNVWtSd0hZU1Q1Tk9IS0pQQlpxQmV0Z1h4MW4tZVJYbGRGaUh5MU8tMHhpc3JRZ0Vk3dkw=s0");

        lstImages.push("b2xMS0lCTVUxMnJSbF9xaWnYwxCX29rY29VOVlfaVgyU1QwRTBlRnlFaVUzZFF2SGxrWHdGUHZ6U2k4anMtUlB1clJuMS13TGdXN2lJN0lGOXc3UFl0cG9oRzE1bHZTd0FZNHRUM192X2ZGbnpjZFVnc252QmRDUzl3YklIWGdyUlVqYmNMdlV1dWZNS3pmVXo5Z291TaM5=s0");

        lstImages.push("R21beWVoTnVKbGdFR2c1REGVydoZ2VlV2VUaGdLZ3ByOGEyem1fUnYzd3NhdndjUFJDUHV1XzB1QmcyclBPT0xsY2U3SUUyUFRYSF9wWGx0UURLMlJMUmlBc2ZRMVViaFZtLTVOcFJlVFRiWEhuUWJrcWpoblduME56U1hUODBQVGtvS29HQlB5NVJDbFhCenRVQVcGmcdo=s0");

        lstImages.push("U21MUUh4ZEJlMHIzU0t1dFHYhN4S2xQY2x2eUVRalluSTlhLXplclVuMzJycHRqR1Mxa0NlTkpmRGxvMnd2ZFZVTFREc0hhb3kxREFvLVBsUE1YX2VJRmtuZzJybkdDeFI2aV9VVm9YeDJ3ajJoMi1Eazk3dkluOGRKamdrVUE4MmhrOERSb0pEalhuU0VJaFRpZ2tTXMN5=s0");

        lstImages.push("SW3dUmNGbGJvdUpUSGl3REUUFhGaWg4UGg4UmNDeU1Oa2tFTW1VM3J6NFNiV0l3UGhjVjJjV0pka3g1R28xU1NKelZMRTdQbWtOcmtBSjJvZWJ0ZEc2RVhzRFNyS1pteEVZVVBIeFFyRlNBZjdBckxNdWFJZmNtZUpfb2hJamhSVzZKbVA1N2RxWmNvcGQtUWI1Z1l2WRBS=s0");

        lstImages.push("anpeVUREYWc3elZiaEJpNGlTFhEQnp6TnpNQndoWHg5NUhFa2xRRUtrck5UV1g3OTZtT1ItRUNTVktvQ0MxVjQ5Vlo4M1pIeUo5ekJiN0xHdzNXOHJwMkZiUU56QUpRY3k0S0EwelRBXzRhZTU0d3VjYWFESFZITnloTm1wQmZMeE12ZGJRSkRFWE5zMlluU3l0QUJU2Q5C=s0");

        lstImages.push("N0DbYzFpYzMtblljNGdDeTlQVRpZ0FaQ0F3UFljWEZuRjRVc2lyeDNqVE1kRWtEZFhYR1VIbHlPcWdFMFNhVDc3NnZDblAtbFNSbDh3aVZvcDFkQmZBQk9IRlgwUndSZG12bVlPVEk3TndpRVJqYlFMcnBQVldTUGpLUTV1eFZ0QjM3dDNRa3dNamx1VEt6Q0h4UUdj2MNQ=s0");
Xonshiz commented 2 years ago

I think this has been the case for quite some time. Have to take a look. Got busy with fixing CI/CD pipeline last weekend. I'm hopeful I'll work on this thing soon... hopefully :(

vetleledaal commented 2 years ago

Not sure about the Cloudflare issues, but obfuscation of the image URLs is a recent addition, they've also changed the algorithm once already. Here's how you descramble the current iteration:

function beau(lstImages) {
    return lstImages.map(url => {
        if (url.startsWith('https')) {
            return url;
        }

        const containsS0 = url.includes('=s0');
        url = url.slice(0, containsS0 ? -3 : -6);
        url = url.slice(4, 22) + url.slice(25);  
        url = url.slice(0, -6) + url.slice(-2);
        url = atob(url);
        url = url.slice(0, 13) + url.slice(17);
        url = url.slice(0, -2) + (containsS0 ? '=s0' : '=s1600');

        return 'https://2.bp.blogspot.com/' + url;
    });
}
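
For anyone wiring this into the Python side of comic-dl, here is a rough line-for-line Python port of the JavaScript descrambler above. The slicing offsets are copied verbatim from the JS; this is a sketch and hasn't been verified against live pages:

```python
import base64

def beau(lst_images):
    """Descramble obfuscated image tokens into Blogspot URLs.

    Direct port of the JavaScript beau() posted above: strip the size
    suffix, drop the junk characters at fixed offsets, base64-decode,
    drop more junk, then re-append the size suffix.
    """
    result = []
    for url in lst_images:
        if url.startswith('https'):
            result.append(url)  # already a plain URL, pass through
            continue
        contains_s0 = '=s0' in url
        url = url[:-3] if contains_s0 else url[:-6]
        url = url[4:22] + url[25:]          # drop chars 0-3 and 22-24
        url = url[:-6] + url[-2:]           # drop 4 chars before the last 2
        url = base64.b64decode(url).decode('utf-8')
        url = url[:13] + url[17:]           # drop chars 13-16 of the decoded text
        url = url[:-2] + ('=s0' if contains_s0 else '=s1600')
        result.append('https://2.bp.blogspot.com/' + url)
    return result
```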
Mycah commented 2 years ago

That's pretty amazing. I found the rguard.min.js they use, but I have never used JS before, so trying to figure it out myself was a joke. Thanks for this.

Xonshiz commented 2 years ago

The logic for getting those images is already there. Check the readcomicOnlineli file. They've changed some things; in the past it has been something like renaming a div class or div id. I'm hoping it's the same thing again. Let's see if I get time this weekend to figure out what they've changed now.. :(

Xonshiz commented 2 years ago

Was looking for alternatives to readcomiconline.li and cfscrape/cloudflarescrape and stumbled upon this. Well, it'll take a while to fix this unless I introduce the good old selenium again (ugh). :|

Xonshiz commented 2 years ago

Thankfully I didn't have to bring back selenium for now. I had to add a --cookie parameter. If you get a 403 while downloading from readcomiconline.li, copy the site's cookie string from your browser and pass it with --cookie. It should work.

The command will look like this:

comic_dl -i "https://readcomiconline.li/Comic/Transformers-2019" --cookie "ASP.NET_SessionId=gy1ntd1gxnfg2wwt5jh5bnsj; __cf_bm=v3dvgLEYYYQYvJ08mWhbiLW8z_5dBX2mTNYCBk8dVTI-1650105863-0-Acuz/rPJ+lGtW09CBoAoF5ALZSv6H5p8b0MfrNFQkKfgEh3TDfFFX2dGAU6QDzoB9g7cE9pP1lk3Nyr4vqCXB1n/ICkDKgZTtNQzlgaeBNtJpsvv7TjnZ/xdeTCvAX96Tw==; cf_chl_2=4962c67213c52be; cf_chl_prog=x12; cf_clearance=d_xB4I3s7SN1qiScXXOEB4z7_sbL1NSDF_X5RWtQRkY-1650106493-0-150"

I tried adding selenium, but that had some problems with Cloudflare redirects. Not sure what the cause was; I'll take a look at that alternative in the future. The primary reason I don't want to add Selenium back into this project is that it's heavy, wonky, and hard to maintain and set up. This script is "difficult" enough for people to run, and adding another step of setting up selenium isn't something I want to do. If anyone's got a better suggestion, please feel free to raise a PR with the working changes and I'm more than happy to merge it into the main branch.

P.S: Thanks a ton @vetleledaal for finding that method. It saved a lot of time. Appreciate the help and efforts.

Latest binaries should be available in the latest release section now.

aaronbkm commented 2 years ago

I don't know if I am doing something wrong, but I have downloaded the latest binary, and even when using the --cookie parameter and passing the cookie value

(looks like this from Chrome "_ga=GA1.2.1772146063.1640340838; rco_quality=hq; fpestid=-noHBOo1SsNkRnaWXOYBEjYORP_dyvfg4VuEfzKXv2C4Q6AEbQRvj_3pF1FAN70Fl_xUmg; list-view=grid; b_token=fW1mHplYLTxwpLDwsSCugCvY3kKUCMvdssRwEADfNrKoHdQEL/lJ1iYqd0Y2OcJHEN3Pfa3jm8GlLZ52DAQ9qoBMCel4yxS5kJ5M1RqTrAp/t1m0J+RoTodjHS1EA+FvokfAgcHJrhVNjZP0FSAgAhxCv5RWztm4xe2zjDwtEwRbQrVpau26cq1dltKR1YOc1x5AKiB1RBc7djfQgL9aog==; _gid=GA1.2.2087406613.1650173529; rco_readType=0; cf_chl_2=69cd54f40c3af75; cf_chl_prog=x22; cf_clearance=a5PT0HyClwbTb5BttA4dxk2ugymv35.w2rq6jwkegZ0-1650263245-0-150; _gat=1; __cf_bm=OFRJAi1yOVoqMSJrmJRq8XGBwyCDfRgsJquWzeG47jI-1650264318-0-AW4sjBGc+2LdcXPidj1O8QOb8RHv5tm7yZY1AnDqXhf/d/wNw3KG0pp4CE28+LwQ0eMREhDPMq3AWdJrcnCNryJf5ecPNcCI3jlsyaVZGPBvHi90CZ9xaYqn6TTrllV5Tw=="

I am still getting the exact same error. I have updated Python, cloudscraper, and node.js all to the latest available versions.

Xonshiz commented 2 years ago

I haven't tried it again, but if somebody else is facing this same issue, please feel free to update here; we can re-open this one and continue looking into it.

DiegoNG90 commented 2 years ago

Hi there. I've been trying to download Courtyard by Alan Moore for the last couple of days without success.

I'm running Ubuntu, but I also tried emulating Windows PowerShell with Wine. This was my first attempt: (screenshot)

When I open VS Code, it seems that the packages haven't been found. (screenshot)

On a closer look, this error arises: (screenshot)

I'm not very familiar with Python. I thought that maybe the modules weren't successfully installed, but I ran the pip install -r requirements.txt command in my Bash: (screenshot)

No idea what I'm doing wrong, or why it doesn't recognize main.py. (screenshot)

And with wine + the --cookie arg: (screenshot)

Maybe I'm not grabbing the correct cookie, I guess?

Any help or insight will be much appreciated. A little lost over here.

isebmo commented 2 years ago

Hi there. Seems it doesn't work for me either, with the --cookie parameter. (screenshot) Do you know what I'm missing @Xonshiz? Thanks

topotech commented 2 years ago

I'm not totally sure what is happening, but I monitored the headers of the HTTP requests made by comic-dl (with the command tcpflow -p -c -i enp1s0f1 port 80) and compared them with those generated by my browser; they differed in many parameters. I played with these headers for some time and realized two things:

  1. Apparently the session is associated with the user agent. Changing the user agent forces a renewal of cookies. So, unless someone actually uses Chrome 100.0 for macOS (the browser matching the user-agent string Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36 used by the script), it will be impossible to convince ReadComicOnline that the cookie belongs to the script.

  2. At least with Firefox for Linux, trying to use the user-agent of Chrome 100 for macOS results in a 503 error. I also tested in Chromium for Linux, with the same unsatisfactory results.

Given all of this, I see two possible solutions: either add a "user-agent" option to the script, or add a full "header" option.

Note: I know the website uses https instead of http, but since I just wanted to sniff the headers generated by the script, that is irrelevant.
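
If either option gets added, the core of it is just sending the cookie together with the exact User-Agent of the browser session it was copied from. A minimal stdlib sketch (hypothetical helper, not comic-dl's actual API):

```python
import urllib.request

def build_request(url, cookie_string, user_agent):
    # Cloudflare appears to tie cf_clearance to the browser's User-Agent,
    # so the request must present the same UA string the cookie was
    # issued to, or the cookie is rejected.
    return urllib.request.Request(url, headers={
        "User-Agent": user_agent,
        "Cookie": cookie_string,  # e.g. "cf_clearance=...; __cf_bm=..."
    })
```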

himansh-u commented 2 years ago

Alternate solution

1) Open the comic website with an adblocker installed. (Try uBlock Origin.)
2) Press Ctrl+P to print.
3) Adjust the preview size. (May take some time.)
4) You may also want to select the page count.
5) Click the destination button at the top and select "Save as PDF".

topotech commented 2 years ago

Alternate solution

1. Open...

If we are talking about solutions which don't use comic-dl, and considering that the website uses lazy loading and even derenders images outside the screen, probably the best solution is to just mess with the CSS, make the images ultra small, and then use any extension to bulk-download the images on the screen.

It's what I do, but it's my last resort since it gets tiresome quite fast. Before that, I try to see if I can find the comic in readcomicsonline.ru and use comic-dl there instead.

But that would derail the conversation; this thread is about issue #299 after all.

crossflame22 commented 2 years ago

Wondering why this is closed. I'm having this issue. Every time I attempt a download from RCO, specifically from https://readcomiconline.li/Comic/Spider-Man-1990, I get the following. (screenshot)

Why is this happening, and why was the issue closed without a solution? This is very frustrating. I did a bit more digging and it appears the program only recognizes the Annual from that series, and doesn't see any images in said annual. If I had to guess, it must be an issue with how the program parses the page.

Edit: For the record, I did try the cookie method. The following is what happened then. (screenshot) That's crazy! It's the exact same thing. You should re-open this issue.

LegalizeAdulthood commented 1 year ago

My issue (#326) was closed as a duplicate of this issue, but this issue is also closed and it still doesn't work properly.... :)

Xonshiz commented 1 year ago

All RCO-related issues will be redirected to this one because they're all the same. Also, to reiterate why there's an issue with RCO and why there is currently no solution for it in comic-dl: I can't find a way to get past their bot checks. I've tried everything I could think of, but it didn't work out.

Read in detail: https://github.com/Xonshiz/comic-dl/issues/299#issuecomment-1100645989

@topotech , thanks for investigating it. But the user-agent string method is something I tried first, and that didn't work out. I even tried running it via selenium, but no luck. I'll probably check again in a few weeks whether there are any other alternatives, but I have nothing on RCO as of now.

The code is available to everyone, please feel free to experiment and make changes and share with everyone if it works out. I'll be more than thankful :)

Edit: Even though this issue is more or less limited to RCO, I am not going to lock the conversation. Folks, share if you have something I can try to get RCO working again :)

LegalizeAdulthood commented 1 year ago

I understand they all redirect here, but what I'm commenting on is the fact that issue #299 is marked closed, but it is still an open problem...

LegalizeAdulthood commented 1 year ago

Regarding the code, I dug into it myself and found the call to the beau function, which I couldn't seem to find the source for; with the source for that posted above, I would have thought we could get a little farther, but alas, no go.

I was going to write a scraper in nodejs/cheerio until I found this and thought it would get us farther, but apparently we're stuck.

Now I'm considering an AutoIt script that literally mimics the user actions.

LegalizeAdulthood commented 1 year ago

Maybe an alternate approach is to inject JS into the site that performs the download?

e.g. using this extension: https://chrome.google.com/webstore/detail/custom-javascript-for-web/ddbjnfjiigjmcpcpkmhogomapikjbjdk?hl=en

potatoeggy commented 1 year ago

There's an implementation of beau available in gallery-dl (https://github.com/mikf/gallery-dl/blob/master/gallery_dl/extractor/readcomiconline.py#L132)

…but it's licensed under GPLv2-only, making it incompatible with this project. Maybe you can still find it helpful.

BarneyTheCantankerous commented 1 year ago

There's an implementation of beau available in gallery-dl (https://github.com/mikf/gallery-dl/blob/master/gallery_dl/extractor/readcomiconline.py#L132)

…but it's licensed under GPLv2-only, making it incompatible with this project. Maybe you can still find it helpful.

This worked for me!

However, it seems like they use multiple approaches for scrambling the image URLs (I guess it just depends on when the page was published, and whatever approach they were using at the time).

Xonshiz commented 1 year ago

Nice, thanks for all the inputs everyone. I'll try to go back and take a look at RCO once again with these new suggestions. Now that there are some new methods mentioned, I'm re-opening this issue. Let's hope this gets fixed this time :D

monyarm commented 1 year ago

I can't get the cookies to work

comic_dl --auto --cookie "__cf_bm=AKj7lCG5opzpU5wCAPzl6qC8k_nLadBjEYiv0b.BMqo-1671895508-0-AfUOitS7C1RfHL+26CVMEemlWs3bOVfFPgM9i1PeJWnKTt6lPfC4PgB0pXOxR//YPfwZ//0dOwTtWrWrki8J0kRZ9Y/t9IvuXkUr1q27XhBBTwDVmBEhg/fz2R/mybHqcbDlUk3qdiE6GtfOMyIhIY0="
Traceback (most recent call last):
  File "/home/monyarm/.local/bin/comic_dl", line 8, in <module>
    sys.exit(main())
  File "/home/monyarm/.local/lib/python3.10/site-packages/comic_dl/__main__.py", line 21, in main
    ComicDL(sys.argv[1:])
  File "/home/monyarm/.local/lib/python3.10/site-packages/comic_dl/comic_dl.py", line 214, in __init__
    manual_cookie = data["cookie"]
KeyError: 'cookie'

this is my config

{
    "download_directory": ".",
    "sorting_order": "ascending",
    "conversion": "cbz",
    "keep": "False",
    "image_quality": "Best",
    "comics": {
        "https://readcomiconline.li/Comic/A-Christmas-Carol-A-Ghost-Story": {
            "url": "https://readcomiconline.li/Comic/A-Christmas-Carol-A-Ghost-Story",
            "next": 1,
            "last": "None",
            "username": "None",
            "password": "None",
            "comic_language": "0"
        }
    }
}
cstegmann commented 1 year ago

I can't get the cookies to work […]

I had the same issue; I fixed it by adding "cookie": "", to my config.json:

{
    "download_directory": "comics",
    "sorting_order": "ascending",
    "conversion": "None",
    "keep": "True",
    "cookie": "",
    "image_quality": "Best",
    "comics": {
        "https://www.webtoons.com/en/action/the-perfect-hybrid/list?title_no=5050": {
            "url": "https://www.webtoons.com/en/action/the-perfect-hybrid/list?title_no=5050",
            "next": 3,
            "last": "None",
            "username": "None",
            "password": "None",
            "comic_language": "0"
        }
    }
}

@Xonshiz is that working as intended or is the config generator forgetting about this?
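
For reference, the traceback above comes from an unconditional data["cookie"] lookup. A sketch of a possible defensive fix (hypothetical helper, not the actual comic_dl.py code) would fall back instead of raising:

```python
def read_manual_cookie(data):
    # Older config.json files were generated before the "cookie" key
    # existed; fall back to None instead of raising KeyError: 'cookie'.
    return data.get("cookie") or None
```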

divvyupmylife commented 1 year ago

Hey everyone. I'm new here, but I am passionate! I've tried so many suggestions and I still have this issue:

0image(s) [00:00, ?image(s)/s]

To be more specific, I am working in Google Chrome and I've integrated @cstegmann 's solution, and it's not working for me. Keep in mind I'm working with RCO (apparently dreaded here). Here's my config.json:

{
    "download_directory": "comic",
    "sorting_order": "ascending",
    "conversion": "cbz",
    "keep": "True",
    "cookie": "",
    "image_quality": "Best",
    "comics": {
        "https://readcomiconline.li/Comic/Zootopia-A-Hard-Day-s-Work/Full?id=158629&readType=1": {
            "url": "https://readcomiconline.li/Comic/Zootopia-A-Hard-Day-s-Work/Full?id=158629&readType=1",
            "next": 1,
            "last": "None",
            "username": "None",
            "password": "None",
            "comic_language": "0"
        }
    }
}

And here's my command line:

./cli.py -i https://readcomiconline.li/Comic/Zootopia-A-Hard-Day-s-Work/Full?id=158629 --cookie "__cf_bm=BNL4jy8QnLIWyWWyfBA5T3X3SKNH90u_AtJ.8KBQS4U-1681530575-0-ASgAqKxPM17ZOaWk8NIxlUhMV4FpYo2C/GDS6DOsaVnTRyg4zl0Voek+aqwRYPEfl+voO/YIDAdS00otsxJQKtF9OsVhsg2cg4f22/nLBBwyhSiwmWWxmA2xGseVdpw7BQ=="

How can I fix this? Am I grabbing the wrong kind of information for cookies? I'm using the cookies.txt Chrome extension.

cwelk commented 1 year ago

Same issue (0image(s) [00:00, ?image(s)/s]) on MacOS 13.5:

comic-dl % python3 ./cli.py -i https://readcomiconline.li/Comic/Sha/Issue-1 --cookie "[COOKIE STRING]"
Fooling CloudFlare...Please Wait...
[]
[Comic-dl] Done : Sha [Issue - 1#1] : : 0image(s) [00:00, ?image(s)/s]
Total Time Taken To Complete : 0.8417620658874512

lmagdanello commented 10 months ago

Take a look: #344

DarklySatanic commented 10 months ago

Take a look: #344

Unfortunately, your solution is a bit over my head - Are you able to dumb it down for us?

lmagdanello commented 10 months ago

Take a look: #344

Unfortunately, your solution is a bit over my head - Are you able to dumb it down for us?

Of course.

I did some analysis and noticed that some additional headers were impacting access to the site.

Another point was that the regex performing the search used single quotes, and was therefore unable to generate the list of comic book images.

Last but not least, I created a function called beau that decodes the obfuscated URLs and returns the image URLs for download, instead of the old method that used slices of the list of URLs.
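
On the quoting point: the page can emit lstImages.push('...') with either quote style, so the scraper's regex has to accept both. A hedged sketch (the pattern is illustrative, not the exact one in the PR):

```python
import re

# Accept either single or double quotes around the obfuscated token.
PUSH_RE = re.compile(r'lstImages\.push\(["\']([^"\']+)["\']\)')

def extract_tokens(html):
    # Return every pushed token in page order.
    return PUSH_RE.findall(html)
```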

lmagdanello commented 10 months ago

The beau function was inspired by the comment above 😅

DarklySatanic commented 10 months ago

Cheers so much for that!

I see! Alright, and is this function something that we as users will have access to? Because, after installing the requirements using the Windows binary, the 0-images issue persists. Please see the screenshots.

I've noticed that the folders seem to be created, however, just not the files.

(screenshots)

These are folders that belong to the comics I have attempted to download (as a test).

I've also been able to successfully download files, but that was back in 2020 and I could have sworn using the Readcomiconline file helped with that.

lmagdanello commented 10 months ago

There is a process this has to go through before it reaches end users.

I submitted the Pull Request with the fixes that I identified in my analysis, but they need to be evaluated and go through an approval cycle by the maintainers of this repository.

Only after a merge (i.e., once the fix is approved and incorporated) will you be able to access it.

DarklySatanic commented 10 months ago

That makes sense!

When it is ready, how would we gain access to it? Would we have to download something? I think it was you who kindly assisted me back in 2020, and I remember you helped me out a lot.

DarklySatanic commented 10 months ago

Also, confirming that other websites work fine when it comes to downloading images, which means that:

a) It's an RCO issue
b) I'm doing this correctly, lol.

(screenshot)

tabletseeker commented 8 months ago

Try out the only working downloader for readcomiconline.li in 2024: https://github.com/tabletseeker/readcomic_dl