AlphaSlayer1964 / kemono-dl

A simple kemono.party downloader using python.

Can't use on Google Colab (got different tracebacks with different Python version) #27

Closed shirooo39 closed 2 years ago

shirooo39 commented 2 years ago

Version

Version: 2021.11.03
(I was unable to get the version with the command python kemono-dl.py --version because the argument parser marks the '--cookies' argument as required.)
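A minimal sketch of one way a parser can answer --version even when another argument is required. This is not the project's actual parser: `build_parser` is a hypothetical helper, and only the `--cookies` flag and the version string are taken from this report. It relies on argparse's built-in `version` action, which prints and exits while the arguments are still being parsed, before the required-argument check runs.

```python
import argparse


def build_parser():
    """Build a parser where --version works without --cookies (illustrative only)."""
    parser = argparse.ArgumentParser(prog="kemono-dl.py")
    # The built-in 'version' action prints the string and exits during parsing,
    # so the missing required --cookies argument is never reported.
    parser.add_argument("--version", action="version", version="2021.11.03")
    parser.add_argument("--cookies", required=True, help="path to a cookies.txt file")
    return parser
```

Calling `build_parser().parse_args(["--version"])` prints the version and exits with status 0, while `build_parser().parse_args([])` still fails because `--cookies` is missing.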

Service, User ID, Post ID

Your Command

python kemono-dl.py --cookies cookies.txt --links https://kemono.party/patreon/user/7237458

Description of bug

I'm trying to run this in Google Colab, but the script gives me a Python traceback instead.

How To Reproduce

  1. Create a Notebook in Google Colab
  2. Use the command below:
    !wget https://github.com/AplhaSlayer1964/kemono-dl/archive/refs/tags/2021.11.03.zip
    !unzip /content/2021.11.03.zip
    !mv /content/kemono-dl-2021.11.03 /content/kemono-dl
    !pip install -r /content/kemono-dl/requirements.txt
    !python /content/kemono-dl/kemono-dl.py --cookies /content/kemono-dl/cookies.txt --links https://kemono.party/patreon/user/7237458

    (at this point, cookies.txt has been uploaded into the folder)

Error messages and tracebacks

Traceback (most recent call last):
  File "/content/tools/kemono-dl/kemono-dl.py", line 1, in <module>
    from src.main import main
  File "/content/tools/kemono-dl/src/main.py", line 2, in <module>
    from .api import extract_link_info, get_favorites
  File "/content/tools/kemono-dl/src/api.py", line 24, in <module>
    raise_on_status = True
TypeError: __init__() got an unexpected keyword argument 'allowed_methods'

Additional comments

I tried to run it on my main machine (Windows 11 + Python 3.10) and it worked. I could download.
I thought that must be because Google Colab is still using Python 3.7.
I tried to install Python 3.10 and run the script with it... I got a different traceback this time.

Traceback (most recent call last):
  File "/content/tools/kemono-dl/kemono-dl.py", line 1, in <module>
    from src.main import main
  File "/content/tools/kemono-dl/src/main.py", line 2, in <module>
    from .api import extract_link_info, get_favorites
  File "/content/tools/kemono-dl/src/api.py", line 1, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'

AlphaSlayer1964 commented 2 years ago

OK, so I will have to look into use with Google Colab. As for the error with 3.10, the requests module should be preinstalled; if for some reason it isn't, just run pip install requests. I don't know for certain whether it is compatible with 3.10, but I would assume so.

AlphaSlayer1964 commented 2 years ago

OK, from a quick look, the Colab error seems to be a problem with the version of requests and maybe urllib3. I should also mention that I have never tested this with Google Colab, nor have I ever used Google Colab.

!pip install urllib3 --upgrade 
!pip install requests --upgrade 
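
For context, the `allowed_methods` keyword of urllib3's `Retry` class was introduced in urllib3 1.26 as the replacement for `method_whitelist`, so an older preinstalled urllib3 raises the TypeError shown above. Below is a minimal sketch for checking the installed version before upgrading. The helper names are hypothetical, and `importlib.metadata` needs Python 3.8 or newer, so on a Python 3.7 runtime it would require the `importlib_metadata` backport instead.

```python
import re
from importlib import metadata  # standard library from Python 3.8 onward


def version_tuple(version):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def supports_allowed_methods(urllib3_version):
    """True if this urllib3 version accepts Retry(allowed_methods=...)."""
    return version_tuple(urllib3_version)[:2] >= (1, 26)


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

For example, `supports_allowed_methods(installed_version("urllib3") or "0")` reports whether the upgrade commands above are needed in the current environment.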
shirooo39 commented 2 years ago

Hi! Thank you for following up on my issue.

So, what seems to be the cause of the issue? Your script does work with Python 3.10 on my Windows machine, so it's compatible.

When I try to run it on Colab with the pre-installed Python 3.7, I get this traceback: "unexpected keyword argument 'allowed_methods'" (which I have already mentioned above). But when I run it with Python 3.10 on Colab, I get a different traceback.

If possible, I don't want to have to install 3.10 on Colab just to get the script to run (I assume the script is compatible with at least > 3.5, so I don't really need 3.10).

Here's what I'm gonna do (when I'm on my laptop)

I'll report back the result later.

AlphaSlayer1964 commented 2 years ago

Both of those tracebacks are requests issues. This error is because the requests module isn't installed:

Traceback (most recent call last):
  File "/content/tools/kemono-dl/kemono-dl.py", line 1, in <module>
    from src.main import main
  File "/content/tools/kemono-dl/src/main.py", line 2, in <module>
    from .api import extract_link_info, get_favorites
  File "/content/tools/kemono-dl/src/api.py", line 1, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'

As for the Colab issue, I can't help much, but I found the solution by googling: TypeError: __init__() got an unexpected keyword argument 'allowed_methods' google colab

shirooo39 commented 2 years ago

So upgrading the 'urllib3' and 'requests' packages works, but I'm getting this now:

[info] Downloading User: paperbag
[info] service: [patreon] user_id: [7237458]
[Downloading] User icon.
[Error] Unable to get user icon.
[Downloading] User banner.
[Error] Unable to get user banner.
[info] Downloading Post: June Reward Illustrations Sent!
[info] service: [patreon] user_id: [7237458] post_id: [53118482]
[info] Downloading attachments:
[Downloading]: am_by_minty_s.jpg
[Error] downloading: https://kemono.party/data/4a/3a/4a3a033ee3e0dcdcadfbc37590461438e5e565db154f40bdf264f16e78e7e0df.jpg
403 Client Error: Forbidden for url: https://kemono.party/data/4a/3a/4a3a033ee3e0dcdcadfbc37590461438e5e565db154f40bdf264f16e78e7e0df.jpg
[info] Retrying download in 30 seconds. (1/3)

Could you please give it a try on my Colab Notebook?

With the same command, the script could at least download something on my Windows machine (but the download speed is very slow; I'm not sure why).

AlphaSlayer1964 commented 2 years ago

So the 403 Forbidden means you are getting hit by their DDOS guard and that's why the cookie file is needed. The slow speeds are, most of the time, due to their site not having that file cached.

shirooo39 commented 2 years ago

So the 403 Forbidden means you are getting hit by their DDOS guard and that's why the cookie file is needed.

But I already uploaded my cookies file. The script wouldn't work without it anyway, since the argument parser marks --cookies as required, and I'm using the same cookies file on my Windows machine...

AlphaSlayer1964 commented 2 years ago

Google Colab's IP address might be blocked then? That 403 error only happens if you've been blocked.

AlphaSlayer1964 commented 2 years ago

So the API calls don't have a DDOS guard on them, but the actual files do. My best guess is that Google Colab's IPs are automatically blocked by DDOS guard.

shirooo39 commented 2 years ago

I don't think kemono blocked Colab's IP though, and I don't see any reason why they would do that.

I tried gallery-dl and it worked. The download is also quite fast. I used the exact same cookies.txt file, so the cookies are not the issue.

So I think it's your script that can't be run on Google Colab.

AlphaSlayer1964 commented 2 years ago

No idea why, because gallery-dl uses the same requests library as my script.

shirooo39 commented 2 years ago

Welp, looks like we hit a roadblock then...

It would be nice if your script could work on Google Colab though.
Technically, it runs, but every download fails with a 403.

Thank you for continuing to follow up on my issue, but it looks like I'll be using gallery-dl instead. I'll be closing this issue.

AlphaSlayer1964 commented 2 years ago

Were you using a cookie file from Firefox?

shirooo39 commented 2 years ago

Were you using a cookie file from Firefox?

Yes, I am.

I was so stupid that I didn't read this before. But the thing is, I didn't change anything within the file itself and it worked on my Windows machine.
When I use the same file on Colab, I just get a 403 instead.

If I needed to remove the HttpOnly part, I should be getting a 403 on my Windows machine as well, but I didn't.
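
One possible explanation for the difference between the two machines, offered as an assumption rather than a confirmed diagnosis: as far as I know, Python's `http.cookiejar.MozillaCookieJar` only started parsing lines prefixed with `#HttpOnly_` in Python 3.10, and earlier versions skip them as comments because they begin with `#`. If the script loads cookies.txt through that class (which this thread does not confirm), the same file would yield fewer cookies on Python 3.7 than on Python 3.10. The sketch below strips the prefix before loading; both helper names are hypothetical.

```python
import http.cookiejar
import os
import tempfile

HTTPONLY_PREFIX = "#HttpOnly_"


def strip_httponly_prefix(text):
    """Remove the '#HttpOnly_' prefix so older parsers do not skip those lines."""
    lines = []
    for line in text.splitlines():
        if line.startswith(HTTPONLY_PREFIX):
            line = line[len(HTTPONLY_PREFIX):]
        lines.append(line)
    return "\n".join(lines) + "\n"


def load_cookies(path):
    """Load a Netscape-format cookies.txt, tolerating '#HttpOnly_' lines."""
    with open(path, encoding="utf-8") as handle:
        cleaned = strip_httponly_prefix(handle.read())
    # Write the cleaned copy to a temporary file so the original stays untouched.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write(cleaned)
    try:
        jar = http.cookiejar.MozillaCookieJar(tmp.name)
        jar.load(ignore_discard=True, ignore_expires=True)
    finally:
        os.remove(tmp.name)
    return jar
```

Comparing `len(load_cookies("cookies.txt"))` with the number of cookies a plain `MozillaCookieJar` load returns on the same interpreter would show whether any entries were being dropped.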

shirooo39 commented 2 years ago

So I decided to give it yet another try, but I got a different traceback this time:

Traceback (most recent call last):
  File "/content/tools/kemono-dl/kemono-dl.py", line 4, in <module>
    main()
  File "/content/tools/kemono-dl/src/main.py", line 15, in main
    if not extract_link_info(link):
  File "/content/tools/kemono-dl/src/api.py", line 284, in extract_link_info
    info['username'] = get_username(info)
  File "/content/tools/kemono-dl/src/api.py", line 273, in get_username
    for creator in response.json():
  File "/usr/local/lib/python3.7/dist-packages/requests/models.py", line 910, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 2321512 (char 2321511)

I've done what's mentioned in here, but it's still not working.

AlphaSlayer1964 commented 2 years ago

Strange that the Firefox cookies work on your local machine without any changes. Someone else made a post about the JSON decoder error. I have no idea what causes it, and sadly I can't reproduce it. I think it might happen if your machine can't load the entire JSON into memory, since it's really big, but I would have guessed that Google Colab could handle it. I might have to make it download the all-creators API response and read it in chunks or something.

AlphaSlayer1964 commented 2 years ago

I just released an update that retries if it gets that JSON decoder error. It seems to be a server-side issue where you get a bad response back.
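
The retry idea can be sketched roughly as follows. This is not the code from the update mentioned above, only an illustration under stated assumptions: `load_json_with_retry` and its `fetch_text` parameter are hypothetical, with `fetch_text` standing in for whatever call returns the response body as a string.

```python
import json
import time


def load_json_with_retry(fetch_text, retries=3, delay=1.0):
    """Call fetch_text() and parse its result as JSON, retrying on decode errors.

    fetch_text is any zero-argument callable returning the response body as a
    string. A truncated body raises json.JSONDecodeError, which triggers another
    attempt until `retries` attempts have been made.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return json.loads(fetch_text())
        except json.JSONDecodeError as err:
            last_error = err
            if attempt < retries:
                time.sleep(delay)  # brief pause before asking the server again
    raise last_error
```

With requests, `fetch_text` could be something like `lambda: session.get(url).text`, where `session` and `url` are placeholders for the caller's own objects.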