anime-dl / anime-downloader

A simple but powerful anime downloader and streamer.
The Unlicense
1.92k stars 218 forks

9anime introduced... captcha #183

Closed RaitaroH closed 4 years ago

RaitaroH commented 5 years ago

9anime has introduced a Google captcha. I updated the script thinking this might already be resolved upstream, but it isn't yet, so I am opening this bug.

To reproduce: I have 9anime as the default provider. On the other hand, animepahe does work, twist.moe hands a link to VLC but playback fails, and kissanime... well... I think the Cloudflare scraper needs an update. Anyway, here is the command for 9anime:

```
▶ anime dl 'fruits basket' --play vlc
anime: anime-downloader 3.6.3
anime: 'NoneType' object has no attribute 'find_all'
```
LOG

```
▶ anime dl 'fruits basket' --play vlc --log-level DEBUG
INFO root: anime-downloader 3.6.3
DEBUG root: Platform: Linux-5.0.10-050010-generic-x86_64-with-neon-18.04-bionic
DEBUG root: Python 3.6.7
DEBUG root: https://www4.9anime.to/search?keyword=fruits+basket
ERROR root: 'NoneType' object has no attribute 'find_all'
```
29ayush commented 5 years ago

Can you please post a screenshot of where you experience the Google captcha? I didn't get any in my workflow. Did you get it during the Cloudflare check?

RaitaroH commented 5 years ago

Example link here (screenshot attached). Link to captcha: here

@29ayush hmmm... 9anime changes their domain so often that I am not even sure I have the "correct" one, but this command, for instance, used to work:

```
anime dl 'https://www4.9anime.to/watch/carole-tuesday.18vq/m1nppx' --play vlc --episodes 4
anime: anime-downloader 3.6.3
anime: Extracting episode info from page
'data-ts'
```
29ayush commented 5 years ago

Yes, I am getting the captcha now; I wasn't getting it earlier.

IngwiePhoenix commented 5 years ago

The captcha might be a side effect of Cloudflare. According to their instructions, it mainly validates an IP rather than protecting the website itself. Ergo, if you solve the captcha in a browser and then use the downloader, it should work. But currently #181 is a problem... :)

bylaws commented 5 years ago

The verification seems to be stored in a cookie once you have completed the captcha in a browser.

SasukeShiro commented 5 years ago

9anime temporarily blocks your IP when it detects 'harmful requests'.

Azrtheal commented 5 years ago

9anime used to show up, but it's been giving me a traceback error involving multiple files. I am on the default file configuration with 9anime as the provider, using the latest (as of 5/6/2019) Python and an updated anime-downloader script.

29ayush commented 5 years ago

At this moment, 9anime is working for me; I am not getting a captcha. Has 9anime removed the captcha?

IngwiePhoenix commented 5 years ago

Strange. I can use it right now as well. But as far as I know, the website still runs through Cloudflare. Keep an eye out and let us know if you find anything; I shall do the same. :)

RaitaroH commented 5 years ago

I can confirm that 'anime dl' works with 9anime too.

29ayush commented 5 years ago

It looks like 9anime has rolled back the captcha. I think this issue should be closed @vn-ki @RaitaroH

RaitaroH commented 5 years ago

@29ayush It's back for me. EDIT: wtf, it's gone again.

vn-ki commented 5 years ago

This stays open and pinned until captcha goes away for real.

em0tionull commented 5 years ago

I'm able to download anime, but I cannot use the watch command for any provider. Windows 10, Python 3.7. I get this same error.

Zranz commented 5 years ago

Can anybody comment on how to download anime from a supported provider? Kissanime does not work, and neither does 9anime. Any temporary fix?

em0tionull commented 5 years ago

GogoAnime.


Zranz commented 5 years ago

I tried gogoanime, animepahe, kisscartoon, and masterani, and none of them currently work. Are they working for anybody, or does it depend on the anime?

em0tionull commented 5 years ago

Are you using the watch command or the DL command?


Zranz commented 5 years ago

DL command.

IngwiePhoenix commented 5 years ago

Welp, looks like they are back. :/

I looked for some reCAPTCHA solvers, and I found this one here: https://www.npmjs.com/package/captcha-solver

Apparently it has a CLI. One thought would be to use a solving mechanism and extract the cookies from it. I'm still playing around with that CLI, and there might be an equivalent for Python itself (I would be surprised if not...).

IngwiePhoenix commented 5 years ago

Also, I don't know if aria2 supports it, but maybe using a cookie jar would help in this situation, to store the cookies returned from a solved captcha for later use.
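For what it's worth, Python's standard library can already read and write the Netscape-format cookie file that aria2c's `--load-cookies`/`--save-cookies` accept. A minimal sketch of persisting a cookie for later runs (the file path, cookie name, and value here are illustrative assumptions, not what 9anime actually sets):

```python
# Sketch: persist a cookie (e.g. one obtained after solving the captcha in a
# browser) in a Netscape-format jar so later runs, or aria2c via
# --load-cookies, can reuse it. Path and values are illustrative only.
from http.cookiejar import MozillaCookieJar, Cookie
import os
import tempfile

jar_path = os.path.join(tempfile.gettempdir(), "anime_dl_cookies.txt")
jar = MozillaCookieJar(jar_path)

# Pretend this came from a Set-Cookie header after the captcha was solved.
cookie = Cookie(
    version=0, name="__cfduid", value="example-token",
    port=None, port_specified=False,
    domain=".9anime.to", domain_specified=True, domain_initial_dot=True,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={}, rfc2109=False,
)
jar.set_cookie(cookie)
jar.save(ignore_discard=True)  # session cookies need ignore_discard

# A later run (or a different process) can reload the same jar:
jar2 = MozillaCookieJar(jar_path)
jar2.load(ignore_discard=True)
print([c.name for c in jar2])
```

The same `jar_path` could then be handed straight to aria2c as `--load-cookies`.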

Timtam commented 5 years ago

I can confirm that 9anime still doesn't work. The browser, however, doesn't show any captcha to me.

Timtam commented 5 years ago

Now I got it, but solving it and then re-running the download doesn't work. A way to pass in cookies to tell it that the captcha was already solved would work wonders here, I guess.

IngwiePhoenix commented 5 years ago

I have given this whole situation some thought, and I came up with an idea for how this could be tackled:

```
1. The downloader tries to connect to 9anime.
2. IF 9anime shows the captcha page (WAF), recognizable by the inclusion of the reCAPTCHA code,
   THEN open an HTTP server and proxy requests:
        - tell the user the URL to enter into their browser,
        - detect any Set-Cookie: header in the response,
        - and store the cookies in a local cookie jar.
3. Instruct aria2c to use the stored cookies.
4. Proceed as normal.
```
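The captcha-page recognition step could be sketched as a simple heuristic on the response body. The marker strings below are assumptions about what the Cloudflare/reCAPTCHA interstitial contains, not confirmed selectors:

```python
# Sketch: detect the WAF/captcha interstitial by scanning the returned HTML
# for reCAPTCHA markers. Marker strings are assumptions about the page.
CAPTCHA_MARKERS = ("g-recaptcha", "recaptcha/api.js", "cf-captcha-container")

def looks_like_captcha_page(html: str) -> bool:
    """Heuristic: any known reCAPTCHA marker present in the response body."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)

# Hypothetical interstitial vs. a normal search-results page:
waf_page = '<form><div class="g-recaptcha" data-sitekey="..."></div></form>'
normal_page = '<div class="film-list"><div class="item">...</div></div>'
print(looks_like_captcha_page(waf_page), looks_like_captcha_page(normal_page))
```

If the heuristic fires, the downloader would branch into the proxy/cookie-collection flow instead of trying to scrape.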

aria2c has a few options for this:

```
--header=<HEADER>
       Append HEADER to HTTP request header. You can use this option
       repeatedly to specify more than one header:

          $ aria2c --header="X-A: b78" --header="X-B: 9J1" "http://host/file"

--load-cookies=<FILE>
       Load Cookies from FILE using the Firefox3 format (SQLite3),
       Chromium/Google Chrome (SQLite3) and the
       Mozilla/Firefox(1.x/2.x)/Netscape format.

       NOTE:
          If aria2 is built without libsqlite3, then it doesn't support
          Firefox3 and Chromium/Google Chrome cookie format.

--save-cookies=<FILE>
       Save Cookies to FILE in Mozilla/Firefox(1.x/2.x)/Netscape format.
       If FILE already exists, it is overwritten. Session Cookies are also
       saved and their expiry values are treated as 0.
       Possible Values: /path/to/file
```

(Source: `man aria2c`)

So, you could do:

```
aria2c \
  --header="Cookie: ..." \
  --save-cookies=$anime_dl_config/cookies.dat \
  --load-cookies=$anime_dl_config/cookies.dat \
  ...other options...
```

This should tell aria2c to send the given Cookie header, load any previously saved cookies, and save new ones back to the same file. For subsequent runs, aria2c would then automatically pick up an established WAF verification, and no additional user interaction would be needed.

BUT... you could also let the user supply the cookie jar from their Firefox or Chrome installation, or just look for it. For instance, I could pass `--load-cookies="/Users/Ingwie/Library/Application Support/Google/Chrome/Default/Cookies"` (note the quoting around the space) and it would re-use my existing cookies. In fact, that is exactly what I am going to try out. After all, it is possible to add additional aria2c options.

Anyway, I thought I'd share this idea. Unfortunately, I am not experienced enough in Python to put together a PR, so I am sitting back and just... concepting. ^^;
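As a rough illustration of wiring this up from Python, the aria2c invocation could be assembled as an argument list, which sidesteps shell-quoting issues with paths like the Chrome profile above. The URL and jar path are placeholders; `shlex.join` is only used to show what the quoted command line would look like:

```python
# Sketch: build an aria2c command that reuses a saved cookie jar.
# URL and cookie_file values are placeholders, not real endpoints.
import shlex

def build_aria2c_cmd(url: str, cookie_file: str) -> list:
    """Return an argv list; passing a list to subprocess avoids quoting bugs."""
    return [
        "aria2c",
        f"--load-cookies={cookie_file}",
        f"--save-cookies={cookie_file}",
        url,
    ]

cmd = build_aria2c_cmd(
    "https://example.com/episode.mp4",
    "/Users/Ingwie/Library/Application Support/Google/Chrome/Default/Cookies",
)
# shlex.join shows how the space in the profile path gets quoted:
print(shlex.join(cmd))
```

Running it would then be a `subprocess.run(cmd)` call; no manual escaping of the space in "Application Support" is needed.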

Timtam commented 5 years ago

I'd like a quick solution, just like youtube-dl got. I'd like to make the following assumptions and thus recommend what is probably the easiest solution:

raycekar commented 5 years ago

Would we be able to use this to get around the captcha?

https://github.com/eastee/rebreakcaptcha

Timtam commented 5 years ago

Did you properly read the blog article, or did you just check out the repo?

As the blog article states at the very bottom, this piece of code hasn't worked since 3/3/2017.

Nice idea anyway, though :).

vn-ki commented 5 years ago

The solution mentioned here (@Timtam's) is possible. I won't be implementing it, though; Nineanime breaks their stuff far too frequently. If someone can PR the solution (in a generic way), I will consider it.

raycekar commented 5 years ago

@Timtam I didn't read the updates there... my bad :/

gingerbeardman commented 4 years ago

Just to say: still not working.

```
$ anime -ll DEBUG dl yamada
2019-10-18 13:06:17 matt.local anime_downloader.util[18359] INFO anime-downloader 4.0.0
2019-10-18 13:06:17 matt.local anime_downloader.util[18359] DEBUG Platform: Darwin-18.7.0-x86_64-i386-64bit
2019-10-18 13:06:17 matt.local anime_downloader.util[18359] DEBUG Python 3.7.4
2019-10-18 13:06:18 matt.local anime_downloader.sites.helpers.request[18359] DEBUG HTML file temp_dir: /var/folders/sy/jcz9n9dn2t98677rzpmyvkr00000gn/T/animedll34cw94x
2019-10-18 13:06:18 matt.local anime_downloader.sites.helpers.request[18359] DEBUG -----
2019-10-18 13:06:18 matt.local anime_downloader.sites.helpers.request[18359] DEBUG GET https://www4.9anime.to/search?
2019-10-18 13:06:18 matt.local anime_downloader.sites.helpers.request[18359] DEBUG {'params': {'keyword': 'yamada'}}
2019-10-18 13:06:18 matt.local anime_downloader.sites.helpers.request[18359] DEBUG {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Gecko/20100101 Firefox/56.0'}
2019-10-18 13:06:18 matt.local anime_downloader.sites.helpers.request[18359] DEBUG -----
send: b'GET /search?keyword=yamada HTTP/1.1\r\nHost: www4.9anime.to\r\nuser-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) Gecko/20100101 Firefox/56.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Fri, 18 Oct 2019 12:06:19 GMT
header: Content-Type: text/html
header: Transfer-Encoding: chunked
header: Connection: keep-alive
header: Set-Cookie: __cfduid=d826e6f22bb7787f37cb618148b80586f1571400379; expires=Sat, 17-Oct-20 12:06:19 GMT; path=/; domain=.9anime.to; HttpOnly
header: CF-Cache-Status: DYNAMIC
header: Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
header: Server: cloudflare
header: CF-RAY: 527a62b0d98edbdb-LHR
header: Content-Encoding: gzip
2019-10-18 13:06:19 matt.local anime_downloader.session[18359] DEBUG uncached request
2019-10-18 13:06:19 matt.local anime_downloader.sites.helpers.request[18359] DEBUG https://www4.9anime.to/search?keyword=yamada
2019-10-18 13:06:19 matt.local anime_downloader.sites.helpers.request[18359] DEBUG /var/folders/sy/jcz9n9dn2t98677rzpmyvkr00000gn/T/animedll34cw94x/tmp45u7uh0_
2019-10-18 13:06:19 matt.local anime_downloader.sites.nineanime[18359] DEBUG https://www4.9anime.to/search?keyword=yamada
Traceback (most recent call last):
  File "/usr/local/bin/anime", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/anime_downloader/cli.py", line 53, in main
    cli()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/anime_downloader/commands/dl.py", line 85, in command
    anime_url = util.search(anime_url, provider)
  File "/usr/local/lib/python3.7/site-packages/anime_downloader/util.py", line 79, in search
    search_results = cls.search(query)
  File "/usr/local/lib/python3.7/site-packages/anime_downloader/sites/nineanime.py", line 84, in search
    'div', {'class': 'film-list'}).find_all('div', {'class': 'item'})
AttributeError: 'NoneType' object has no attribute 'find_all'
matt@matt:~$ 
```
voltrare commented 4 years ago

me too

RaitaroH commented 4 years ago

At this point I am not even trying nineanime as a provider anymore, so if I can't easily find a show on one provider, I usually just run this bash function:

```
adlwrap() {
  declare -a provider=(animepahe anistream animeflix animefreak gogoanime itsaturday animeflv kissanime kisscartoon twist.moe)
  # "${provider[@]}" expands the whole array; a bare $provider would only
  # yield the first element in bash.
  for k in "${provider[@]}"; do
      printf "\n\033[0;31m%s\n" "PROVIDER: $k"
      anime dl "$1" --episodes "$2" --provider "$k" --play mpv
  done
}
```

Edit: I have changed this a bit.

vn-ki commented 4 years ago

Oh, that is interesting. Maybe we should make that part of the tool.
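A sketch of what folding that fallback loop into the tool itself might look like, in Python. Here `download_with()` is a hypothetical stand-in for the real per-provider download call, and the hard-coded success on gogoanime only exists to make the example self-contained:

```python
# Sketch: try each provider in order until one succeeds, mirroring the
# adlwrap() bash function above. download_with() is a placeholder.
PROVIDERS = ["animepahe", "anistream", "animeflix", "animefreak", "gogoanime",
             "itsaturday", "animeflv", "kissanime", "kisscartoon", "twist.moe"]

def download_with(provider: str, query: str) -> str:
    # Placeholder: a real implementation would invoke the provider's scraper.
    if provider == "gogoanime":
        return f"{query} via {provider}"
    raise RuntimeError(f"{provider} failed")

def download_any(query: str, providers=PROVIDERS) -> str:
    """Return the first successful result; raise if every provider fails."""
    for provider in providers:
        try:
            return download_with(provider, query)
        except RuntimeError as err:
            print(f"PROVIDER {provider}: {err}")
    raise RuntimeError("all providers failed")

print(download_any("fruits basket"))
```

The real version would catch whatever exception type the scrapers actually raise instead of the placeholder `RuntimeError`.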

GnatNamedAsh commented 4 years ago

Didn't want this getting too stale, so I decided to look at some of the code to see whether it's possible to fix what has been happening with 9anime recently (once past the captcha). Most of the other providers I've tried either have bad quality/intrusive watermarks or have also started breaking recently.

The log of the error I ran into and fixed (it also happens to be the error above, though probably with a different traceback):

LOG

```
Traceback (most recent call last):
  File "/home/user/.local/bin//anime", line 11, in <module>
    sys.exit(main())
  File "/home/user/.local/lib/python3.6/site-packages/anime_downloader/cli.py", line 53, in main
    cli()
  File "/home/user/.local/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/user/.local/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/.local/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/.local/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/anime_downloader/commands/dl.py", line 93, in command
    fallback_qualities=fallback_qualities)
  File "/home/user/.local/lib/python3.6/site-packages/anime_downloader/sites/anime.py", line 84, in __init__
    self._episode_urls = self.get_data()
  File "/home/user/.local/lib/python3.6/site-packages/anime_downloader/sites/anime.py", line 165, in get_data
    self._episode_urls = self._scrape_episodes()
  File "/home/user/.local/lib/python3.6/site-packages/anime_downloader/sites/nineanime.py", line 125, in _scrape_episodes
    episodes = episodes.find_all('li')
AttributeError: 'NoneType' object has no attribute 'find_all'
```

I've been able to get the dl option for 9anime working for Dragon Ball Z, and I've also been able to add it to the list of anime to watch. I mostly use the tool on Windows through WSL to download episodes and then broadcast the file directory locally, so I can't "prove" that watching works, since there's no actual player to load under WSL. I'll probably fork this repo, add the changes, and then test the watch command on the Linux distro I have installed on another PC. I'll also test a few other anime to see whether they work properly before making the PR.

UPDATE (25/02/20): The day I went to test the watch functionality for 9anime (22/02/20), they imposed a pretty strict temporary IP ban/rate limit that affects third-party apps as well as normal users. Neither the dl nor the watch function will work; even the search scrape alone triggers the rate limiting.
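As an aside, the recurring `'NoneType' object has no attribute 'find_all'` crash happens because `.find_all()` is called on a container that the preceding `find()` failed to locate, usually when the response is a captcha or ban page rather than real results. A guard can turn that into a readable error. This sketch uses a simplified regex stand-in for the BeautifulSoup lookups in `nineanime.py`; the class names mirror the tracebacks above:

```python
# Sketch: guard against a missing container instead of crashing with
# AttributeError. Regex here stands in for BeautifulSoup's find/find_all.
import re

def extract_items(html: str) -> list:
    container = re.search(r'class="film-list"', html)
    if container is None:
        # The expected markup is absent: likely a captcha/rate-limit page.
        raise RuntimeError("No film-list container found; "
                           "possibly a captcha or rate-limit page.")
    return re.findall(r'class="item"', html)

try:
    extract_items("<h1>Please complete the security check</h1>")
except RuntimeError as err:
    print("caught:", err)
```

The same shape of check dropped into `_scrape_episodes()` would at least tell the user *why* the scrape failed instead of dumping a traceback.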

ghost commented 4 years ago

I have been making my own API recently and ran into the same problem that @GnatNamedAsh ran into. At first it seemed like plain rate limiting, but after a bit more digging I found that the request headers and query string parameters being sent to the server are strictly monitored by Cloudflare or some other banning software. If they do not match what is "supposed" to be going through, the user is temporarily banned after two or three requests. I used the Chrome devtools to check the requests, which are listed below. I have removed some sensitive info from the request headers, like the cookies and user agent, for obvious reasons: I don't want to be fingerprinted or permanently IP banned.

```
General:
Request URL: https://9anime.to/ajax/film/servers/….
Request Method: GET
Status Code: 200
Remote Address: ….
Referrer Policy: no-referrer-when-downgrade

Request Headers:
:authority: 9anime.to
:method: GET
:path: /ajax/film/servers/..
:scheme: https
accept: application/json, text/javascript, */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;
age: 0
cookie: ..
referer: https://9anime.to/watch/...
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: ...
x-requested-with: XMLHttpRequest
```

The above request would not cause a temp ban, because all the required request headers are present.

```
General:
Request URL: https://9anime.to/ajax/film/servers/….
Request Method: GET
Status Code: 200
Remote Address: ….
Referrer Policy: no-referrer-when-downgrade

Request Headers:
:authority: 9anime.to
:method: GET
:path: /ajax/film/servers/..
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;
cookie: ..
referer: https://9anime.to/watch/...
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: ...
```

I would get temp banned if the above request was sent to the server more than two times, or I would have to complete a captcha. 9anime gives the following response when a user gets temp banned:

“Our firewall temporarily blocks your IP in 1 hour because of detecting harmful requests. Please don't use any unofficial app or browser extension to access to our website to avoid this issue. Thanks!”

It seems that what they mean by "detecting harmful requests" is that the request headers do not match what 9anime is looking for. This means that the reCAPTCHA handling @IngwiePhoenix and others mentioned is not needed if the requests are sent properly. 9anime is getting more and more strict with their firewall, which means the requests have to be sent in a certain order and a certain format so that the user is not banned or shown a captcha. I will test further and report back once I have more results. By the way, I'm a Node.js, C, C++, Java, and web developer and do not understand a single line of Python, so I cannot make sense of the traceback given by @GnatNamedAsh.
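To illustrate the point, a request could be built with the complete browser-like header set from the first devtools dump above (the HTTP/2 pseudo-headers like `:authority` are handled by the transport, so only the ordinary headers are set here). Header values mirror the redacted capture, and the URL tail is a placeholder; `urllib` is used as a generic stdlib stand-in for whatever HTTP client the tool uses:

```python
# Sketch: attach the full browser-like header set (note x-requested-with
# and the explicit JSON accept) rather than a minimal one. Values mirror
# the redacted devtools dump; the URL tail is a placeholder.
from urllib.request import Request

AJAX_HEADERS = {
    "Accept": "application/json, text/javascript, */*",
    "Accept-Language": "en-US,en;",
    "Referer": "https://9anime.to/watch/...",
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 ...",
}

def build_request(url: str) -> Request:
    req = Request(url)
    for name, value in AJAX_HEADERS.items():
        req.add_header(name, value)
    return req

req = build_request("https://9anime.to/ajax/film/servers/xxxx")
# urllib stores header names capitalized, e.g. "X-requested-with":
print(req.get_header("X-requested-with"))
```

Whether this alone avoids the ban depends on the cookie and request ordering @ghost describes, so treat it as one necessary ingredient, not the whole fix.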

whatevea commented 4 years ago

> I have been making my own api recently and ran into the same problem that @GnatNamedAsh ran into. […]

Yeah, you are right. It's always safer to provide complete headers and mimic a browser than to send only the required ones.

IguanasInPyjamas commented 4 years ago

9anime is back