anime-dl / anime-downloader

A simple but powerful anime downloader and streamer.
The Unlicense

Reintroduce support for 9anime sites #679

Closed ekrekeler closed 2 years ago

ekrekeler commented 3 years ago

Closes #599

For now, same as before, only tested using Streamtape server.

Please suggest changes as needed.

ArjixWasTaken commented 3 years ago

wow, I never thought that someone would actually use selescrape

ArjixWasTaken commented 3 years ago

I tested, and it does indeed return a streamtape link; it's just that streamtape doesn't work as an extractor.

ekrekeler commented 3 years ago

So the streamtape extractor doesn't work sometimes because the server doesn't like the ancient user agent presented by the get helper function. I overrode the user agent in streamtape.py with a more recent one, and haven't seen the 503 error since.

Do you think it makes sense to define a user agent in streamtape.py, or should this be a global setting? Why is the user agent so old anyways? Using python 3.9.5 on WSL2, this is the user agent shown in debug: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36

Edit: I debugged this further and there is something wonky going on when the helper makes the request. For the default_headers variable, I can see two defined headers for user-agent.

pp.pprint(default_headers)
{   'User-Agent': "{'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X "
                  '10_7_3) AppleWebKit/535.11 (KHTML, like Gecko) '
                  "Chrome/17.0.963.66 Safari/535.11'}",
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like '
                  'Gecko) Chrome/19.0.1061.1 Safari/536.3'}

Only one header should be defined for user-agent, but here we have two. And this is using the default configuration.
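
A minimal sketch of how this kind of duplication can arise, and one way to guard against it. HTTP header names are case-insensitive, but a plain Python dict treats 'User-Agent' and 'user-agent' as two distinct keys, so merging two header dicts can leave both in place. The function name merge_headers and the sample values below are hypothetical and are not taken from the project's helper code.

```python
def merge_headers(*header_dicts):
    """Merge header dicts, treating names case-insensitively (later dicts win)."""
    merged = {}  # lowercase name -> (name as last written, value)
    for headers in header_dicts:
        for name, value in headers.items():
            merged[name.lower()] = (name, value)
    return {name: value for name, value in merged.values()}


# Hypothetical values, for illustration only.
defaults = {"User-Agent": "old-agent/1.0", "Accept": "*/*"}
override = {"user-agent": "new-agent/2.0"}

naive = {**defaults, **override}           # plain dict merge keeps both spellings
fixed = merge_headers(defaults, override)  # only one user-agent entry survives

print(sorted(naive))  # ['Accept', 'User-Agent', 'user-agent']
print(fixed)          # {'user-agent': 'new-agent/2.0', 'Accept': '*/*'}
```

Separately, the 'User-Agent' value in the output above is itself the string form of a dict, which suggests that somewhere a whole headers dict was passed where a single string was expected.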

ArjixWasTaken commented 3 years ago

woah

Edit: Yeah, it was because of selescrape (somehow I made this simple mistake): https://github.com/anime-dl/anime-downloader/blob/5d63f2b513bf39229a527ca6749e5854aced610d/anime_downloader/sites/helpers/selescrape.py#L109

it calls https://github.com/anime-dl/anime-downloader/blob/5d63f2b513bf39229a527ca6749e5854aced610d/anime_downloader/const.py#L15-L27

In my defense, this is an issue I had already fixed in an older PR, but it was never merged: https://github.com/anime-dl/anime-downloader/pull/503

ArjixWasTaken commented 3 years ago

wait no, im dumb, you werent talking about that

ekrekeler commented 3 years ago

> Yeah, it was because of selescrape

I'm not sure it is, because there is no sel=True in extractors/streamtape.py. So it shouldn't be using selescrape for this request.

> wait no, im dumb, you werent talking about that

Yeah, I just want to know if I should be defining a different header in streamtape.py. I've had instances where I still get HTTP 503 after removing the duplicate user-agent from default_headers, so this is a separate issue. There are certain user agents in that random list that the streamtape server doesn't like. So I can either:

  1. Find the user agents that don't work and either update or remove them from the list
  2. Select a user agent that works and use that for every request in streamtape.py

The reason I'm asking is that I don't know whether that list of user agents has a reason for being so out of date compared to current user agents, or whether changing that list will break (or fix) other sites.
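
The two options above can be sketched roughly as follows. This is only an illustration under stated assumptions: USER_AGENTS is a placeholder standing in for the project's own list, the user-agent strings are made up, and pick_user_agent is a hypothetical helper, not an existing function in anime-downloader.

```python
import random

# Placeholder pool; these strings are invented for the example.
USER_AGENTS = [
    "ExampleBrowser/1.0 (old)",
    "ExampleBrowser/90.0",
    "ExampleBrowser/91.0",
]


def pick_user_agent(pool, rejected=(), pinned=None, rng=random):
    """Return `pinned` if given, else a random entry of `pool` not in `rejected`."""
    if pinned is not None:
        return pinned  # option 2: the same user agent for every request
    candidates = [ua for ua in pool if ua not in rejected]  # option 1: prune the list
    if not candidates:
        raise ValueError("no usable user agents left in the pool")
    return rng.choice(candidates)


# Option 1: keep the random choice, but skip entries the server is known to reject.
headers = {"user-agent": pick_user_agent(USER_AGENTS, rejected={"ExampleBrowser/1.0 (old)"})}

# Option 2: pin one known-good user agent for this extractor only.
pinned_headers = {"user-agent": pick_user_agent(USER_AGENTS, pinned="ExampleBrowser/91.0")}
```

Option 1 changes behaviour for every site that draws from the shared list, while option 2 keeps the change local to one extractor, which is the trade-off the question is about.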

ArjixWasTaken commented 3 years ago

> The reason I'm asking is that I don't know whether that list of user agents has a reason for being so out of date compared to current user agents, or whether changing that list will break (or fix) other sites.

You can go ahead and update that list if you want to. As far as I know that list has not been updated for a long time.

ekrekeler commented 3 years ago

Okay I've updated all the user agents I could find in the code. Tested using anime test and I saw no differences before and after modifying the user agents.

I also tweaked the decodeString method in nineanime.py to remove some extra characters from the URL that shouldn't be there. I have no idea how the encoding for the 9anime API works so I'm just using character matching after the string is decoded instead.

Couple of things to note for next steps:

ekrekeler commented 3 years ago

There is someone else working on an alternate pull request for 9anime, #682.

It seems that selenium may not be needed to get 9anime working if Cloudflare is bypassed. I did not realize this. I will keep this open for the time being, since it seems there are still some things that need to be worked out for that pull request. But avoiding the requirement of selenium is preferable.

ArjixWasTaken commented 2 years ago

So uhh... According to @justfoolingaround 9anime is changing its protection on a daily basis. So if this PR is ready I'll merge.

ArjixWasTaken commented 2 years ago

Oh, it doesn't work. They have a "verified" url parameter now...