FireHead90544 / SenPY

SenPY - A highly efficient CLI-based anime downloader written in Python and integrated with aria2.
GNU General Public License v3.0

Automate URL and fix Aria2 command #14

Closed · Arctic4161 closed this 2 weeks ago

Arctic4161 commented 3 weeks ago

The main URL and AJAX URL have been automated and shouldn't need to be manually set for the foreseeable future. The aria2 command had a duplicate input for max concurrent downloads and was hardcoded to 16; it now uses the value from the config file. Other unused or unoptimized options have been removed.
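A minimal sketch of what that change amounts to, with assumed names (the config key and helper function are illustrative, not SenPY's actual identifiers):

def build_aria2_cmd(config: dict, url: str) -> list[str]:
    # Hypothetical config key; SenPY's actual key name may differ.
    max_dl = config["max_concurrent_downloads"]
    # -j: max concurrent downloads, -x: max connections per server, -s: number of splits
    return ["aria2c", f"-j{max_dl}", f"-x{max_dl}", f"-s{max_dl}", url]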

Arctic4161 commented 3 weeks ago

Please do these changes

  • Add the -x 16 -j 16 -s 16 flags back. (Also, did you see any significant optimizations upon removing these?)
  • Update the version in __init__.py
  • Tag the version so that it triggers the build action (for example, git tag v1.2.1)

Hey @FireHead90544 I am unable to tag a pull request, but I did update __init__.py and publish a version under my branch with the version tag.

Arctic4161 commented 3 weeks ago

I think offloading all the configuration stuff including the ajax domain to config.py would be good, but for now it's okay being here.

I think this is a good idea as well. I didn't want to add imports into config and have to request the main URL twice when it was already being requested for the episodes in client, so since it was already there, I utilized it. Unfortunately, the .info site doesn't contain the information we need to pass to ajax. It would make it cleaner to move it to config, though.

FireHead90544 commented 3 weeks ago

Hey @FireHead90544 I am unable to tag a pull request, but I did update __init__.py and publish a version under my branch with the version tag.

Yeah, my bad, PRs can't be tagged. I'll tag and push the tag to upstream myself.

FireHead90544 commented 3 weeks ago

@Arctic4161 I just pulled your changes to a local branch and tested them out. It's leading to a ConnectionResetError. I am able to access gogotaku.info from a browser, but the library seems unable to connect to it. I tried using proxies and faking headers, but still the same. Maybe it's getting blocked by the ISP, but then how am I able to access it from a browser (I'm not even using a VPN)? I tried accessing it with the library from a remote cloud server and it worked there, so I think it won't work for "everyone". I suggest writing a GitHub Action to scrape gogotaku.info and write the MAIN_URL to a text file in the repository (as a cronjob every, let's say, x interval), and to fetch the main URL from our client we can just GET that file and read it. So, some changes are needed. Anything else you'd suggest? Or shall we go with this approach only?

(Debugger screenshot)
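A minimal sketch of the scraper side of that proposed Action (how the main URL actually appears on gogotaku.info is unknown here, so the regex is only a stand-in for the real extraction):

import re
import requests

# Fetch the rarely-changing .info page and record the first absolute URL found.
html = requests.get("https://gogotaku.info/", timeout=10).text
match = re.search(r"https?://[\w.-]+", html)  # stand-in for the real parsing logic
if match:
    with open("CURRENT_URL.txt", "w") as f:
        f.write(match.group(0))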

Arctic4161 commented 3 weeks ago

@FireHead90544 very weird, typically that's due to sending malformed data. Try running it outside the IDE. It should behave like any other site we've used before; there's nothing special about the .info site other than that it hardly ever changes.

Arctic4161 commented 3 weeks ago

@FireHead90544 if that doesn't work, we can try using requests. I can add a section using requests later on my branch that you can test on your end.

FireHead90544 commented 3 weeks ago

@FireHead90544 very weird, typically that's due to sending malformed data. Try running it outside the IDE. It should behave like any other site we've used before; there's nothing special about the .info site other than that it hardly ever changes.

No, I don't think it's due to sending malformed data; I've tried the same thing in several different environments, and the one above is just a test using a visual debugger. I've tried something as simple as executing requests.get('https://gogotaku.info/'), and (from several articles I've been reading on this) it seems the library is not reopening the connection to the remote server once it fails.
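For what it's worth, the standard way to make requests retry a dropped connection is to mount an HTTPAdapter with a urllib3 Retry policy on a Session; a minimal sketch (the retry counts and backoff here are just example values):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry transient connection failures with exponential backoff.
session = requests.Session()
retries = Retry(total=3, backoff_factor=1, status_forcelist=[502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retries))
r = session.get("https://gogotaku.info/", timeout=10)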

FireHead90544 commented 3 weeks ago

@FireHead90544 if that doesn't work, we can try using requests. I can add a section using requests later on my branch that you can test on your end.

It's alright, I just wrote a GitHub Action to make things easy for us. It'll now run every Sunday at midnight, fetch the current URL, and write it to SenPY/CURRENT_URL.txt (this is a link to the raw file). Could you please fetch this and read the domain from it? It'll work as expected.

I've also made it so we can run this workflow manually, in case the URL changes and the scheduled workflow can't run because it isn't Sunday. We could make it run every day, but since the URL only changes after a long time, it's better not to waste GitHub Actions hours haha.

FireHead90544 commented 3 weeks ago

*Make sure to fetch upstream and pull the changes, as there are a few new commits (nothing code related, just the added action workflow)

Arctic4161 commented 3 weeks ago

@FireHead90544 awesome, I've never used GitHub Actions. I'm still a noob with GitHub haha. I'll pull it down when I get some free time. I think this works as a workaround for now, but I'd like to figure out why it's failing for you but works for me.

FireHead90544 commented 3 weeks ago

@FireHead90544 awesome, I've never used GitHub Actions. I'm still a noob with GitHub haha. I'll pull it down when I get some free time. I think this works as a workaround for now, but I'd like to figure out why it's failing for you but works for me.

Sure buddy, take your time. I think it's either related to the ISP (IP blocking) or an issue with the library itself.

Arctic4161 commented 3 weeks ago

Sure buddy, take your time. I think it's either related to the ISP (IP blocking) or an issue with the library itself.

Can you try running it through a VPN to see if it's the ISP? If it is, there's nothing we can really do about that, so I think your solution will have to be a permanent one.

Arctic4161 commented 3 weeks ago

Just spent a little time reading about it, and it could be your ISP closing the connection due to too many requests too frequently. When you get time, and if you feel up to it, would you add a sleep of about 2 seconds before the request, then a timeout parameter on the request? I'd say 10 seconds should work.
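In requests terms, that suggestion is just the following (the URL is only an example):

import time
import requests

time.sleep(2)  # brief pause before the request, in case the ISP is rate limiting
r = requests.get("https://gogotaku.info/", timeout=10)  # give up after 10 seconds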

FireHead90544 commented 3 weeks ago

Just spent a little time reading about it, and it could be your ISP closing the connection due to too many requests too frequently. When you get time, and if you feel up to it, would you add a sleep of about 2 seconds before the request, then a timeout parameter on the request? I'd say 10 seconds should work.

I tried it, didn't work, still the same. Seems like the remote is forcibly closing the connection.

FireHead90544 commented 3 weeks ago

Can you try running it through a VPN to see if it's the ISP? If it is, there's nothing we can really do about that, so I think your solution will have to be a permanent one.

I tried it as well; seems like the ISP was the culprit, but again, idk why it works in the browser tho.

Arctic4161 commented 3 weeks ago

I tried it as well; seems like the ISP was the culprit, but again, idk why it works in the browser tho.

Last thing before I stop bugging you. Can you try using request headers?

import requests

url = 'your-url-here'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'}
r = requests.get(url, headers=headers)

FireHead90544 commented 3 weeks ago

Last thing before I stop bugging you. Can you try using request headers?

It's alright buddy xD, I have already tried it as mentioned earlier. Nothing works :(

Arctic4161 commented 2 weeks ago

I've been thinking about the best way to go about this while still keeping it automated. If we use what I have written, but catch that exception and then fall back to a hardcoded URL with redirects allowed, I think we can still keep it semi-automated.

What do you think?
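A rough sketch of that fallback idea (the hardcoded fallback URL is hypothetical, since picking it was still an open question in this thread, and the regex stands in for the real scraping logic):

import re
import requests

FALLBACK_URL = "https://anitaku.so/"  # hypothetical hardcoded fallback

def get_main_url() -> str:
    try:
        # Primary path: scrape gogotaku.info for the current main URL.
        html = requests.get("https://gogotaku.info/", timeout=10).text
        match = re.search(r"https?://[\w.-]+", html)  # stand-in for the real extraction
        if match:
            return match.group(0)
        raise ValueError("main URL not found on page")
    except (requests.RequestException, ValueError):
        # Fallback: hit the hardcoded URL and let redirects land on the current domain.
        resp = requests.get(FALLBACK_URL, allow_redirects=True, timeout=10)
        return resp.url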

FireHead90544 commented 2 weeks ago

@Arctic4161 I think that works, but what would the hardcoded URL be? Coding a fix for that problem would satisfy our curiosity, but I think going the "if it works, don't touch it" route will be best xD. So I personally think we should let the GitHub Action do the scraping for the current URL, and on the client side we can just read CURRENT_URL.txt, since it'll always be updated.

Arctic4161 commented 2 weeks ago

@Arctic4161 I think that works, but what would the hardcoded URL be? Coding a fix for that problem would satisfy our curiosity, but I think going the "if it works, don't touch it" route will be best xD. So I personally think we should let the GitHub Action do the scraping for the current URL, and on the client side we can just read CURRENT_URL.txt, since it'll always be updated.

If the txt is updated on GitHub, what's the best way to scrape it? That's a new one for me tbh

FireHead90544 commented 2 weeks ago

If the txt is updated on GitHub, what's the best way to scrape it? That's a new one for me tbh

Since this CURRENT_URL.txt would always remain updated, it's as easy as doing

import requests

CURRENT_URL = requests.get("https://raw.githubusercontent.com/FireHead90544/SenPY/main/CURRENT_URL.txt").text.strip()  # strip any trailing newline

Now the CURRENT_URL variable would be pointing to https://anitaku.so/

Arctic4161 commented 2 weeks ago

Awesome, that sounds good to me. Can you merge this PR when you get a chance and update it to use your .txt? I'll pull it down when you do and start working on the batch download for my branch to start forming an idea.

FireHead90544 commented 2 weeks ago

Awesome, that sounds good to me. Can you merge this PR when you get a chance and update it to use your .txt? I'll pull it down when you do and start working on the batch download for my branch to start forming an idea.

Suree, I'll do that as soon as I get a bit free.

FireHead90544 commented 2 weeks ago

Tried my fix, it works. Merging this and pushing my new changes; fetch them into your branch.

FireHead90544 commented 2 weeks ago

@Arctic4161 merged and created the release. Pull these changes before you start anything; added you as a contributor too :)