I-A-C / script.module.lambdascrapers

Scrapers Module for Exodus based add ons
Update scraper #29

Closed host505 closed 5 years ago

host505 commented 5 years ago

.ru uses Cloudflare now, but .to doesn't (yet).

Don't merge yet. I'd like to get some feedback first, because I'm not sure the .to mirror is accessible globally. If it's heavily geoblocked, we'll have to use cfscrape on .ru instead.

jewbmx commented 5 years ago

No issues for me

jewbmx commented 5 years ago

Not sure if it matters, but I'd remove the .com too while you're at it. It redirects to the .ru.

Just tested the new domain in my addon's scraper file and it works fine, but it brings up 120 results for the new Supergirl episode lol. I removed both .com and .ru, and noticed you even changed the commented URL too in your update lol. I don't use VPN or debrid stuff, so I didn't test whether any of the results play or fail tho.

host505 commented 5 years ago

Yeah that provider always brought too many results. Duplicates or not, no idea.

I don't use VPN or debrid stuff, so I didn't test whether any of the results play or fail tho

If the scraper fetched results, it means it works. Whether they play or not depends on whether the links are still valid or have been removed by the hoster, and on whether the respective resolver works in resolveurl (afaik).

SerpentDrago commented 5 years ago

rlsbb.to pulls up here (east coast usa)

Haven't actually tested the file yet though

SerpentDrago commented 5 years ago

Confirmed, works fine (Kodi 18, latest released Redux + this scraper fix, providers cleared). Pulled up perfectly good links!

SerpentDrago commented 5 years ago

Is there no way to write a scraper so that it detects a Cloudflare challenge and then, only when needed, uses the available tools / modules to get around it?

host505 commented 5 years ago

Yes, of course, many scrapers utilize cfscrape.py. But I think scraping is gonna be slower, so as long as the non-cloudflared mirror works, it's the way to go imo.
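
A minimal Python sketch of the detect-then-fall-back idea asked about above. The function names, the `plain_fetch` / `cf_fetch` hooks and the challenge markers are illustrative assumptions, not code from this PR; only `cfscrape.create_scraper()` is named in the thread.

```python
def looks_like_cloudflare_challenge(status_code, headers, body):
    """Heuristic check for a Cloudflare challenge page.

    The status codes and body markers are assumptions, not an official list.
    `headers` is a plain dict here, so the "Server" key is case-sensitive.
    `body` is assumed to be text, not bytes.
    """
    server = str(headers.get("Server", "")).lower()
    if not server.startswith("cloudflare"):
        return False
    if status_code not in (403, 429, 503):
        return False
    markers = ("jschl_vc", "jschl_answer", "cf-browser-verification")
    return any(marker in body for marker in markers)


def fetch_html(url, plain_fetch, cf_fetch):
    """Use the fast plain request first; pay for the slower path only when challenged.

    plain_fetch(url) -> (status_code, headers, body)   # hypothetical hook
    cf_fetch(url)    -> body                           # hypothetical hook, e.g.
        lambda u: cfscrape.create_scraper().get(u).content
    """
    status_code, headers, body = plain_fetch(url)
    if looks_like_cloudflare_challenge(status_code, headers, body):
        return cf_fetch(url)
    return body
```

With this shape the slower cfscrape path only runs for mirrors that actually serve a challenge, which is the trade-off discussed above.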

jewbmx commented 5 years ago

Agreed. I'm pretty sure cfscrape takes longer. I swap blocked domains out for clean ones and save the blocked ones as a last resort. If ya think about it, the blocks are mainly there because of the impact we have on their domain, so removing them from our scrapers is respectful. cfscrape isn't really needed anyway, because almost every website can be replaced by another.
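
The "clean domains first, blocked ones as a last resort" approach could be sketched like this in Python. `pick_mirror`, the `fetch` hook and the example.* domains are hypothetical placeholders, not names from the scraper.

```python
def pick_mirror(mirrors, fetch):
    """Return (base_url, body) for the first mirror that yields a page.

    mirrors: list of (base_url, needs_cfscrape) tuples.
    fetch(base_url, needs_cfscrape) -> body; a hypothetical hook that does
    the actual request, using cfscrape only when the flag is True.
    """
    # False sorts before True, so non-Cloudflare mirrors are tried first.
    # sorted() is stable, so the original order is kept within each group.
    ordered = sorted(mirrors, key=lambda mirror: mirror[1])
    for base_url, needs_cfscrape in ordered:
        try:
            body = fetch(base_url, needs_cfscrape)
        except Exception:
            continue  # mirror is down or blocked; try the next one
        if body:
            return base_url, body
    return None, None
```

For example, `pick_mirror([("http://example.ru", True), ("http://example.to", False)], fetch)` tries example.to first and touches example.ru only if that fails.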

jjgrech commented 5 years ago

Changed .ru to .to and removed the .com as per jewbmx's suggestion. Pulling perfectly from here in S. Europe. I do expect that to change though, and since this is too good a debrid source to lose, perhaps cfscrape might be needed in the long run. Interesting dilemma.

jewbmx commented 5 years ago

Yea, I looked earlier for some more domains and that's the last one for that site lol, so it's gonna need to be fixed with cfscrape pretty soon. Personally I will probably just throw it in a folder because I don't use it anyway lol

jewbmx commented 5 years ago

Here ya go can someone test this for me lol http://jewrepo.cf/JewRepo/4git/myrlsbb.py

Went ahead and added cfscrape to the old version of this scraper. It gives me results, but I didn't test much. If it works, I suggest sticking this in your pocket until the new version stops working.

jjgrech commented 5 years ago

LOL!! It works mate. A lot.

RLSBB returned 24 1080P links and 54 720P links for a TWD episode.

Yes. I call that a success ;) Thank you sir!

host505 commented 5 years ago

Yes, works. But as I suspected I think it's slightly slower. Results are exactly the same as .to.

jjgrech commented 5 years ago

I don't mind the extra 5 seconds ;) To be honest, didn't even notice it. But I can see how they can build up if this happens to every source scraper.

Still manageable if kept for debrid sources I would contend.

host505 commented 5 years ago

I don't mind the extra 5 seconds

When you scrape 50+ sites, that time is very valuable. Edit: although I don't think the extra time was 5 seconds; it was less.

jewbmx commented 5 years ago

Nice. If ya wanna learn to use cfscrape, this is a good scraper for it. All I did was add the import at the top, add

scraper = cfscrape.create_scraper()

below sources = [], and change every

r = client.request(url)

to

r = scraper.get(url).content
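
Written out as a small Python sketch, the change described above might look like this. `fetch_page` and its optional `scraper` parameter are hypothetical, added only so the snippet can be exercised without network access; the actual scraper makes these calls inline in its sources method.

```python
try:
    import cfscrape  # third-party module named in the thread
except ImportError:  # lets the sketch load where cfscrape is not installed
    cfscrape = None


def fetch_page(url, scraper=None):
    """Return the raw page body for url.

    Before the change the scraper did:  r = client.request(url)
    After the change it does:           r = scraper.get(url).content
    """
    if scraper is None:
        if cfscrape is None:
            raise RuntimeError("cfscrape is not available")
        # In the thread this line sits right below `sources = []`.
        scraper = cfscrape.create_scraper()
    return scraper.get(url).content
```

Creating the scraper once and reusing it for every request in the same search lets it keep its session cookies between calls.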

jewbmx commented 5 years ago

The extra time doesn't affect much, depending on how you're set up. Most people run above 30 so it will still do its job. You will just see it at the end of the search, like most dead scrapers, when the names show and they slowly complete lol

jjgrech commented 5 years ago

When you scrape 50+ sites, that time is very valuable.

I agree of course. But it's great to have a solution, if one wants it. Much appreciated guys.

Nice. If ya wanna learn to use cfscrape this is a good scraper for it.

Lovely!

jewbmx commented 5 years ago

No problem, and thanks for the testing. Also thanks host505 for pointing out that this scraper needed fixing lol. I tested it the other day and put it in my works folder, so it would have been a month before I noticed lmao.

Imma remove my version now and use the .to till it's time to add cfscrape to it. When .to goes down it'd be wise to add the .com and .ru back too.

JFG90 commented 5 years ago

Thanks for this guys! Yes I believe another source that struggled to give results was 2DLL, usually every time it gives results, but this morning was struggling to get them

jewbmx commented 5 years ago

That should be 2ddl.ws and working proper.

JFG90 commented 5 years ago

Do you mean it should be working properly or it is?

jewbmx commented 5 years ago

Should. Site-wise I don't see any reason for issues. App-wise, I don't use RD lol

JFG90 commented 5 years ago

It's not behaving like it usually would. It's struggling to get 2DLL sources currently: I had to do 4 or 5 scrapes last night for them to eventually appear, and I cleared cache, providers etc. Maybe it's just a temporary thing and will sort itself out.

jewbmx commented 5 years ago

How many scrapers you got enabled? Might be a conflict with your pile

JFG90 commented 5 years ago

I have all debrid sources enabled, never had issues up til last night.

jewbmx commented 5 years ago

It's the scraper for some reason. I don't see any errors, but it's hella laggy

JFG90 commented 5 years ago

Yer that's what I mean, the scraper eventually gets the links but it's a struggle getting them, hopefully it can be looked at :)

jewbmx commented 5 years ago

Someone could go step by step down the scraper doing a log scrape, but I get the feeling that it's just the website. Might need to raise your timeout for a day or two and see if that helps till it's resolved.
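
One way to do the step-by-step logging mentioned above is to wrap each stage of a scraper in a small timing helper. This is a Python sketch; `timed_step` is a hypothetical name, and `print` stands in for whatever logger the add-on actually uses.

```python
import time


def timed_step(label, func, *args, **kwargs):
    """Run func, report how long it took, and return (result, elapsed_seconds)."""
    start = time.time()
    result = func(*args, **kwargs)
    elapsed = time.time() - start
    # Swap print for the add-on's own logging call when running inside Kodi.
    print("%s took %.2fs" % (label, elapsed))
    return result, elapsed
```

Wrapping the search request, the page request and the link parsing separately would show whether the lag comes from the site's responses or from the scraper's own processing.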

JFG90 commented 5 years ago

Timeout is already on 60 seconds mate, thanks for the help anyway

host505 commented 5 years ago

What does this conversation about another scraper have to do with this PR? Your 'issue' (as you've been told so many times but refuse to accept) is that the site has been slow lately. And sure, Kodi add-ons scraping it doesn't help, so honestly if we wanted to help them we should kill off the scraper lol. Please keep this in the issues section (although it's not actually an issue).

JFG90 commented 5 years ago

I never said I wasn't accepting that; I was answering someone else's questions. I apologise, but there's no need for the negativity towards me.

jewbmx commented 5 years ago

Lmao someone woke up on the wrong side of the bed. Ya know this PR is labeled 'Update scraper', so I can see how someone might assume this is a decent spot to ask about updating a scraper lol

host505 commented 5 years ago

The scraper name was intentionally left out of the PR title. They obviously don't like what we're doing, so I didn't want to draw much attention. One can still see which scraper it's related to by just looking at the commit. Anyway, they already opened 2 or 3 issues about their 'issue' (albeit in the wrong place), so there's no need for it to be mentioned all over the place.

JFG90 commented 5 years ago

Like I've said, I apologise. I didn't want anything to kick off, I just want to get along with everyone here. I won't say any more

jewbmx commented 5 years ago

If you read down the little chat, this person started with a comment about the PR and then added the 2dll issue. Then ya got your random stoned comments by me while I try to take a look at the issue lol. I'll fix the newest issue tho by simply sticking to my own page and letting y'all do your thing ;)

host505 commented 5 years ago

I didn't say anything against you for trying to help, so why are you making this such a big deal? I just pointed out the obvious: random PRs are not places to talk about random issues, and it doesn't help anyone. Besides, as already mentioned, he had already opened 2 or 3 issues about it.

jewbmx commented 5 years ago

Lol I'm just giving ya shit for being a meanie :) but I also feel like any chat option here is fair game for help, since the idea of the GitHub project is to help people with scraper issues <3

JFG90 commented 5 years ago

Jewbmx, thanks for looking into it for me earlier. Very much appreciated

SerpentDrago commented 5 years ago

hands everyone a beer