a4k-openproject / script.module.openscrapers

OpenScrapers Project
GNU General Public License v3.0

The primewire scraper could benefit from an update #135

Closed: SNAPflix closed this issue 4 years ago

SNAPflix commented 4 years ago

Hi! I think it would be wonderful if you could take a look at the primewire scraper and update it.

Right now it scrapes https://www.primewire.ac/, which is DOWN (522 error), and https://primewire.ink/, which seems to be a clone... if I'm not mistaken?

Instead, please update it to scrape https://www.primewire.li/ or https://www.primewire.ag/, which seem to be updated EVERY day. Thanks in advance!

123Venom commented 4 years ago

First, the URL we scrape is primewire.ink, not .ac. Just because .ac is in that domain list does not mean it's the URL used to scrape; self.base_link is what you want to look at. I just scraped Joker and Fantasy Island and both returned links from Primewire.
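To illustrate the distinction between the domain list and `self.base_link`, here is a minimal, hypothetical sketch of an OpenScrapers-style source class. The attribute names `domains` and `base_link` follow the comment above; the exact domain list contents, the `search_link` path and the `build_search_url` helper are assumptions made for illustration, not the real scraper's code.

```python
from urllib.parse import quote_plus


class Source:
    """Hypothetical, trimmed-down OpenScrapers-style source (not the real file)."""

    def __init__(self):
        # Assumed contents: domains associated with the site. A listed domain
        # (e.g. the dead .ac mirror) is not necessarily the one that gets scraped.
        self.domains = ['primewire.ac', 'primewire.ink']
        # The URL that requests are actually built from.
        self.base_link = 'https://primewire.ink'
        # Hypothetical search path; the real one may differ from mirror to mirror.
        self.search_link = '/?s=%s'

    def build_search_url(self, query):
        # Only base_link is used here; self.domains is never consulted.
        return self.base_link + self.search_link % quote_plus(query)


print(Source().build_search_url('Fantasy Island'))
# -> https://primewire.ink/?s=Fantasy+Island
```

In this sketch, changing which site gets scraped means changing `base_link` (and, as noted later in the thread, possibly the search path and parsing logic), not just adding a domain to the list.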


Works for me!

123Venom commented 4 years ago

Make sure the title you are searching for does in fact exist on their site. If it does and still nothing is returned, let me know the title and I'll test it. I had some issues with the new movie Scoob! tonight due to that exclamation point being in some titles while not in the links, so it failed our title check.
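The title-check failure described here (a "!" present in one title but not in the other) is commonly avoided by normalizing both strings before comparing them. A minimal sketch, assuming a hypothetical `clean_title` helper that keeps only lowercase letters and digits; the module's own title-cleaning code may work differently.

```python
import re


def clean_title(title):
    """Hypothetical helper: lower-case and keep only ASCII letters and digits."""
    return re.sub(r'[^a-z0-9]', '', title.lower())


def titles_match(wanted, found):
    """Compare two titles after normalization, so 'Scoob!' matches 'Scoob'."""
    return clean_title(wanted) == clean_title(found)


print(titles_match('Scoob!', 'Scoob'))       # True
print(titles_match('Scoob!', 'Scooby-Doo'))  # False
```

Dropping spaces and punctuation makes the comparison forgiving, at the cost of occasionally matching two titles that differ only in punctuation.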

SNAPflix commented 4 years ago

1.) "Scoob" is available on primewire.li. Here's a screenshot of that. (Screenshot: search for scoob, 2020-05-20 00-31-26.) Please leave the ! mark out of your search; it's not needed to find it. For example, if you search for the word "fantasy", it will list everything with that word in it, Fantasy Island included.

2.) Comments aren't showing on primewire.ink (which also goes down and shows an offline page every now and then), while commenting is working and being used on primewire.li.

(Screenshot: searching for scoob on primewire.ink returns a 503 error, 2020-05-20 00-25-46.)

I haven't developed scrapers and don't know exactly how they work. I just found those links by looking at the files of the OpenScrapers module for Kodi, and then compared the results for TV shows in a web browser.

If you use a web browser and click on TV Shows at the top of the page, primewire.li has a lot more posts than primewire.ink.

On primewire.ink, "Siren" is listed as one of the latest released episodes. (Screenshot: page 1, latest episodes, for TV shows on primewire.ink, 2020-05-20 00-13-23.)

But it won't even list the episodes for the show. And when clicking on a show, I get redirected to primewire.site instead of primewire.ink. (Screenshot: primewire.site, which primewire.ink redirects me to, shows NO links for "Siren", 2020-05-19 23-57-03.)

On primewire.li, by contrast, you have to go several pages back to find the same TV show, because it's updated daily with LOTS of other shows. (Screenshot: primewire.li with links to ALL "Siren" episodes, 2020-05-19 23-57-47.)

I still think that primewire.ac is a clone of the original primewire. I used to be a member of the "old" primewire website until it was "hacked". After that I had to find another source to use. But I recently discovered primewire.li which works like the "old" primewire website did before being "hacked".

So I still believe that changing OpenScrapers to point at primewire.li would make a difference; otherwise I wouldn't have posted a request here about it.

Also, another request: I'd like to suggest that somebody add http://nyafilmer.lol to the OpenScrapers module.

Thanks for looking into my request(s).

123Venom commented 4 years ago

I didn't say Scoob was not available on primewire. I said, "I had some issues with the new movie Scoob! tonight due to that exclamation point being in some titles while not in the links, so it failed our title check." I meant in the torrent scrapers that I maintain, which I have now fixed. I was trying to say that if you were searching for that title, it could be the same issue I experienced elsewhere.

Thanks for that in-depth info. All I did was test the current scraper as written and check the claim that it's using .ac. Beyond that I'll be frank: free hosters are not something I invest a lot of my time in, because I do not use them. I contribute to and maintain the torrent and premium-hoster scrapers for OpenScrapers, and that in itself takes A LOT of my time, and then I also try to maintain the Venom addon. There are 70+ scrapers in all and I simply cannot take them all on, so imo the low man on the totem pole lost. I'd suggest a premium account and torrents, unless another contributing dev wants to chime in and take this one.

Now that I've got my honesty out of the way (sorry if it seemed harsh, I'm just not into sugar-coating my opinions): I tested dialing in the .li URL and no links were returned. I then discovered that the .li site uses a different search phrase, so I dialed that in... still nothing. I then peeked to see why, and the current, OLD code is looking for tags that do not exist or were changed. As near as my 5 minutes of looking could tell, that scraper would need a complete rewrite for .li support. I'll consider it in the future, but I just got done spending nearly 20 hours on the last update, so I'm not really eager for a rewrite.

SNAPflix commented 4 years ago

All I can say then is THANKS for taking your time to look into this. I'll wait and hope for what the future has to offer regarding this matter... :wink:

123Venom commented 4 years ago

Wish I had better news for you. I did look into this site a bit more. I use Firefox's Inspector to review site code for scraping. Looking at just one link, the Inspector shows me this:

<a href="/links/go/E-wMJ" class="propper-link popper" link_version="1" rel="nofollow" target="_blank" onclick="trackOutboundLink('vev.io');" key="E-wMJ">Direct</a>

But if you right-click and use View Page Source, we see this:

(<a href="javascript:void(0)" class="propper-link popper" link_version="0" rel="nofollow" target="_blank" onclick="trackOutboundLink('vev.io');">Direct</a>)

Notice that the links (href) are hidden behind javascript:void(0). Sadly, that page source is what a request returns to us, so in this case there's an external JavaScript that modifies the document and makes the links visible. I'm just not familiar enough with using what I think would be the right tool, js2py, which I think could be used to get those links. It's a whole different ball game from scraping torrent sites, which have no need to hide links; the point of a torrent site is to make links available to you, so scraping them is simple. You really should at a minimum test using a premium account, as I believe fewer and fewer developers want to spend the kind of time that scraping a site like primewire.li would require. I could be wrong, but I asked a few other developers about this particular site and the response was not favorable.
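The difference between the two snippets above is what breaks a plain HTTP scraper: the raw page source only contains `javascript:void(0)`, and the real `/links/go/...` path appears only after the site's JavaScript runs in a browser. A minimal sketch using Python's standard `html.parser` (the `LinkCollector` class and `usable_links` helper are hypothetical, not the scraper's real code) shows that parsing the raw source yields no usable links, while the browser-rendered markup does.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag whose class list contains 'propper-link'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)
        if 'propper-link' not in (attrs.get('class') or '').split():
            return
        self.links.append(attrs.get('href') or '')


def usable_links(html):
    """Return hrefs that point somewhere, skipping javascript: placeholders."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [h for h in parser.links if h and not h.lower().startswith('javascript:')]


# What a plain request returns (page source) vs. what the Inspector shows.
RAW = """(<a href="javascript:void(0)" class="propper-link popper" link_version="0" rel="nofollow" target="_blank" onclick="trackOutboundLink('vev.io');">Direct</a>)"""
RENDERED = """<a href="/links/go/E-wMJ" class="propper-link popper" link_version="1" rel="nofollow" target="_blank" onclick="trackOutboundLink('vev.io');" key="E-wMJ">Direct</a>"""

print(usable_links(RAW))       # []
print(usable_links(RENDERED))  # ['/links/go/E-wMJ']
```

Recovering the real link from a plain request would therefore require running the site's script (for example with js2py, as suggested above) or reproducing its logic; this sketch only demonstrates why a plain request comes back empty.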