I-A-C / script.module.lambdascrapers

Scrapers Module for Exodus-based add-ons

2 updated scrapers #49

Closed sergio49650 closed 5 years ago

sergio49650 commented 5 years ago

Hi, I updated RLSBB and 2DDL with request options provided by their websites.

RLSBB: I tried cfscraper but it can be slow, and the rlsbb search engine results are useless because of a kind of empty dynamic datatable, so we can't grab any URL. So I use the .to site directly, with more search options to get more results.

2DDL: same thing. I'm sure we can do better but it works fine; just give it a try on different TV shows.

I think I will try others like directdl, but I can't get a user login. Anyway :) thanks

jewbmx commented 5 years ago

I just tested your rlsbb and respectfully don't like it lol. It brings way too many results. Normal Hunter Killer movie results are around 12 and yours draws 35. Then for TV shows I tried The Flash and got 135.

jewbmx commented 5 years ago

Think your 2ddl might be a good change; it brings up fewer results compared to what the normal 2ddl shows. For movies I like it, but its TV show section isn't working properly: the normal 2ddl brings results for a lot of shows your 2ddl fails on.

Might wanna make it movies only and change the normal 2ddl to TV shows only; that way you don't miss out on anything and don't have a dupe issue.
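For what it's worth, a rough sketch of that "movies only" idea (hypothetical, not code from this PR): in these scraper classes, sources() bails out with an empty list when the url is None, so a scraper can opt out of TV shows simply by never building a TV url.

    # hypothetical sketch: make a scraper movies-only so it never duplicates the stock 2ddl TV results
    def tvshow(self, imdb, tvdb, tvshowtitle, localtvshowtitle, aliases, year):
        return None   # sources() returns an empty list when url is None

    def episode(self, url, imdb, tvdb, title, premiered, season, episode):
        return None   # likewise: no TV queries are ever built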

sergio49650 commented 5 years ago

Hi, maybe it's because rlsbb has a looooot of links (when you check the web site you may be scared, but most of them are SD). This is not from my changes, it was already like that; all I changed is the base query used to get the links. Different ways to reduce this:
- the Pre-emptive parameter in settings (you can limit to 40 for example)
- limiting the number of results directly in the addon (one limit per definition: 30 SD, 20 720p...)
- adding an option in settings to choose the lowest definition: as we already have an upper definition choice (4K, 1080p...), add one for a lower bound (SD, 720p...)?
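A minimal sketch of the second option (a hypothetical helper, not part of this PR): cap how many sources of each definition a scraper hands back from sources().

    # hypothetical sketch: keep at most N sources per quality tier before returning them
    def cap_per_quality(sources, limits=None):
        limits = limits or {'SD': 30, 'HD': 20, '1080p': 20}
        counts = {}
        kept = []
        for s in sources:
            q = s.get('quality', 'SD')
            counts[q] = counts.get(q, 0) + 1
            if counts[q] <= limits.get(q, 20):
                kept.append(s)
        return kept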


jewbmx commented 5 years ago

Ewww, your email app just puked on my screen lol.

I checked out the website and saw it wasn't some kind of dupe thing; it looks like the normal scraper doesn't load 'older posts' and your updated scraper does. I just prefer my scrapers to be around 10 results max since there are usually 20-40 scrapers for each search.

Also, I think I do have all those settings in my addon, but I test bareback and wide open to get real full results.

Since you use RD, do you feel like going through the RD scrapers and doing a full scraper check on them? I did all the normal en scrapers yesterday but don't use RD, so I can't make sure the RD scrapers' results play lol

sergio49650 commented 5 years ago

I understand your point of view, and my version does not manage this.

But when searching for example "Harry Quebert S01E06" on the rlsbb web site, you get S01E05-E06. Your rlsbb addon version does not handle this kind of query; it finds nothing to return.

That's all I've tried to solve in my version. :)
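For reference, a minimal sketch of that kind of matching (hypothetical; the rlsbb.py attached later in this thread handles it by adding an extra space-separated query option instead): accept either the single-episode or the double-episode release naming.

    import re

    # hypothetical sketch: match S01E06 as well as double-episode names like S01E05-E06
    def matches_episode(name, season, episode):
        single = 'S%02dE%02d' % (season, episode)             # S01E06
        double = r'S%02dE\d{2}-E%02d' % (season, episode)     # S01E05-E06
        return re.search('(%s|%s)' % (single, double), name, re.I) is not None

    # matches_episode('Harry.Quebert.S01E05-E06.720p.WEB', 1, 6)  -> True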

rlsbb has its own Kodi addon; it could be interesting to take a look at their code.

I already checked some RD scrapers and noticed that some scrapers are behind Cloudflare too, and do not work as is. That's the case for sceper, seriescr, scenerls.

Maybe others too. I could try the cfscrape option to create the connection and cookies, if it's not too slow.
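A minimal sketch of that cfscrape approach (the same calls the rlsbb.py attached later in this thread uses; the URL here is just an example):

    # cfscrape wraps a requests session and solves the Cloudflare JS challenge before the real request
    from resources.lib.modules import cfscrape   # bundled copy in lambdascrapers

    scraper = cfscrape.create_scraper()
    html = scraper.get('http://rlsbb.ru/').content   # then parse as usual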

thanks


jewbmx commented 5 years ago

I don't think there's anything wrong with your scrapers, but I feel like they might need a little tweaking to try and counter massive results; there are about 30 other scrapers that I feel the same way about lol.

I think the siamese (double) episode issue is so rare that only a few scrapers handle it, and probably not the right way; usually another website has the episodes as singles to counter the issue.

SerpentDrago commented 5 years ago

I say the more results the better. The whole point of hoster scrapers is to get content that's hard to find sometimes, IMHO (though I mainly use torrents and PM as my mains).

I'd love to see a PR with the Cloudflare fixes for debrid, please!

sergio49650 commented 5 years ago

Hi,

RLSBB is totally behind Cloudflare now (rlsbb.to too).

So there's no choice but cfscraper; I tried it with rlsbb and got back a lot of results.

I agree with you: I prefer a few good scrapers with a lot of results over a lot of scrapers with few results.

Here is my rlsbb scraper file.

thanks


# -*- coding: UTF-8 -*-

'''
rlsbb scraper for Exodus forks. Sep 5 2018 - Cleaned and Checked

Updated and refactored by someone.
Originally created by others.
'''

import re, traceback, urllib, urlparse, json, random, time

from resources.lib.modules import cleantitle
from resources.lib.modules import client
from resources.lib.modules import control
from resources.lib.modules import debrid
from resources.lib.modules import log_utils
from resources.lib.modules import source_utils
from resources.lib.modules import cfscrape

class source:

def __init__(self):
    self.priority = 1
    self.language = ['en']
    self.domain = 'rlsbb.ru'
    self.base_link = 'http://rlsbb.ru'  # http://search.rlsbb.to doesn't exist
    self.search_link = 'http://search.rlsbb.ru/lib/search6515260491260.php?phrase=%s&pindex=1&radit=0.%s'
    # this search link gives JSON with good post names for the urls

    self.scraper = cfscrape.create_scraper()

def movie(self, imdb, title, localtitle, aliases, year):
    try:
        url = {'imdb': imdb, 'title': title, 'year': year}
        url = urllib.urlencode(url)
        return url
    except:
        failure = traceback.format_exc()
        log_utils.log('RLSBB - Exception: \n' + str(failure))
        return

def tvshow(self, imdb, tvdb, tvshowtitle, localtvshowtitle, aliases, year):
    try:
        url = {'imdb': imdb, 'tvdb': tvdb, 'tvshowtitle': tvshowtitle, 'year': year}
        url = urllib.urlencode(url)
        return url
    except:
        failure = traceback.format_exc()
        log_utils.log('RLSBB - Exception: \n' + str(failure))
        return

def episode(self, url, imdb, tvdb, title, premiered, season, episode):
    try:
        if url == None: return

        url = urlparse.parse_qs(url)
        url = dict([(i, url[i][0]) if url[i] else (i, '') for i in url])
        url['title'], url['premiered'], url['season'], url['episode'] = title, premiered, season, episode
        url = urllib.urlencode(url)
        return url
    except:
        failure = traceback.format_exc()
        log_utils.log('RLSBB - Exception: \n' + str(failure))
        return

def sources(self, url, hostDict, hostprDict):
    try:
        log_utils.log("rlsbb debug")

        sources = []
        query_bases  = []
        options = []
        html    = None

        if url == None: return sources

        if debrid.status() == False: raise Exception()

        data = urlparse.parse_qs(url)   
        #log_utils.log("data : " + str(data))      
        data = dict([(i, data[i][0]) if data[i] else (i, '') for i in data])        
        title = (data['tvshowtitle'] if 'tvshowtitle' in data else data['title'])
        hdlr = 'S%02dE%02d' % (int(data['season']), int(data['episode'])) if 'tvshowtitle' in data else data['year']
        premDate = ''

        r = None

        # TVshows
        if 'tvshowtitle' in data:   

            # log_utils.log("RLSBB TV show")

            # tvshowtitle
            query_bases.append('%s ' % (data['tvshowtitle'].replace("-","")))  # (ex 9-1-1 become 911)
            # tvshowtitle + year (ex Titans-2018-s01e1 or Insomnia-2018-S01)
            query_bases.append('%s %s ' % (data['tvshowtitle'].replace("-",""), data['year']))

            # season and episode (classic)
            options.append('S%02dE%02d' % (int(data['season']), int(data['episode'])))
            # season space episode (for double episode like S02E02-E03)
            options.append('S%02d E%02d' % (int(data['season']), int(data['episode'])))
            # season and episode1 - epsiode2 (two episodes at a time)
            #options.append('S%02dE%02d-E%02d' % (int(data['season']), int(data['episode']),   int(data['episode'])+1))
            #options.append('S%02dE%02d-E%02d' % (int(data['season']), int(data['episode'])-1, int(data['episode'])))
            # season only (ex ozark-S02, group of episodes)
            options.append('S%02d' % (int(data['season'])))

            log_utils.log("RLSBB querys : " + str(options))

            r = self.search(query_bases, options)

            log_utils.log("RLSBB r : " + r)
        else:
            #log_utils.log("RLSBB Movie")
            #  Movie
            query_bases.append('%s ' % (data['title']))
            options.append('%s' % (data['year']))
            r = self.search(query_bases, options)

        # looks like some shows have had episodes from the current season released in s00e00 format before switching to YYYY-MM-DD
        # this causes the second fallback search above for just s00 to return results and stops it from searching by date (ex. http://rlsbb.to/vice-news-tonight-s02)
        # so loop here if no items found on first pass and force date search second time around
        # This works till now, so only minor changes 
        for loopCount in range(0,2):
            # query_bases.clear()     # python 3
            query_bases = []
            options = []

            if loopCount == 1 or (r == None and 'tvshowtitle' in data) :                     # s00e00 serial failed: try again with YYYY-MM-DD
                # http://rlsbb.to/the-daily-show-2018-07-24                                 ... example landing urls
                # http://rlsbb.to/stephen-colbert-2018-07-24                                ... case and "date dots" get fixed by rlsbb

                premDate = re.sub('[ \.]','-',data['premiered'])+" "
                query = re.sub('[\\\\:;*?"<>|/\-\']', '', data['tvshowtitle'])              

                query_bases.append(query)
                options.append(premDate)

                r = self.search(query_bases,options)

            posts = client.parseDOM(r, "div", attrs={"class": "content"})   # get all <div class=content>...</div>
            hostDict = hostprDict + hostDict                                # ?
            items = []

            for post in posts:
                try:
                    u = client.parseDOM(post, 'a', ret='href')              # get all <a href=..... </a>
                    for i in u:                                             # foreach href url
                        try:
                            name = str(i)
                            if hdlr in name.upper(): items.append(name)
                            elif len(premDate) > 0 and premDate in name.replace(".","-"): items.append(name)      # s00e00 serial failed: try again with YYYY-MM-DD
                            # NOTE: the vast majority of rlsbb urls are just hashes! Future careful link grabbing would yield 2x or 3x results
                        except:
                            pass
                except:
                    pass

            if len(items) > 0: break

        seen_urls = set()

        for item in items:
            try:
                info = []

                url = str(item)
                url = client.replaceHTMLCodes(url)
                url = url.encode('utf-8')

                if url in seen_urls: continue
                seen_urls.add(url)

                host = url.replace("\\", "")
                host2 = host.strip('"')
                host = re.findall('([\w]+[.][\w]+)$', urlparse.urlparse(host2.strip().lower()).netloc)[0]

                if not host in hostDict: raise Exception()
                host = client.replaceHTMLCodes(host)
                host = host.encode('utf-8')

                if any(x in host2 for x in ['.rar', '.zip', '.iso']): continue

                if '720p' in host2:
                    quality = 'HD'
                elif '1080p' in host2:
                    quality = '1080p'
                else:
                    quality = 'SD'

                info = ' | '.join(info)

                sources.append({'source': host, 'quality': quality, 'language': 'en', 'url': host2, 'info': info, 
                                'direct': False, 'debridonly': True})
                # why is this hardcoded to debridonly=True? seems like overkill but maybe there's a resource-management reason?
            except:
                pass
            log_utils.log("RLSBB sources = " + str(sources))

        check = [i for i in sources if not i['quality'] == 'CAM']
        if check: sources = check
    except:
        failure = traceback.format_exc()
        log_utils.log('RLSBB - Exception: \n' + str(failure))
    return sources # one return is enough !

def search(self, query_bases, options):
    i = 0
    j = 0
    result = None
    for option in options:

        for query_base in query_bases :
            q = re.sub('(\\\|/| -|:|;|\*|\?|"|\'|<|>|\|)', '', query_base)
            q = q.replace("  ", " ").replace(" ", "-")

            query = q + option
            log_utils.log("RLSBB query : " + query)

            url = self.search_link % (query, random.randint(00000000000000001, 99999999999999999))
            html = self.scraper.get(url).content

            posts = json.loads(html)['results'][0]
            # if the wanted post is not in first place, the result is probably not the right one

            if posts['domain'] == self.domain:
                # get url to get debrid links
                link = urlparse.urljoin(self.base_link, posts['post_name'])

                result = self.scraper.get(link)
                log_utils.log("RLSBB test " + str(i) + " : " + str(result.status_code))

                # I got code 503 few times these days, but when retrying with a little delay I got code 200
                while result.status_code == 503 and j < 5 :
                    time.sleep(0.5)
                    log_utils.log("RLSBB try test " + str(i))
                    result = self.scraper.get(link)
                    log_utils.log("RLSBB test " + str(i) + " : " + str(result.status_code))
                    j += 1

                if result.status_code == 200:
                    return result.content
                else: 
                    log_utils.log("RLSBB test "+ str(i) + " return code : " + result.status_code + "- next test " + str(i+1))
                    i += 1

    return None

def resolve(self, url):
    return url
host505 commented 5 years ago

Hi, why don't you update this PR so it includes your fixed rlsbb scraper? Just modify/replace rlsbb.py on your fork (the one you used for this PR) and the changes will apply here. Also maybe remove or fix your 2ddl, because it doesn't work well with TV shows.

sergio49650 commented 5 years ago

Hi,

I'm not working on the official release; I sent them my changes but they didn't use them yet. I don't know if they will.

Here are my last changes, you can try them.

Thanks


# -*- coding: UTF-8 -*-

#######################################################################
# ----------------------------------------------------------------------------
# "THE BEER-WARE LICENSE" (Revision 42):
# @tantrumdev wrote this file. As long as you retain this notice you
# can do whatever you want with this stuff. If we meet some day, and you think
# this stuff is worth it, you can buy me a beer in return. - Muad'Dib
# ----------------------------------------------------------------------------
#######################################################################

# -Cleaned and Checked on 10-10-2018 by JewBMX in Yoda.

import re, traceback, urllib, urlparse

from resources.lib.modules import cleantitle
from resources.lib.modules import client
from resources.lib.modules import debrid
from resources.lib.modules import source_utils
from resources.lib.modules import log_utils

class source:

def __init__(self):
    self.priority = 1
    self.language = ['en']
    self.domains = ['2ddl.ws']
    self.base_link = 'http://2ddl.ws/?s='

    self.search_link = '/search/%s/feed/rss2/'  # too long and often out of service

def movie(self, imdb, title, localtitle, aliases, year):
    try:
        url = {'imdb': imdb, 'title': title, 'year': year}
        url = urllib.urlencode(url)
        return url
    except:
        failure = traceback.format_exc()
        log_utils.log('2DDL - Exception: \n' + str(failure))
        return

def tvshow(self, imdb, tvdb, tvshowtitle, localtvshowtitle, aliases, year):
    try:
        url = {'imdb': imdb, 'tvdb': tvdb, 'tvshowtitle': tvshowtitle, 'year': year}
        url = urllib.urlencode(url)
        return url
    except:
        failure = traceback.format_exc()
        log_utils.log('2DDL - Exception: \n' + str(failure))
        return

def episode(self, url, imdb, tvdb, title, premiered, season, episode):
    try:
        if url == None: return

        url = urlparse.parse_qs(url)
        url = dict([(i, url[i][0]) if url[i] else (i, '') for i in url])
        url['title'], url['premiered'], url['season'], url['episode'] = title, premiered, season, episode
        url = urllib.urlencode(url)
        return url
    except:
        failure = traceback.format_exc()
        log_utils.log('2DDL - Exception: \n' + str(failure))
        return

def sources(self, url, hostDict, hostprDict):
    try:
        sources = []
        query_bases = []
        options = []
        html = ""

        log_utils.log("2DDL debug")

        if url == None: return sources

        if debrid.status() is False: raise Exception()

        data = urlparse.parse_qs(url)
        data = dict([(i, data[i][0]) if data[i] else (i, '') for i in data])
        title = data['tvshowtitle'] if 'tvshowtitle' in data else data['title']
        hdlr = 'S%02dE%02d' % (int(data['season']), int(data['episode'])) if 'tvshowtitle' in data else data['year']

        # TVshows
        if 'tvshowtitle' in data:

            query_bases.append('%s ' % (data['tvshowtitle']))  # (ex 9-1-1 become 911)
            # tvshowtitle + year (ex Titans-2018-s01e1 or Insomnia-2018-S01)
            query_bases.append('%s %s ' % (data['tvshowtitle'], data['year']))

            options.append('S%02d E%02d' % (int(data['season']), int(data['episode'])))
            # season only (ex ozark-S02, group of episodes)
            options.append('S%02d' % (int(data['season'])))

            html = self.search(query_bases, options)

        else:
            #log_utils.log("2DDL Movie")
            #  Movie
            query_bases.append('%s ' % (data['title']))
            options.append('%s' % (data['year']))
            html = self.search(query_bases, options)

        urls  = client.parseDOM(html, 'a', ret="href", attrs={"class":"more-link"})
        #log_utils.log("2DDL urls : " + str(urls))

        r = ""

        for url in urls:

            html = client.request(url)

            while html != "":
                try:
                    r += (html.split  ("<singlelink>"))[1].split("<Download>")[0]
                    html    =  html.split("<Download>")[1]
                except:
                    html = ""

        posts = client.parseDOM(r, 'a', ret='href') 

        log_utils.log("2DDL posts = "+ str(posts))

        hostDict = hostprDict + hostDict

        items = []

        for post in posts:
            try:
                item = str(post)
                # have to filter on title; spaces become "." in the url name
                # example: "this is us" returns everything containing "us"; with the filter we match this.is.us
                if hdlr in item.upper() and title.upper() in item.upper().replace("."," "): 
                    items.append(item)
            except:
                pass

        log_utils.log("2DDL items : " + str(items))

        for item in items:
            try:

                info = []
                url = str(item)
                url = client.replaceHTMLCodes(url)
                url = url.encode('utf-8')

                host = url.replace("\\", "")
                host2 = host.strip('"')
                host = re.findall('([\w]+[.][\w]+)$', urlparse.urlparse(host2.strip().lower()).netloc)[0]

                if not host in hostDict: raise Exception()
                host = client.replaceHTMLCodes(host)
                host = host.encode('utf-8')

                if any(x in host2 for x in ['.rar', '.zip', '.iso']): continue

                if '720p' in host2:
                    quality = 'HD'
                elif '1080p' in host2:
                    quality = '1080p'
                else:
                    quality = 'SD'

                info = ' | '.join(info)

                sources.append({'source': host, 'quality': quality, 'language': 'en', 'url': url, 'info': info,
                                'direct': False, 'debridonly': True})
            except:
                pass

        check = [i for i in sources if not i['quality'] == 'CAM']
        if check: sources = check
        #log_utils.log("2DDL sources = " + str(sources))
    except:
        failure = traceback.format_exc()
        log_utils.log('2DDL - Exception: \n' + str(failure))

    return sources    # one return is enough !

def search(self, query_bases, options):
    i = 0
    result = None
    for query_base in query_bases:

        q = re.sub('(\\\|/| -|:|;|\*|\?|"|\'|<|>|\|)', '', query_base)
        q = q.replace("  ", " ").replace(" ", "+")

        for option in options:
            query = q + option
            log_utils.log("2DDL query : " + query)

            #result = self.scraper.get("http://search.rlsbb.ru/" + q).content        
            result = client.request(self.base_link + query)   # use the full query (base + option), not just the base

            if (result != None):
                log_utils.log("2DDL test " + str(i) + " Ok :" + str(len(result)))
                return result
            else:
                log_utils.log("2DDL test " + str(i) + " = None - trying test " + str(i+1))
                i += 1

    return None

def resolve(self, url):
    return url

# -*- coding: UTF-8 -*-

'''
rlsbb scraper for Exodus forks. Sep 5 2018 - Cleaned and Checked

Updated and refactored by someone.
Originally created by others.
'''

import re, traceback, urllib, urlparse, json, random, time

from resources.lib.modules import cleantitle
from resources.lib.modules import client
from resources.lib.modules import control
from resources.lib.modules import debrid
from resources.lib.modules import log_utils
from resources.lib.modules import source_utils
from resources.lib.modules import cfscrape

class source:

def __init__(self):
    self.priority = 1
    self.language = ['en']
    self.domain = 'rlsbb.ru'
    self.base_link = 'http://rlsbb.ru'  # http://search.rlsbb.to doesn't exist
    self.search_link = 'http://search.rlsbb.ru/lib/search6515260491260.php?phrase=%s&pindex=1&radit=0.%s'
    # this search link gives JSON with good post names for the urls

    self.scraper = cfscrape.create_scraper()
    #self.headers = {'User-Agent': client.agent(), 'Referer': self.base_link}
    # the scraper targets the .ru site; unfortunately the search engine returns a kind of dynamic datatable with no usable data in the html

def movie(self, imdb, title, localtitle, aliases, year):
    try:
        url = {'imdb': imdb, 'title': title, 'year': year}
        url = urllib.urlencode(url)
        return url
    except:
        failure = traceback.format_exc()
        log_utils.log('RLSBB - Exception: \n' + str(failure))
        return

def tvshow(self, imdb, tvdb, tvshowtitle, localtvshowtitle, aliases, year):
    try:
        url = {'imdb': imdb, 'tvdb': tvdb, 'tvshowtitle': tvshowtitle, 'year': year}
        url = urllib.urlencode(url)
        return url
    except:
        failure = traceback.format_exc()
        log_utils.log('RLSBB - Exception: \n' + str(failure))
        return

def episode(self, url, imdb, tvdb, title, premiered, season, episode):
    try:
        if url == None: return

        url = urlparse.parse_qs(url)
        url = dict([(i, url[i][0]) if url[i] else (i, '') for i in url])
        url['title'], url['premiered'], url['season'], url['episode'] = title, premiered, season, episode
        url = urllib.urlencode(url)
        return url
    except:
        failure = traceback.format_exc()
        log_utils.log('RLSBB - Exception: \n' + str(failure))
        return

def sources(self, url, hostDict, hostprDict):
    try:
        log_utils.log("rlsbb debug")

        sources = []
        query_bases  = []
        options = []
        html    = None

        if url == None: return sources

        if debrid.status() == False: raise Exception()

        data = urlparse.parse_qs(url)   
        #log_utils.log("data : " + str(data))      
        data = dict([(i, data[i][0]) if data[i] else (i, '') for i in data])        
        title = (data['tvshowtitle'] if 'tvshowtitle' in data else data['title'])
        hdlr = 'S%02dE%02d' % (int(data['season']), int(data['episode'])) if 'tvshowtitle' in data else data['year']
        premDate = ''

        r = None

        # TVshows
        if 'tvshowtitle' in data:   

            # log_utils.log("RLSBB TV show")

            # tvshowtitle
            query_bases.append('%s ' % (data['tvshowtitle'].replace("-","")))  # (ex 9-1-1 become 911)
            # tvshowtitle + year (ex Titans-2018-s01e1 or Insomnia-2018-S01)
            query_bases.append('%s %s ' % (data['tvshowtitle'].replace("-",""), data['year']))

            # season and episode (classic)
            options.append('S%02dE%02d' % (int(data['season']), int(data['episode'])))
            # season space episode (for double episode like S02E02-E03)
            options.append('S%02d E%02d' % (int(data['season']), int(data['episode'])))
            # season and episode1 - epsiode2 (two episodes at a time)
            #options.append('S%02dE%02d-E%02d' % (int(data['season']), int(data['episode']),   int(data['episode'])+1))
            #options.append('S%02dE%02d-E%02d' % (int(data['season']), int(data['episode'])-1, int(data['episode'])))
            # season only (ex ozark-S02, group of episodes)
            options.append('S%02d' % (int(data['season'])))

            log_utils.log("RLSBB querys : " + str(options))

            r = self.search(query_bases, options)

            log_utils.log("RLSBB r : " + r)
        else:
            #log_utils.log("RLSBB Movie")
            #  Movie
            query_bases.append('%s ' % (data['title']))
            options.append('%s' % (data['year']))
            r = self.search(query_bases, options)

        # looks like some shows have had episodes from the current season released in s00e00 format before switching to YYYY-MM-DD
        # this causes the second fallback search above for just s00 to return results and stops it from searching by date (ex. http://rlsbb.to/vice-news-tonight-s02)
        # so loop here if no items found on first pass and force date search second time around
        # This works till now, so only minor changes 
        for loopCount in range(0,2):
            # query_bases.clear()     # python 3
            query_bases = []
            options = []

            if loopCount == 1 or (r == None and 'tvshowtitle' in data) :                     # s00e00 serial failed: try again with YYYY-MM-DD
                # http://rlsbb.to/the-daily-show-2018-07-24                                 ... example landing urls
                # http://rlsbb.to/stephen-colbert-2018-07-24                                ... case and "date dots" get fixed by rlsbb

                premDate = re.sub('[ \.]','-',data['premiered'])+" "
                query = re.sub('[\\\\:;*?"<>|/\-\']', '', data['tvshowtitle'])              

                query_bases.append(query)
                options.append(premDate)

                r = self.search(query_bases,options)

            posts = client.parseDOM(r, "div", attrs={"class": "content"})   # get all <div class=content>...</div>
            hostDict = hostprDict + hostDict                                # ?
            items = []

            for post in posts:
                try:
                    u = client.parseDOM(post, 'a', ret='href')              # get all <a href=..... </a>
                    for i in u:                                             # foreach href url
                        try:
                            name = str(i)
                            if hdlr in name.upper(): items.append(name)
                            elif len(premDate) > 0 and premDate in name.replace(".","-"): items.append(name)      # s00e00 serial failed: try again with YYYY-MM-DD
                            # NOTE: the vast majority of rlsbb urls are just hashes! Future careful link grabbing would yield 2x or 3x results
                        except:
                            pass
                except:
                    pass

            if len(items) > 0: break

        seen_urls = set()

        for item in items:
            try:
                info = []

                url = str(item)
                url = client.replaceHTMLCodes(url)
                url = url.encode('utf-8')

                if url in seen_urls: continue
                seen_urls.add(url)

                host = url.replace("\\", "")
                host2 = host.strip('"')
                host = re.findall('([\w]+[.][\w]+)$', urlparse.urlparse(host2.strip().lower()).netloc)[0]

                if not host in hostDict: raise Exception()
                host = client.replaceHTMLCodes(host)
                host = host.encode('utf-8')

                if any(x in host2 for x in ['.rar', '.zip', '.iso']): continue

                if '720p' in host2:
                    quality = 'HD'
                elif '1080p' in host2:
                    quality = '1080p'
                else:
                    quality = 'SD'

                info = ' | '.join(info)

                sources.append({'source': host, 'quality': quality, 'language': 'en', 'url': host2, 'info': info, 
                                'direct': False, 'debridonly': True})
                # why is this hardcoded to debridonly=True? seems like overkill but maybe there's a resource-management reason?
            except:
                pass
            log_utils.log("RLSBB sources = " + str(sources))

        check = [i for i in sources if not i['quality'] == 'CAM']
        if check: sources = check
    except:
        failure = traceback.format_exc()
        log_utils.log('RLSBB - Exception: \n' + str(failure))
    return sources # one return is enough !

def search(self, query_bases, options):
    i = 0
    j = 0
    result = None
    for option in options:

        for query_base in query_bases :
            q = re.sub('(\\\|/| -|:|;|\*|\?|"|\'|<|>|\|)', '', query_base)
            q = q.replace("  ", " ").replace(" ", "-")

            query = q + option
            log_utils.log("RLSBB query : " + query)

            url = self.search_link % (query, random.randint(00000000000000001, 99999999999999999))
            html = self.scraper.get(url).content

            posts = json.loads(html)['results'][0]
            # if the wanted post is not in first place, the result is probably not the right one

            if posts['domain'] == self.domain:
                # get url to get debrid links
                link = urlparse.urljoin(self.base_link, posts['post_name'])

                result = self.scraper.get(link)
                log_utils.log("RLSBB test " + str(i) + " : " + str(result.status_code))

                # I got code 503 few times these days, but when retrying with a little delay I got code 200
                while result.status_code == 503 and j < 5 :
                    time.sleep(0.5)
                    log_utils.log("RLSBB try test " + str(i))
                    result = self.scraper.get(link)
                    log_utils.log("RLSBB test " + str(i) + " : " + str(result.status_code))
                    j += 1

                if result.status_code == 200:
                    return result.content
                else: 
                    log_utils.log("RLSBB test "+ str(i) + " return code : " + result.status_code + "- next test " + str(i+1))
                    i += 1

    return None

def resolve(self, url):
    return url

# -*- coding: UTF-8 -*-

#######################################################################
# ----------------------------------------------------------------------------
# "THE BEER-WARE LICENSE" (Revision 42):
# @Daddy_Blamo wrote this file. As long as you retain this notice you
# can do whatever you want with this stuff. If we meet some day, and you think
# this stuff is worth it, you can buy me a beer in return. - Muad'Dib
# ----------------------------------------------------------------------------
#######################################################################

# Addon Name: Placenta
# Addon id: plugin.video.placenta
# Addon Provider: Mr.Blamo

import requests, re, traceback, urllib, urlparse
from bs4 import BeautifulSoup

from resources.lib.modules import client
from resources.lib.modules import source_utils
from resources.lib.modules import log_utils
from resources.lib.modules import debrid

class source:

def __init__(self):
    self.priority = 1
    self.language = ['en']
    self.domain = 'rlsscn.in'
    self.base_link = 'http://tvdownload.net/'
    self.search_link = self.base_link + '?s=%s'

def movie(self, imdb, title, localtitle, aliases, year):
    try:
        url = {'imdb': imdb, 'title': title, 'year': year}
        return url
    except:
        failure = traceback.format_exc()
        log_utils.log('RLSSCN - Exception: \n' + str(failure))
        return

def tvshow(self, imdb, tvdb, tvshowtitle, localtvshowtitle, aliases, year):
    try:
        url = {'imdb': imdb, 'tvdb': tvdb, 'tvshowtitle': tvshowtitle, 'year': year}
        return url
    except:
        return

def episode(self, url, imdb, tvdb, title, premiered, season, episode):
    try:
        url['episode'] = episode
        url['season'] = season
        url['premiered'] = premiered
        return url
    except:
        failure = traceback.format_exc()
        log_utils.log('RLSSCN - Exception: \n' + str(failure))
        return

def sources(self, url, hostDict, hostprDict):
    try:
        hostDict = hostDict + hostprDict

        log_utils.log("RLSSCN debug")

        sources      = []
        query_bases  = []
        options      = []
        html         = None

        log_utils.log("RLSSCN url : "+ str(url))

        if url == None: return sources

        if debrid.status() == False: raise Exception()

        data = url
        #data = dict([(i, data[i][0]) if data[i] else (i, '') for i in data])        
        title = (data['tvshowtitle'] if 'tvshowtitle' in data else data['title'])
        hdlr = 'S%02dE%02d' % (int(data['season']), int(data['episode'])) if 'tvshowtitle' in data else data['year']
        premDate = ''

        # tvshowtitle
        if 'tvshowtitle' in data: 
            query_bases.append('%s ' % (data['tvshowtitle']))  # (ex 9-1-1 become 911)
            # tvshowtitle + year (ex Titans-2018-s01e1 or Insomnia-2018-S01)
            query_bases.append('%s %s ' % (data['tvshowtitle'], data['year']))

            # season and episode (classic)
            options.append('S%02dE%02d' % (int(data['season']), int(url['episode'])))
            # season space episode (for double episode like S02E02-E03)
            options.append('S%02d E%02d' % (int(data['season']), int(data['episode'])))
            # season and episode1 - epsiode2 (two episodes at a time)
            #options.append('S%02dE%02d-E%02d' % (int(data['season']), int(data['episode']),   int(data['episode'])+1))
            #options.append('S%02dE%02d-E%02d' % (int(data['season']), int(data['episode'])-1, int(data['episode'])))
            # season only (ex ozark-S02, group of episodes)
            options.append('S%02d' % (int(data['season'])))

            html = self.search(query_bases, options)

        else:
            #log_utils.log("RLSSCN Movie")
            #  Movie
            query_bases.append('%s ' % (data['title']))
            options.append('%s' % (data['year']))
            html = self.search(query_bases, options)

        # this split is based on TV shows, soooo... might screw up movies
        # grab the relevent div and chop off the footer
        html = client.parseDOM(html, "div", attrs={"id": "content"})[0]
        html = re.sub('class="wp-post-navigation.+','', html, flags=re.DOTALL)
        sects = html.split('<p>')

        log_utils.log("RLSSCN html links : " + str(sects))

        for sect in sects:
            hrefs = client.parseDOM(sect, "a", attrs={"class": "autohyperlink"}, ret='href')
            if not hrefs: continue

            # filenames (with useful info) seem predictably located
            try: fn = re.match('(.+?)</strong>',sect).group(1)
            except: fn = ''
            log_utils.log('*** fn: %s' % fn)

            # sections under filenames usually have sizes (for tv at least)
            size = ""
            try: 
                size = re.findall('([0-9,\.]+ ?(?:GB|GiB|MB|MiB))', sect)[0]
                div = 1 if size.endswith(('GB', 'GiB')) else 1024
                size = float(re.sub('[^0-9\.]', '', size)) / div
                size = '%.2f GB' % size
            except: pass

            for url in hrefs:
                quality, info = source_utils.get_release_quality(url,fn)
                info.append(size)
                info = ' | '.join(info)
                log_utils.log(' ** (%s %s) url=%s' % (quality,info,url)) #~~~~~~~~~~~

                url = url.encode('utf-8')
                hostDict = hostDict + hostprDict

                valid, host = source_utils.is_host_valid(url, hostDict)
                if not valid: continue

                log_utils.log(' ** VALID! (host=%s)' % host) #~~~~~~~~~~~~~~~
                sources.append({'source': host, 'quality': quality, 'language': 'en', 'url': url,
                            'info': info, 'direct': False, 'debridonly': False})

        return sources
    except:
        log_utils.log("RLSSCN oups..." + str(traceback.format_exc()))

def search(self, query_bases, options):
    i = 0
    j = 0
    result = None
    for option in options:

        for query_base in query_bases :
            q = re.sub('(\\\|/| -|:|;|\*|\?|"|\'|<|>|\|)', '', query_base)
            q = q.replace("  ", " ").replace(" ", "+")

            query = q + option
            log_utils.log("RLSSCN query : " + query)

            url = self.search_link % (query)
            html = requests.get(url)

            log_utils.log("RLSSCN try test " + str(i) + " - html : " + str(html))

            if html.status_code == 200 :
                log_utils.log("RLSSCN test " + str(i) + " Ok")
                url = client.parseDOM(html.content, "h2", attrs={"class": "title"})
                url = client.parseDOM(url, "a", ret='href')
                log_utils.log("RLSSCN test " + str(i) + " : " + str(url))
                html = requests.get(url[0])
                if html.status_code == 200 :
                    return html.content
            else :    
                log_utils.log("RLSSCN test "+ str(i) + " return code : " + result.status_code + "- next test " + str(i+1))
                i += 1

    return None

def resolve(self, url):
    return url
SerpentDrago commented 5 years ago

Dude, you have to submit the file. Don't copy and paste into this comment field; it kills formatting and indents.