CodyBerenson / PGMA-Modernized

An updated approach for Plex Gay Media Adult Agents for both Full Feature Films and Scenes
MIT License

Posters for IAFD #190

Open JPH71 opened 2 years ago

JPH71 commented 2 years ago

IAFD has links to AEBN, GayHotMovies, GayDVDEmpire, CD Universe just like GEVI does

With this in mind, I have added code to scrape these external sites and fill in any data missing from IAFD, especially posters and background art.

Unfortunately, when I request the .asp link that points to the shop, I get a 403 Forbidden result. In Chrome developer tools, when I pick the link, I can see a Location entry in the response header that points to the shop's webpage, as it does in GEVI. I need to find out how to access this.

One of you helped with the GEVI issues some months ago by setting up a referer header, which saved my bacon in more ways than one. Could you give some suggestions in regard to this? The offending code is in utils.py, in the getFilmOnIAFD function.

Cheers

Jason xx

JPH71 commented 2 years ago

Cody will have the code in the next 10 minutes....

j-ktz commented 2 years ago

It looks like the blog sites (Fagalicious) aren't pulling in posters now either, but it could be that our URL has expired.

CodyBerenson commented 2 years ago

(sorry for the duplicate post)

@fivedays555:

Hope this finds you well! @JPH71 wanted to once again say THANKS! Your quick solution above is going to allow him to add an enhancement to IAFD. For films, IAFD provides links to index sites that have the film's cover artwork, so Jason will be working on an enhancement that should allow the IAFD agent to crawl to the film covers, since IAFD itself doesn't contain artwork other than actor headshots.

THANKS!

fivedays555 commented 2 years ago

Not a problem. Glad I can help. Let me know if you need any more information.

JPH71 commented 2 years ago

Here is the issue - this happens when using IAFD as the scraping Agent


This is the section of the log file:

2022-08-19 02:34:25,484 (21f8) : INFO (logkit:16) - IAFD - UTILS :: Access External Links in IAFD: Skip Current Agent Links: IAFD
2022-08-19 02:34:25,484 (21f8) : INFO (logkit:16) - IAFD - UTILS :: External Sites Found 1 - AdultEmpire - https://www.iafd.com/shopclick.asp?sku=22956990
2022-08-19 02:34:25,500 (21f8) : INFO (logkit:16) - IAFD - UTILS :: 2 - HotMovies - https://www.iafd.com/shopclick.asp?sku=9344975
2022-08-19 02:34:25,500 (21f8) : INFO (logkit:16) - IAFD - UTILS :: 3 - HotMovies - https://www.iafd.com/shopclick.asp?sku=8390429
2022-08-19 02:34:25,500 (21f8) : INFO (logkit:16) - IAFD - UTILS :: 4 - AdultEmpire - https://www.iafd.com/shopclick.asp?sku=22956383
2022-08-19 02:34:25,500 (21f8) : INFO (logkit:16) - IAFD - UTILS :: Valid Sites Left 2 - ['AdultEmpire', 'HotMovies']
2022-08-19 02:34:25,516 (21f8) : DEBUG (networking:143) - Requesting ' https://www.iafd.com/shopclick.asp?sku=8390429'
2022-08-19 02:34:25,625 (21f8) : ERROR (networking:196) - Error opening URL 'https://www.iafd.com/shopclick.asp?sku=8390429'
2022-08-19 02:34:25,625 (21f8) : ERROR (logkit:22) - IAFD - UTILS :: Error reading External HotMovies URL Link: HTTP Error 403: Forbidden
2022-08-19 02:34:25,641 (21f8) : DEBUG (networking:143) - Requesting ' https://www.iafd.com/shopclick.asp?sku=22956383'
2022-08-19 02:34:25,755 (21f8) : ERROR (networking:196) - Error opening URL 'https://www.iafd.com/shopclick.asp?sku=22956383'
2022-08-19 02:34:25,755 (21f8) : ERROR (logkit:22) - IAFD - UTILS :: Error reading External AdultEmpire URL Link: HTTP Error 403: Forbidden

I need to be able to get from : https://www.iafd.com/shopclick.asp?sku=9344975

to the following:

1 is the link I entered into the address bar, which changes to GayHotMovies and shows up in 2 as the header. [image: image.png]

Inside utils.py, the code is in the function getFilmOnIAFD (line 210), and the error is caused by line 356: the call HTML.ElementFromURL(value, timeout=60, errors='ignore', sleep=DELAY).

This is a built-in Plex function. I have some old documentation that explains it if you need it, but it works like the Python requests library.

If you look at the GEVI __init__.py file, you will see how we implemented your previous suggestion to get it working.

Many thanks

Jason


fivedays555 commented 2 years ago

I tried the following attempt. Should be working:

from lxml import html  # assumed: `html` here is lxml.html

url = 'https://www.iafd.com/shopclick.asp?sku=9344975'
response = get_scraper_request(url)  # cloudscraper-based helper
res = html.fromstring(response.text)
title = res.xpath('//*[@class="title"]')[0].text  # 'Fire Watch 2'

I think the direct request fails because IAFD uses Cloudflare to block unwanted requests.
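For the redirect part of the question, here is a minimal sketch of reading the shop page address after the redirect has been followed. It assumes the fetch function returns a requests-style response (cloudscraper sessions are built on requests), where `url` holds the final address after redirects. The name `resolve_final_url` is hypothetical and not part of the agent.

```python
def resolve_final_url(fetch, url):
    """Return the URL the request ended on after redirects, or None on failure.

    `fetch` is any callable that takes a URL and returns a requests-style
    response (or None), for example a cloudscraper-based helper.
    """
    response = fetch(url)
    # Compare with None explicitly: a requests Response is falsy when its
    # status code is an error, so a plain truth test would hide failures.
    if response is None or not response.ok:
        return None
    # For requests-style responses, `url` is the final location reached.
    return response.url
```

Called as `resolve_final_url(get_scraper_request, 'https://www.iafd.com/shopclick.asp?sku=9344975')`, this should give the shop URL if the 403 is avoided; I have not confirmed the exact address it returns.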

JPH71 commented 2 years ago

You are a star!

Is the get_scraper_request code already in the Plex agent?

Cheers


fivedays555 commented 2 years ago

Should be. Otherwise, you won't be able to scrape IAFD.

JPH71 commented 2 years ago

Just searched through the utils.py file and there is no function starting with get_scraper_request.

Cheers, and sorry to be a nuisance.


fivedays555 commented 2 years ago

No problem. I will put the function below.

import logging

import cloudscraper

# Shared session; cloudscraper is meant to get past Cloudflare's anti-bot page.
scraper = cloudscraper.create_scraper()

def get_scraper_request(url, **kwargs):
    """GET url through the shared cloudscraper session; return the response, or None on error."""
    logging.info("Requesting: " + url)
    headers = kwargs.pop('headers', {})
    cookies = kwargs.pop('cookies', {})
    timeout = kwargs.pop('timeout', 30)
    proxies = {}

    if 'User-Agent' not in headers:
        # headers['User-Agent'] = (fake_useragent.UserAgent(fallback='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15')).random
        headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15'

    scraper.headers.update(headers)
    scraper.cookies.update(cookies)

    # Default to None so a failed request does not leave the name unbound.
    scraper_request = None
    try:
        scraper_request = scraper.request(
            'GET', url, timeout=timeout, proxies=proxies)
    except Exception:
        logging.exception('CloudScraper Failed.')

    # Test against None: a Response is falsy when its status code is an error,
    # so a plain truth test would skip this log message.
    if scraper_request is not None and not scraper_request.ok:
        msg = ('< CloudScraper Failed Request Status Code: ' +
               str(scraper_request.status_code) + '>')
        logging.error(msg)

    return scraper_request

JPH71 commented 2 years ago

Cheers Man.....

I have been up all night - sorting out duplicate cast entries....

Thanks for all the help!

Jason


fivedays555 commented 2 years ago

Glad I can help. Cheers!

JPH71 commented 2 years ago

One last thing: Adult Film Database.

I don't know what they changed, but the scraping code now fails. If you have the time, send me a few pointers so I can get this agent working again.

Your help has been much appreciated.

I will implement the changes you sent in getFilmOnIAFD today and get back to you with the results as soon as possible.

I think I better have a date with Morpheus now... been up all night...

Jason xxx


fivedays555 commented 2 years ago

Not sure what you need, but IAFD has a very sensitive request rate limit. To be safe, I add a delay before each IAFD request: time.sleep(randint(100, 200)/10)

And all IAFD requests would need the cloudscraper function.
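As a rough sketch, the delay could be wrapped in a small helper so every IAFD request goes through it. The name `polite_delay` and the injectable `sleep` argument are hypothetical, added only so the pause can be swapped out when testing; the 10 to 20 second range comes from the expression above.

```python
import random
import time

def polite_delay(sleep=time.sleep):
    """Pause for a random 10.0 to 20.0 seconds (in 0.1 s steps) and return the delay."""
    # Same arithmetic as randint(100, 200)/10, written with 10.0 so the
    # result is a float on both Python 2 and Python 3.
    delay = random.randint(100, 200) / 10.0
    sleep(delay)
    return delay
```

Calling `polite_delay()` right before each cloudscraper request should space the requests out; whether this range is enough to stay under IAFD's limit is something to confirm in practice.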

Let me know if you need more information.

JPH71 commented 2 years ago

The last request was about another agent, Adult Film Database, not IAFD. Rather than just building a search string, one has to create form data and headers and perform a POST request. A right pain in the nethers when it stops working.

I will put that random sleep into the IAFD code, in the cloudscraper section.

Thanks once again...


fivedays555 commented 2 years ago

Oh, I did not realize it was for Adult Film Database.

I never touch or use the Adult Film Database agent, so I don't really know...

Mostly, I am using Waybig, Fagalicious, Queerclick, and IAFD. They cover almost everything I need.

I took a look at Adult Film Database (https://www.adultfilmdatabase.com/), and I think there are very few gay titles there. Why bother?

CodyBerenson commented 1 year ago

@JPH71 Can this be closed?

JPH71 commented 1 year ago

Yes it can...


JPH71 commented 1 year ago

No, don't... I haven't sorted out IAFD posters yet.
