erikriver / opengraph

A python module to parse the Open Graph Protocol
http://ogp.me/
MIT License
226 stars 82 forks

How to set custom User Agent? #31

Open Zerokami opened 6 years ago

Zerokami commented 6 years ago

Udemy.com is blocking the default User Agent of opengraph.

I'm getting

urllib2.HTTPError: HTTP Error 403: Unauthorized

How do I set a custom user agent for the OpenGraph module?
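
For context, the `urllib2.HTTPError` in the traceback suggests that the page is fetched through Python's standard URL opener, which identifies itself with a `Python-urllib/x.y` User-Agent that some sites reject. A minimal sketch for inspecting that default follows. The helper name `default_user_agent` is made up for illustration, and the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    import urllib2 as urllib_request  # Python 2
except ImportError:
    import urllib.request as urllib_request  # Python 3


def default_user_agent():
    """Return the User-Agent a plain urllib opener sends by default (illustrative helper)."""
    opener = urllib_request.build_opener()
    # addheaders is a list of (name, value) tuples added to every request.
    for name, value in opener.addheaders:
        if name.lower() == "user-agent":
            return value
    return None


print(default_user_agent())
```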

As a workaround, I have created a custom getter using the requests module:

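import requests
from opengraph import OpenGraph

try:
    from urlparse import urljoin, urlparse  # Python 2
except ImportError:
    from urllib.parse import urljoin, urlparse  # Python 3


def custom_get_img_from_link(link):
    """Return the absolute og:image URL for `link`, or None if there is none."""
    # headers = {"User-Agent": get_random_UA()}
    headers = {"User-Agent": "My bot"}
    # Fetch the page with requests so that the User-Agent can be controlled.
    r = requests.get(link, headers=headers)

    parsed_uri = urlparse(link)
    domain = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)

    # NOTE: `parser` is not defined in this snippet; it is presumably a custom
    # parser function defined elsewhere in the poster's code.
    OpenGraph.parser = parser
    OpenGraph.scrape = True  # workaround for some subtle bug in opengraph

    # Parse the already-downloaded HTML instead of letting opengraph fetch it.
    page = OpenGraph(html=r.content)

    if not page.is_valid():
        return None

    image_url = page.get('image')
    if image_url is None:
        return None

    # Resolve relative image paths against the site root.
    if not image_url.startswith('http'):
        image_url = urljoin(domain, image_url)

    return image_url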
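
Another possible approach, sketched here under the assumption that opengraph fetches pages with a plain `urlopen()` call and sets no headers of its own (which the `urllib2.HTTPError` above suggests but does not prove), is to install a process-wide opener that carries a custom User-Agent. The helper name `install_user_agent` is hypothetical. `build_opener`, `addheaders` and `install_opener` are standard-library names, available as `urllib2` on Python 2 and `urllib.request` on Python 3.

```python
try:
    import urllib2 as urllib_request  # Python 2
except ImportError:
    import urllib.request as urllib_request  # Python 3


def install_user_agent(user_agent):
    """Make later urlopen() calls send `user_agent` (hypothetical helper)."""
    opener = urllib_request.build_opener()
    # Replace the opener's default header list with a single custom User-Agent.
    opener.addheaders = [("User-Agent", user_agent)]
    # install_opener() makes this opener the global default used by urlopen().
    urllib_request.install_opener(opener)
    return opener


opener = install_user_agent("My bot")
```

If the assumption holds, a later `OpenGraph(url=...)` call in the same process would go out with the new header. The change is global, so it also affects any other code that relies on `urlopen()`.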