eshad / httplib2

Automatically exported from code.google.com/p/httplib2

Can't pass web address to retrieve method #112

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Here is the code I'm using. It grabs the title, description and link to the 
original article.

import re
import httplib2

h = httplib2.Http('.cache')
response, content = h.request('http://www.huffingtonpost.com/feeds/verticals/comedy/index.xml')

print(content[:64])

huffPostContent = str(content)

print(huffPostContent)

huffFinder = re.compile('<title>(.{5,90})</title>|<summary>(.{5,300})</summary>|<link rel="alternate" type="text/html" href="(.{5,100})"/>', re.IGNORECASE)
findHuff = re.findall(huffFinder, huffPostContent)

for i in findHuff:
    print(i[2].replace("\\'","'") + "\n" + i[1].replace("\\'","'") + "\n" + i[0].replace("\\'","'"))
    # New Stuff
    j = httplib2.Http('.cache')
    pageToSearch = str(i[2])
    print(pageToSearch)

    response, content = j.request(pageToSearch)
    huffArtContent = str(content)
    huffArtFinder = re.compile('<div class="entry_body_text">(.{5,400})<div class="contin_below">', re.IGNORECASE)
    findHuffArt = re.findall(huffArtFinder,huffArtContent)
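A side note on str(content): in Python 3, httplib2 returns the response body as bytes, so str() gives its repr (the b'...' wrapper plus backslash-escaped quotes), which is why the loop above needs the replace("\\'", "'") calls. The minimal sketch below compares the two on a made-up snippet. It assumes UTF-8; a real feed declares its encoding in the Content-Type header or the XML declaration.

```python
# Stand-in for the body httplib2 returns in Python 3 (a bytes object).
content = b'<title>It\'s "funny"</title>'

# str() on bytes returns the repr: a b'...' wrapper with the single quote escaped.
as_repr = str(content)
print(as_repr)

# Decoding returns the actual text. UTF-8 is an assumption for this sketch.
as_text = content.decode("utf-8")
print(as_text)
```

Running the regexes over the decoded text avoids the escaped-quote cleanup entirely.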

I then want to search each original article by calling request() on its link, but I get this error:

raise RelativeURIError("Only absolute URIs are allowed. uri = %s" % uri)
httplib2.RelativeURIError: Only absolute URIs are allowed. uri =  
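Note that the uri in the traceback is empty. One likely cause is how re.findall treats a pattern made of several groups joined with |: each match is a tuple with one entry per group, and only the group from the alternative that matched is filled in. The others are ''. So i[2] is '' for every title and summary match. A small sketch on a made-up sample string (not the real feed) shows this:

```python
import re

# Hypothetical one-line sample standing in for the feed.
sample = ('<title>Headline</title>'
          '<summary>Some text</summary>'
          '<link rel="alternate" type="text/html" href="http://example.com/a"/>')

# Same pattern as in the question.
finder = re.compile('<title>(.{5,90})</title>|<summary>(.{5,300})</summary>|'
                    '<link rel="alternate" type="text/html" href="(.{5,100})"/>',
                    re.IGNORECASE)

matches = finder.findall(sample)
for groups in matches:
    # Only one alternative matches at a time, so the other two groups are ''.
    print(groups)

# The third group is empty for the title and summary matches.
links = [m[2] for m in matches]
print(links)
```

Passing one of those empty strings to request() fails because '' has neither a scheme nor a host.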

What version of the product are you using? On what operating system?

I'm using Python 3 on a Mac.

Thanks for any help. I know I'm doing something silly.

Original issue reported on code.google.com by derekba...@gmail.com on 16 Sep 2010 at 8:53

GoogleCodeExporter commented 9 years ago
You must take any URI that you find in the HTML and, if it is a relative URI,
make it into an absolute URI using "www.huffingtonpost.com" as the base.
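A minimal sketch of that advice, using the standard library's urllib.parse.urljoin. It assumes the feed URL from the question is the base (urljoin then resolves relative links against www.huffingtonpost.com). absolute_links is a hypothetical helper name. It also skips empty strings, which the regex produces for non-link matches:

```python
from urllib.parse import urljoin

# Assumed base: the URL the feed was fetched from in the question.
BASE = 'http://www.huffingtonpost.com/feeds/verticals/comedy/index.xml'


def absolute_links(raw_links, base=BASE):
    """Hypothetical helper: yield absolute URLs, skipping empty strings."""
    for link in raw_links:
        if not link:
            continue  # '' would make httplib2 raise RelativeURIError
        # urljoin leaves absolute URLs unchanged and resolves relative ones.
        yield urljoin(base, link)


links = list(absolute_links(['', '/2010/09/16/story.html', 'http://example.com/a']))
print(links)

# Each result can then be fetched, e.g.:
# for url in links:
#     response, content = h.request(url)
```

Using the feed's own URL as the base, rather than a hard-coded host, also handles links that are relative to the feed's directory.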

Original comment by joe.gregorio@gmail.com on 14 Feb 2011 at 3:59