fzakaria / HypeScript

Python HypeMachine script downloader
www.fzakaria.com

Fixed the first page endless download loop #2

Closed. feesniped closed this pull request 12 years ago

feesniped commented 12 years ago

Let me know what you think.

fzakaria commented 12 years ago

I've put a comment on the file. Take a look.

feesniped commented 12 years ago

Turns out all you have to do is remove the '/' from line 56. Should look like this: `complete_url = self.url + str(i) + '?ax=1&ts=' + str(time.time())`

fzakaria commented 12 years ago

Finally had a chance to look at it. My script works for me; it properly pulled 3 pages' worth from the popular page. It wouldn't make sense to remove the '/', since the links are of the form http://hypem.com/popular/3

feesniped commented 12 years ago

I've never scraped the popular feed, but I will explain my logic. In line 56 you attempt (or so it seems) to define complete_url: `complete_url = self.url + "/" + str(i) + '?ax=1&ts=' + str(time.time())`.

If you then look at line 49, you clearly append a '/' at the beginning AND the end of the user-defined AREA_TO_SCRAPE. So it at least appears that you create the URL with a '//' instead of a single '/'.

Perhaps Hypem deals with the feed urls for popular and favorite feeds in different ways. Perhaps you could take it for a spin again and instead of the popular feed, set scrape area as your hypem username.
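A quick sketch of the double-slash issue being described (the variable names and line contents are assumed from the thread, not taken from the actual script):

```python
import time

# Hypothetical reconstruction of the two lines under discussion.
# Line 49 (as described): a '/' is appended to both ends of AREA_TO_SCRAPE.
AREA_TO_SCRAPE = "popular"
base = "http://hypem.com"
url = base + "/" + AREA_TO_SCRAPE + "/"  # trailing slash added here

# Line 56 (as described): another '/' is inserted before the page number,
# which collides with the trailing slash above.
i = 3
complete_url = url + "/" + str(i) + '?ax=1&ts=' + str(time.time())
# yields http://hypem.com/popular//3?ax=1&ts=... with a double slash
```

Whether the server tolerates the `//` may differ between the popular feed and a user's loved-tracks feed, which would explain why it works for one person and loops for the other.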

fzakaria commented 12 years ago

Oh, I see. Just don't give AREA_TO_SCRAPE a trailing slash, and don't modify the rest of the code.

feesniped commented 12 years ago

Yes, although I have to admit my reasoning was to future-proof the script. [I think] complete_url is better suited for additional functionality when it is more context-dependent.
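For future-proofing along those lines, one way to make the join tolerant of either form is to normalize the base URL before appending the page number. This is a sketch, not the script's actual code; the helper name and the query string are assumed from the snippets quoted above:

```python
import time

def build_page_url(base_url: str, page: int) -> str:
    """Join the base feed URL and a page number with exactly one slash,
    whether or not base_url already carries a trailing slash."""
    return base_url.rstrip('/') + '/' + str(page) + '?ax=1&ts=' + str(time.time())

# Both forms now yield the same single-slash URL:
build_page_url("http://hypem.com/popular", 3)
build_page_url("http://hypem.com/popular/", 3)
```

With a helper like this, neither the AREA_TO_SCRAPE convention nor line 56 needs to agree on who owns the slash.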