dhellmann / google-highly-open-participation-psf

Automatically exported from code.google.com/p/google-highly-open-participation-psf

Write a pydigg-style module for reddit. #146

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Write an API for grabbing new, hot, etc. stories from reddit.com; see
pydigg,

      http://code.google.com/p/pydigg/

for inspiration in style if not in details.

Completion:

Upload a package containing the code (and associated documentation and
tests) as an attachment to this task.

Task duration: please complete this task within 5 days (120 hours) of claiming it.
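
Just to give a rough sense of the expected shape (the names below are hypothetical illustrations, not pydigg's API or any particular submission), a pydigg-style module would expose a handful of simple functions returning story records:

# hypothetical interface sketch; function and field names are illustrative only

def get_new_stories():
    """Return the newest reddit stories, e.g. a list of dicts with
    keys such as 'title', 'url', and 'score'."""

def get_hot_stories():
    """Return the current front-page ('hot') reddit stories."""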

Original issue reported on code.google.com by the.good...@gmail.com on 26 Nov 2007 at 6:27

GoogleCodeExporter commented 9 years ago
I claim this one
  --Kristopher Micinski (Kryptech)

Original comment by kryptech...@gmail.com on 28 Nov 2007 at 6:02

GoogleCodeExporter commented 9 years ago

Original comment by georg.br...@gmail.com on 28 Nov 2007 at 6:07

GoogleCodeExporter commented 9 years ago

Original comment by doug.hel...@gmail.com on 28 Nov 2007 at 11:59

GoogleCodeExporter commented 9 years ago
any progress?

Original comment by the.good...@gmail.com on 4 Dec 2007 at 11:26

GoogleCodeExporter commented 9 years ago
This task is due Monday, December 3, 2007 at 18:05:00 UTC.

Original comment by doug.hel...@gmail.com on 4 Dec 2007 at 5:31

GoogleCodeExporter commented 9 years ago
Per Google's rules, you have until December 6, 2007, 18:05:00 UTC to complete 
this task.

Original comment by georg.br...@gmail.com on 4 Dec 2007 at 9:26

GoogleCodeExporter commented 9 years ago
I suspect this task may be especially difficult because Reddit does not have any official API, much less one of Digg's calibre. I do not believe anything can be done with Reddit's server outside of plain HTTP GET/POST requests mimicking a browser, which could fail on the slightest change of their code.

Original comment by jeffwheeler on 6 Dec 2007 at 6:03

GoogleCodeExporter commented 9 years ago
This task has expired and is now re-opened and may be claimed by any contestant.

Original comment by doug.hel...@gmail.com on 7 Dec 2007 at 1:01

GoogleCodeExporter commented 9 years ago
I claim this task.

Original comment by dco...@gmail.com on 7 Dec 2007 at 8:07

GoogleCodeExporter commented 9 years ago

Original comment by the.good...@gmail.com on 7 Dec 2007 at 8:26

GoogleCodeExporter commented 9 years ago

Original comment by the.good...@gmail.com on 7 Dec 2007 at 8:26

GoogleCodeExporter commented 9 years ago

Original comment by the.good...@gmail.com on 7 Dec 2007 at 8:27

GoogleCodeExporter commented 9 years ago
Is there no progress? You have until today, December 15, 8:00 to submit at least a partial solution.

Original comment by georg.br...@gmail.com on 15 Dec 2007 at 10:06

GoogleCodeExporter commented 9 years ago

Original comment by georg.br...@gmail.com on 15 Dec 2007 at 10:12

GoogleCodeExporter commented 9 years ago
This task has expired and is now re-opened and may be claimed by any contestant.

Original comment by doug.hel...@gmail.com on 19 Dec 2007 at 12:39

GoogleCodeExporter commented 9 years ago
No API on reddit, no deal. Scraping HTML is ugly and unreliable.

Original comment by dx%dxzon...@gtempaccount.com on 22 Dec 2007 at 2:31

GoogleCodeExporter commented 9 years ago
Scraping HTML is ugly and unreliable... If someone writes that module, it will get broken.

Original comment by dx%dxzon...@gtempaccount.com on 22 Dec 2007 at 2:47

GoogleCodeExporter commented 9 years ago
Two points:

1. if you don't want the task, that's fine, pick something else
2. scraping is ugly only if you don't know how to do it right

-------------------------------------

# here is a script to print all titles from reddit
# (Python 2 with BeautifulSoup 3, as used at the time)

import urllib
from BeautifulSoup import BeautifulSoup

# fetch the front page and parse it
page = urllib.urlopen('http://www.reddit.com').read()
soup = BeautifulSoup(page)

# story links are the anchors carrying the class 'title ' (note the trailing space)
for title in soup.findAll('a', attrs={'class': 'title '}):
    print title.string

Original comment by istvan.a...@gmail.com on 22 Dec 2007 at 2:56

GoogleCodeExporter commented 9 years ago
I claim this one

Original comment by adolfo.fitoria on 22 Dec 2007 at 6:59

GoogleCodeExporter commented 9 years ago
Great!  Let us know how it goes.  Be sure to write some automated tests... you can do things like save some static HTML for this purpose.

Original comment by the.good...@gmail.com on 22 Dec 2007 at 7:59

GoogleCodeExporter commented 9 years ago
adolfo.fitoria, good luck loco, you'll need a lot of it

Original comment by g3rr...@gmail.com on 22 Dec 2007 at 5:23

GoogleCodeExporter commented 9 years ago
I just need the documentation and the testing to finish it :D

Original comment by adolfo.fitoria on 23 Dec 2007 at 5:46

GoogleCodeExporter commented 9 years ago
sounds great

Original comment by istvan.a...@gmail.com on 23 Dec 2007 at 4:30

GoogleCodeExporter commented 9 years ago
Finished. If there is an error, a bug, or additional stuff needed, just tell me.

Original comment by adolfo.fitoria on 24 Dec 2007 at 8:37

Attachments:

GoogleCodeExporter commented 9 years ago
Hello Adolfo,

Looks good. Using the feedparser module for reading out the RSS feeds is a great idea. Turns out all that fuss about not having an API was just much ado about nothing.
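
For context, that approach boils down to parsing reddit's .rss listing feeds with feedparser, roughly like this (the function name is illustrative; the per-community subdomain URL mirrors the code quoted later in this thread):

import feedparser

def grab_new(community=None):
    # reddit exposes RSS versions of its listings, e.g. /new.rss;
    # communities live on subdomains such as programming.reddit.com
    if community is None:
        url = 'http://www.reddit.com/new.rss'
    else:
        url = 'http://%s.reddit.com/new.rss' % community.lower()
    return feedparser.parse(url).entries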

Two improvements may be necessary. Your module is a single file, so you can keep the test file in the same folder as the main code (in any case, never duplicate the main code in the test folder). Simply append to the path in the test script like so:

import sys
sys.path.append('..')

With that you can put the parent folder in the import path if necessary. There is one problem remaining: titles may contain characters that cannot be printed as ASCII without explicit encoding. Depending on the title you can get unicode errors like this:

http://reddit.com/goto?rss=true&id=t3_22rg2
Traceback (most recent call last):
  File "testIt.py", line 10, in <module>
    print fed.title #prints the title of the Story
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2014' in position 17: ordinal not in range(128)

So you'll need to explicitly encode the strings that you print out. See this for more info:

http://www.amk.ca/python/howto/unicode
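
A minimal Python 2 fix is to encode explicitly before printing; for example, with fed.title being the entry title from the test script in the traceback above:

# encode the unicode title to UTF-8 before printing
print fed.title.encode('utf-8')

# or, if plain ASCII output is required, replace unencodable characters
print fed.title.encode('ascii', 'replace')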

Also, you should never use tabs; make your tab key insert 4 spaces instead (so-called soft tabs), which is the preferred indentation in Python.

Pretty neat otherwise, good job.

Istvan

Original comment by istvan.a...@gmail.com on 25 Dec 2007 at 2:51

GoogleCodeExporter commented 9 years ago
One more thing (Titus mentioned this before): save a few XML files locally, then read, process, and compare them in your tests (use a testing framework like doctest or unittest). Right now your test only grabs a few things from reddit, and you can't check that the code actually returns what you think it should.
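
Purely as a sketch of that kind of test (the sample file name and expected title are hypothetical):

import unittest
import feedparser

class LocalFeedTest(unittest.TestCase):

    def test_saved_new_feed(self):
        # parse a copy of reddit's new.rss saved next to the tests
        feed = feedparser.parse('new_sample.rss')
        self.assertTrue(len(feed.entries) > 0)
        # compare against whatever the saved file is known to contain
        self.assertEqual(feed.entries[0].title, 'Known first title')

if __name__ == '__main__':
    unittest.main()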

Original comment by istvan.a...@gmail.com on 25 Dec 2007 at 3:00

GoogleCodeExporter commented 9 years ago
Thanks for the corrections, Istvan. I was not clear about the encoding; I did all I could do in 30 min because I must be traveling right now. Anyway I can't check this after Jan 1st.

I wish the task could be completed now.

Regards,
Adolfo Fitoria

Original comment by adolfo.fitoria on 25 Dec 2007 at 6:38

Attachments:

GoogleCodeExporter commented 9 years ago
Hello Adolfo,

This puts me in a bit of a difficult position. You did a good job,
but it is not quite finished. I do understand that you may have
certain priorities that you must attend to; nonetheless, accepting
your work "as is" would be unfair to those who are repeatedly
asked to improve their code before it is accepted.

It would be a shame to have to reopen the task as you got pretty
far. But especially in light of the fact that the task itself is not
that difficult, I feel we need to have it completed with a higher
level of quality before accepting it as done.

I hope you can find some time to refine your code before the 
task expires on Dec 27th. 

best regards,

Istvan

Original comment by istvan.a...@gmail.com on 25 Dec 2007 at 7:27

GoogleCodeExporter commented 9 years ago
Istvan, Adolfo -- I think it's acceptable to give an extension until after Jan 1st for this, since it's the holidays and people will be travelling.  Let's just leave this until then, when Adolfo can respond.

Original comment by the.good...@gmail.com on 25 Dec 2007 at 7:37

GoogleCodeExporter commented 9 years ago
I thought Adolfo said that he cannot do it even after January 1st:

>> I must be traveling right now. Anyway I can't check this after Jan 1st.

But it could be a typo.

In that case I'm fine with granting as much of an extension as is needed.

Original comment by istvan.a...@gmail.com on 25 Dec 2007 at 10:39

GoogleCodeExporter commented 9 years ago
Extended due date until January 2nd, 2008

Original comment by istvan.a...@gmail.com on 27 Dec 2007 at 3:33

GoogleCodeExporter commented 9 years ago
:D

Original comment by g3rr...@gmail.com on 28 Dec 2007 at 12:08

GoogleCodeExporter commented 9 years ago
Thanks for the extension. I'm not at home, but I will try to complete it as soon as possible :S

Original comment by adolfo.fitoria on 30 Dec 2007 at 11:31

GoogleCodeExporter commented 9 years ago
Any progress?

Original comment by georg.br...@gmail.com on 6 Jan 2008 at 9:56

GoogleCodeExporter commented 9 years ago

Original comment by the.good...@gmail.com on 8 Jan 2008 at 8:11

GoogleCodeExporter commented 9 years ago
I claim this task.

Original comment by cody.som...@gmail.com on 8 Jan 2008 at 2:07

GoogleCodeExporter commented 9 years ago
This task is due January 13, 2008 14:10:00 UTC

Original comment by doug.hel...@gmail.com on 8 Jan 2008 at 10:23

GoogleCodeExporter commented 9 years ago
I'm having some problems with my unittest. I get the following error: 

cody-somerville@veracity:~/projects/python/reddit_reader/tests$ ./test.py
Commencing Unit Tests for 'Reddit'...
Traceback (most recent call last):
  File "./test.py", line 27, in <module>
    test_main()
  File "./test.py", line 23, in test_main
    RedditTest()
  File "/usr/lib/python2.5/unittest.py", line 209, in __init__
    (self.__class__, methodName)
ValueError: no such test method in <class '__main__.RedditTest'>: runTest

I've attached my entire package. Please review :)

Original comment by cody.som...@gmail.com on 10 Jan 2008 at 4:46

Attachments:

GoogleCodeExporter commented 9 years ago
just put

if __name__ == "__main__":
    unittest.main()

at the end of test.py, instead of the other stuff.  unittest expects to collect and run the tests itself, because IT'S MAGIC <badong>.  (can you tell I hate unittest? ;)
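
Put differently, a minimal test.py skeleton looks roughly like this (the class name follows the attached package; the test body is only a placeholder). Instantiating RedditTest() directly fails because TestCase defaults to looking for a runTest method, which is exactly the error above; unittest.main() finds the test_* methods for you:

import unittest

class RedditTest(unittest.TestCase):

    def test_placeholder(self):
        # real checks go in methods whose names start with test_
        self.assertTrue(True)

if __name__ == "__main__":
    unittest.main()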

Original comment by the.good...@gmail.com on 10 Jan 2008 at 7:57

GoogleCodeExporter commented 9 years ago
OK, I have made that modification, but now my unit test that expects the BozoFeed exception to be raised fails when it is raised:

======================================================================
ERROR: test_badCommunity (__main__.RedditTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test.py", line 13, in test_badCommunity
    self.assertRaises(reddit.BozoFeed, reddit.grabCommunityNew("moo"))
  File "../reddit.py", line 75, in grabCommunityNew
    return _returnFeed("http://" + community.lower() + ".reddit.com/new.rss")
  File "../reddit.py", line 112, in _returnFeed
    raise BozoFeed, feed["bozo_exception"]
BozoFeed: <unprintable BozoFeed object>

----------------------------------------------------------------------
Ran 2 tests in 1.916s

    def test_badCommunity(self):
        self.assertRaises(reddit.BozoFeed, reddit.grabCommunityNew("moo"))
        self.assertRaises(reddit.BozoFeed, reddit.grabCommunityTop("moo"))
        self.assertRaises(reddit.BozoFeed, reddit.grabCommunityHot("moo"))
        self.assertRaises(reddit.BozoFeed, reddit.grabCommunityControversy(
                                                            "moo"))

Original comment by cody.som...@gmail.com on 12 Jan 2008 at 6:57
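
For reference, assertRaises expects the callable and its arguments to be passed separately; as written, grabCommunityNew("moo") is evaluated before assertRaises ever runs, so the exception escapes and the test errors out instead of passing. The corrected form (and likewise for the other three calls) would be:

        self.assertRaises(reddit.BozoFeed, reddit.grabCommunityNew, "moo")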

GoogleCodeExporter commented 9 years ago

Original comment by georg.br...@gmail.com on 13 Jan 2008 at 2:36

GoogleCodeExporter commented 9 years ago

Original comment by cody.som...@gmail.com on 14 Jan 2008 at 12:08

Attachments:

GoogleCodeExporter commented 9 years ago
I developed a small GUI application to make use of the new reddit module; both are included in the tarball I've attached to this comment.

Original comment by cody.som...@gmail.com on 15 Jan 2008 at 2:58

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by doug.hel...@gmail.com on 16 Jan 2008 at 7:28

GoogleCodeExporter commented 9 years ago
Very nice!

The tests are a little skimpy, but I'm going to mark this completed anyway.  The GUI is a nice touch.

Original comment by doug.hel...@gmail.com on 17 Jan 2008 at 1:04

GoogleCodeExporter commented 9 years ago
Do you have any suggestions for improving the tests?

Original comment by cody.som...@gmail.com on 17 Jan 2008 at 1:33

GoogleCodeExporter commented 9 years ago
test_grab() will only give you useful information if there is ever an error downloading the feed.  Otherwise, reddit.grabNew() might as well be a no-op.  Usually in tests you want to verify that the correct data, or at least the correct *type* of data, was returned from a call.

You could verify that the feeds have content by checking the number of entries, for example.  You could also check that for each feed the proper title is included.

If you make local copies of the feeds, you could test that you are loading the right entries for a given call.  To do that, you will need a way to set a "base URL" for the API, which might lead you to turn what you have into a class with methods (where the base URL is an argument to __init__).
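
As a sketch of that last idea (class, method, and path names are illustrative, not the submitted module's):

import feedparser

class Reddit(object):

    def __init__(self, base_url='http://www.reddit.com'):
        # tests can point this at local copies, e.g. 'file:///path/to/feeds'
        self.base_url = base_url

    def grab_new(self):
        return feedparser.parse(self.base_url + '/new.rss')

# in a test: Reddit(base_url='file:///path/to/saved/feeds').grab_new()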

Original comment by doug.hel...@gmail.com on 17 Jan 2008 at 1:11

GoogleCodeExporter commented 9 years ago
Hi Doug,

Sorry for taking so long to respond.

My question would be: am I testing feedparser or reddit?

With that aside, checking to ensure it has the correct keys and whatnot is an excellent idea. I'll upload with that change later today.
Original comment by cody.som...@gmail.com on 24 Jan 2008 at 6:57

GoogleCodeExporter commented 9 years ago
Cody,

I see your point.  Consider the case where reddit changes their URLs.  In that case, when your API uses feedparser to access the data, it may receive an error page, which is treated as an empty feed (since it cannot be parsed).  In that case, your tests would still pass, even though the API is no longer working.

Original comment by doug.hel...@gmail.com on 24 Jan 2008 at 9:01

GoogleCodeExporter commented 9 years ago
That is what the other unit test is doing. :)

Original comment by cody.som...@gmail.com on 24 Jan 2008 at 9:03