pygooglevoice doesn't handle Google Voice's XML pagination

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Get an instance of a folder containing more than 10 messages
2. Compare folder.totalsize with len(folder.messages)

What is the expected output? What do you see instead?
len(folder.messages) should equal folder.totalsize.
Instead len(folder.messages) will never be more than 10.

What version of the product are you using? On what operating system?
0.4

Please provide any additional information below.

Google Voice only seems to return 10 messages on each of the XML pages,
with a pagination variable used to return the next 10, etc.

This is mentioned briefly at
http://posttopic.com/topic/google-voice-add-on-development

For the 'all' folder, for example, it looks like you would need to grab &
parse each page something like this:

https://www.google.com/voice/inbox/recent/all/
https://www.google.com/voice/inbox/recent/all/?page=p2
https://www.google.com/voice/inbox/recent/all/?page=p3
etc, etc.
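
A minimal sketch of that URL pattern, for illustration only. The helper name `folder_page_url` is made up here and is not part of pygooglevoice; it assumes other folders follow the same pattern as the 'all' URLs listed above.

```python
BASE = "https://www.google.com/voice/inbox/recent/"

def folder_page_url(folder, page=1):
    """Build the URL for one page of a folder (hypothetical helper)."""
    url = BASE + folder + "/"
    if page > 1:
        # Pages after the first are requested as ?page=p2, ?page=p3, ...
        url += "?page=p" + str(page)
    return url

for n in range(1, 4):
    print(folder_page_url("all", n))
```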

This is one that I'm not entirely certain how to fix myself, so I'm just
reporting it here. I will perhaps poke at it a bit and submit a patch if I
come up with anything.

Original issue reported on code.google.com by smcgrat...@gmail.com on 29 Nov 2009 at 5:15

GoogleCodeExporter commented 9 years ago

On further review, I can sort of see what needs to be done, as far as calling
__do_xml_page once for every 10 messages and somehow concatenating the data
from each page.
I definitely can't make enough sense of the code to implement that though.

Original comment by smcgrat...@gmail.com on 29 Nov 2009 at 6:06

GoogleCodeExporter commented 9 years ago

You should be able to set the page attr in the data passed into __do_xml_page,
like so: voice.__do_xml_page('all', {'page':'p3'}). Give that a try. I do not
have enough messages to test pagination, but passing the var as a POST or GET
param should help.

Original comment by justquick on 29 Nov 2009 at 7:56

Changed state: Started

GoogleCodeExporter commented 9 years ago

I'd rate this as high priority, since after a while, the program will no longer
retrieve new messages, just the same old ones.  Any progress on this?

Testing this requires more than 10 conversations, not more than 10 messages.
I have one conversation with 22 messages, and they all show up in the inbox XML
if any of them do.

If you have old conversations in your Google Voice trash folder, you can
undelete them, which puts them back in the inbox and can force the inbox to
multiple pages. So there's a way to test.

Original comment by na...@animats.com on 12 Jan 2010 at 6:45

GoogleCodeExporter commented 9 years ago

>> You should be able to set the page attr in the data passed into
>> __do_xml_page. like so: voice.__do_xml_page('all', {'page':'p3'}).

1.  There is no "__do_xml_page".  There is an "__get_xml_page".

2.  The "data" parameter to "__get_xml_page", when set to "{'page': 'p1'}",
    results in an exception in XMLParser, even though there is a valid page p1.

Original comment by na...@animats.com on 14 Jan 2010 at 7:38

GoogleCodeExporter commented 9 years ago

The problem with "2." above is that "__do_page", if given a "data" parameter,
does a POST instead of a GET, sending the "data" info in the request body, not
the URL.  The exceptions are page types "DOWNLOAD" and "XML_SEARCH", which are
always fetched with a GET.
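
The GET/POST difference can be seen without touching the network. This is only a sketch using Python 3's `urllib.request`, not pygooglevoice's own code (which this thread does not show); the URL is the one from the original report.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "https://www.google.com/voice/inbox/recent/all/"
params = {"page": "p2"}

# Passing the encoded params as `data` makes the request a POST
# and carries them in the request body.
as_post = Request(url, data=urlencode(params).encode("ascii"))

# Appending them to the URL as a query string keeps the request a GET.
as_get = Request(url + "?" + urlencode(params))

print(as_post.get_method(), as_post.full_url)
print(as_get.get_method(), as_get.full_url)
```

No request is actually sent here; the `Request` objects are only inspected.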

All that un-commented cutesy stuff with attributes makes this hard to fix with
small changes.  There are too many implicit assumptions about how Google Voice
will behave nailed into the code. When something needs an extra parameter, but
it was implemented as an attribute, the design of pygooglevoice breaks down.

Original comment by na...@animats.com on 14 Jan 2010 at 4:32

GoogleCodeExporter commented 9 years ago

Specifying "page" as above generates this URL:

DEBUG:PyGoogleVoice:/voice/inbox/recent/inbox/?_rnr_se=5BilkW7VQpUi5EDSCHmk%2FlbY2mc%3D&page=p2
- {'User-Agent': 'PyGoogleVoice/0.5
}

So the "_rnr_se" parameter is added only if "page" is specified.  Google Voice
returns a "403 Forbidden" error in this situation.  

Just requesting "https://www.google.com/voice/inbox/recent/inbox/?page=p2" with
Firefox works fine. Unclear why.

Original comment by na...@animats.com on 14 Jan 2010 at 5:20

GoogleCodeExporter commented 9 years ago

I have multiple page fetch working now.  "__do_page" in "voice.py" needs some
work. The "debug message" above is misleading; the debug print is inserting a
"?", but the actual URL generated doesn't have it; the actual URL sent out
looked like "...inbox/_rnr_se", without the "?".  Some other pygooglevoice
functions may be broken because of that.  Do "DOWNLOAD" and "XML_SEARCH" work?
I suspect not.

Once "__do_page" has been fixed to do a GET with a properly constructed URL in
this situation, we can fetch pages > 1.  Code for this looks like

def fetchfolderpage(voice, pagetype, pagenumber=1):
    """Fetch page N (starting from 1) of a folder such as the inbox."""
    params = None                                   # params for fetching page, if any
    if pagenumber > 1:                              # if not first page, must put page number in URL
        params = {'page': "p" + str(pagenumber)}    # get page "p2", etc.
    # Violate class privacy (name-mangled private method) per developer instructions.
    xmlparser = voice._Voice__get_xml_page(pagetype, params)
    return xmlparser                                # return XML parser object

This is painful and ugly.        

Fetching multiple pages properly requires reading the HTML, and looking for the
"next page" link to see if there's more to read.  I use BeautifulSoup for that,
but pygooglevoice doesn't normally parse the HTML, so that's a problem. If
there's an "a" tag with an "id" attribute with a value of "gc-inbox-next",
there are more pages to read.  In BeautifulSoup notation:

   moreitem = tree.find("a",attrs = {"id" : "gc-inbox-next"})

If "moreitem" is not None, there are more pages to be read.
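
The same check can be sketched without BeautifulSoup, using the standard library's `html.parser` instead so the example has no dependencies. The class and function names are made up for illustration, and the HTML fragments are invented test input, not real Google Voice output.

```python
from html.parser import HTMLParser

class NextLinkFinder(HTMLParser):
    """Sets .found to True when an <a id="gc-inbox-next"> tag is seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "a" and dict(attrs).get("id") == "gc-inbox-next":
            self.found = True

def has_next_page(html):
    """Return True if the page HTML contains the "next page" link."""
    finder = NextLinkFinder()
    finder.feed(html)
    return finder.found

# Invented fragments, for illustration only.
with_next = '<div><a id="gc-inbox-next" href="?page=p2">Older</a></div>'
without_next = '<div><a id="gc-inbox-prev" href="?page=p1">Newer</a></div>'
print(has_next_page(with_next), has_next_page(without_next))
```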

I'll do more cleanup on this.  It's definitely fixable, but it doesn't fit well
into the structure of pygooglevoice.

Original comment by na...@animats.com on 14 Jan 2010 at 8:01

GoogleCodeExporter commented 9 years ago

I think I see how to do this:

1.  The API needs some changes.  I propose to
    give XMLParser an optional "pagenumber" parameter
    which is then used in its lambda to get the desired page.  So the user
    can write "voice.inbox(pagenumber=2)" to get page 2 of the inbox.
    Default is 1, this being the Google Voice convention. This maintains
    compatibility with existing user code.

    There's no good way to detect the last page without looking at the HTML,
    other than getting an exception on a bad page number.  See my previous
    note.  So detecting the last page is the caller's responsibility for now.

2.  All the "helper" functions in "voice.py" need, instead of one "data"
    parameter, a "urldata" and a "postdata" parameter.  "urldata" gets
    urlencoded and appended to the URL; "postdata" gets sent as part of a POST.
    If "postdata" is not None, an HTTP POST will be performed, otherwise a GET.
    All the callers of these functions need to be modified.  The current hack
    in __do_page, "if page in ('DOWNLOAD','XML_SEARCH')", goes away, and the
    caller makes the GET/POST decision.

3.  Current exception handling in __call__ of XMLParser turns
    all exceptions into ParsingError.  Because XMLParser's lambda does network
    I/O, this hides network errors.  You get "Parsing Error" when Google Voice
    gave you "403 Forbidden", for example.  Exception handling there should pass
    through HTTP and OS errors.  Then you can tell the difference between "network
    problem", "Google changed the API", and "pygooglevoice is broken".
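
A rough sketch of the "urldata"/"postdata" split from point 2, written against Python 3's `urllib.request` rather than pygooglevoice itself. `build_request` is a hypothetical helper, not an existing pygooglevoice function, and it assumes the base URL has no query string yet.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_request(url, urldata=None, postdata=None):
    """Hypothetical helper: urldata goes in the query string, postdata in the body.

    The request is a POST only when postdata is not None; otherwise it is a GET.
    """
    if urldata:
        url += "?" + urlencode(urldata)
    body = None
    if postdata is not None:
        body = urlencode(postdata).encode("ascii")
    return Request(url, data=body)

inbox = "https://www.google.com/voice/inbox/recent/inbox/"
page2 = build_request(inbox, urldata={"page": "p2"})
print(page2.get_method(), page2.full_url)
```

With this shape the caller decides between GET and POST simply by choosing which argument to pass.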

The problem is that this requires many small changes all over pygooglevoice,
and I'm not set up to test it properly other than for SMS.  How can we get this
done?
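
For point 3 in the list above, the idea of passing network errors through could look roughly like this. It is a Python 3 sketch: `ParsingError` here is a local stand-in for pygooglevoice's exception, and `load_and_parse`, `forbidden` and the example URL are invented for illustration.

```python
from urllib.error import HTTPError, URLError

class ParsingError(Exception):
    """Local stand-in for pygooglevoice's ParsingError."""

def load_and_parse(fetch, parse):
    """Run fetch() then parse(), wrapping only non-network failures."""
    try:
        return parse(fetch())
    except (URLError, OSError):
        # HTTP and OS-level errors pass through unchanged
        # (HTTPError is a subclass of URLError).
        raise
    except Exception as exc:
        # Anything else is reported as a parsing problem.
        raise ParsingError(str(exc)) from exc

def forbidden():
    # Simulates the server answering "403 Forbidden".
    raise HTTPError("https://example.invalid/", 403, "Forbidden", None, None)

for fetch, parse in ((forbidden, str), (lambda: "abc", int)):
    try:
        load_and_parse(fetch, parse)
    except Exception as exc:
        print(type(exc).__name__)
```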

Original comment by na...@animats.com on 15 Jan 2010 at 5:28

GoogleCodeExporter commented 9 years ago

Here's code for a workaround.  I do NOT recommend putting this directly into
"pygooglevoice", and it has NOT been tested for non-SMS functions.  But I've
successfully received three pages worth of SMS messages with this code.

Original comment by na...@animats.com on 15 Jan 2010 at 5:58

GoogleCodeExporter commented 9 years ago

Here's a patch override file for a workaround.  This doesn't affect the
installed "pygooglevoice", it just replaces some functions for your
application.  This has NOT been tested for non-SMS functions. This is NOT a
permanent fix; I'll leave that to the developer.  It will read multiple pages
of inbox SMS, if you explicitly call "fetchfolderpage" for each page.

This applies to googlevoice 0.5 only.

Original comment by na...@animats.com on 15 Jan 2010 at 7:27

Attachments:

pygooglevoicepatches.py

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Here's an alternative patch, in case anyone is interested. I have applied it to 
the current pygooglevoice source code. For my own rudimentary testing 
scenarios, it seems to work fine.

It allows an optional page-number parameter to be supplied to the 
voice.inbox(), voice.starred(), voice.sms(), voice.all(), voice.spam(), 
voice.voicemail(), and voice.trash() methods.

For example, after applying this patch, you can now do this to retrieve all of 
the SMS conversations on page 17:

  voice.sms(17)

Invoking "voice.sms()" will still work as it currently does, and it will 
retrieve the conversations on page 1. The same is true for all the other changed 
methods.
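
With a page-number parameter available, walking a whole folder becomes a loop. The sketch below assumes a callable `fetch_page(n)` that returns the conversations on page n and an empty result past the last page. That stopping rule is an assumption: this thread does not say what the patched methods return beyond the last page, so a real wrapper around e.g. voice.sms(n) may need a different check. The names `collect_all_pages` and `fake_fetch` are made up.

```python
def collect_all_pages(fetch_page, max_pages=1000):
    """Call fetch_page(1), fetch_page(2), ... and merge results until a page is empty."""
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break  # assumed end-of-folder signal
        items.extend(batch)
    return items

# Stand-in data: 23 fake conversation ids served 10 per page.
fake = ["conv%d" % i for i in range(23)]

def fake_fetch(page):
    return fake[(page - 1) * 10 : page * 10]

print(len(collect_all_pages(fake_fetch)))
```

The `max_pages` cap is there so a server that keeps returning the same page cannot turn this into an endless loop.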

This applies to pygooglevoice 0.5. For any other version, YMMV.

Original comment by hippo.ma...@gmail.com on 17 Nov 2010 at 1:22

Attachments:

pygv.patch

GoogleCodeExporter commented 9 years ago

PS: Here's a script I wrote which makes use of this patched version of 
pygooglevoice-0.5. It traverses the entire SMS folder within a given Google 
Voice instance and it builds a data structure which contains all SMS messages 
within all the conversations on all the pages.

It's very loosely based on the examples/parse_sms.py program.

Like that example, it requires the BeautifulSoup XML parsing library. I used 
this version:

  http://www.crummy.com/software/BeautifulSoup/download/3.x/BeautifulSoup-3.0.8.1.tar.gz

Original comment by hippo.ma...@gmail.com on 17 Nov 2010 at 2:15

Attachments:

getsms

GoogleCodeExporter commented 9 years ago

I've forked this project and incorporated the patch suggested in Comment 12.

http://code.google.com/r/fracai-pygooglevoice/

Original comment by fra...@gmail.com on 7 Apr 2011 at 2:15

GoogleCodeExporter commented 9 years ago

     Ah, the abandonware problem.

     That's not a great patch, just what I could do from the
outside without redesigning the code.

     I finally gave up on Google Voice and switched to Twilio as my
SMS gateway.  Twilio isn't free, but it does SMS much better.
A no-traffic poll of Google Voice transmits all that XML for at
least 10 messages, so the overhead and data usage during periods
of light traffic is very high.  And you have to jump through hoops
to eliminate duplicate messages.

                John Nagle

Original comment by na...@animats.com on 7 Apr 2011 at 4:11

GoogleCodeExporter commented 9 years ago

It's actually not using your patch (from #10), but the one from 12. It looked 
pretty clean to me.

Original comment by fra...@gmail.com on 7 Apr 2011 at 11:05

GoogleCodeExporter commented 9 years ago

I have been working with all this code. I have made little to no progress.

Here is what I have:

I receive SMS messages that contain geo-location data. I want to constantly 
download SMS messages to an updated CSV that can feed a KMZ for Google Earth. 
Can anyone help?

XMAN

Original comment by Xavier.J...@gmail.com on 2 May 2011 at 2:47

GoogleCodeExporter commented 9 years ago

I made a branch for gathering all pages from a google voice search query:

http://code.google.com/r/eahutchins-searchpage/

This is useful for dumping full call logs. The search example script now dumps 
the date, time and number from each matching record.

Original comment by E.A.Hutc...@gmail.com on 8 Sep 2011 at 11:39

pygooglevoice doesn't handle Google Voice's XML pagination #22