Git-Host / pygooglevoice

Automatically exported from code.google.com/p/pygooglevoice
BSD 3-Clause "New" or "Revised" License

Cannot receive SMS messages #6

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
pygooglevoice is currently not providing an API for receiving SMS messages. I 
looked to see why, 
and quickly figured it out: Google is not providing any easy ways to access 
them.

However, they can be accessed, and I have been doing so successfully for the 
last few months.

What I needed was something that would actually pop a notification and sound an 
alarm when 
someone texted me, like in an instant messenger. I can't check google.com/voice 
every 30 
seconds. It disrupts whatever I'm doing.

This is my first Python project, so bear with me if the code is a little odd or 
misshapen. :)

First, I've extended the original Voice object from pygooglevoice to create two 
new functions, 
called "get_sms_data (self, smsID)" and "set_read_status (self, smsID, 
readStatus)". smsID refers 
to a particular conversation "bubble" in google.com/voice.

get_sms_data calls Google's "inbox/recent/sms/" Javascript API, and passes the 
resulting raw 
HTML data to a new class I wrote titled GVoiceSMSProcessor. This class is 
called up as follows:

smsProcessor = GVoiceSMSProcessor(smsData, smsID)

Internally, GVoiceSMSProcessor organizes all the available data into a 
dictionary called 
conversations. Here is a sample dump of smsProcessor.conversations, with 
personal data 
removed, so you get an idea of how awesome this is:

[
    {'contact': 'PersonA',
     'messages': [
         {'text': 'Hey', 'sender': 'PersonA', 'time': '5:09 PM'},
         {'text': "Yea, what's up?", 'sender': 'Me', 'time': '5:09 PM'},
         {'text': "I'm at <place> if u want to come over here and hang out.",
          'sender': 'PersonA', 'time': '5:15 PM'},
     ]}
]

I could've easily had it process more data, but I included here only the data 
that was not 
retrieved in pygooglevoice's API.

The GVoiceSMSProcessor object is somewhat confusing to read through, but the 
idea is quite simple: process the raw data with BeautifulSoup, and then loop 
through the entire document, element by element. The code in "__searchHTML" 
looks for specific class patterns in the HTML. When it finds a tag that 
contains a specific piece of information, it records what kind of data that is, 
so that farther down it can be processed into the dictionary. The 
'contact-next' state occurs when the contact name is "Me", because the tag that 
marks when the contact name is coming up is empty, and it's the next 
(classless) tag that has the good stuff. Thus, this search method also keeps 
track of a little state.

If you look at the contents of https://www.google.com/voice/inbox/recent/sms/ 
and find each 
class, you'll see why the searching does what it does (hopefully).
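
The state-tracking scan described above can be sketched with the standard 
library alone. This is only an illustration, not the attached 
GVoiceSMSProcessor: it uses Python 3's html.parser rather than BeautifulSoup, 
the class names ("gc-message-sms-row", "-from", "-text", "-time") are 
assumptions about Google's markup, and SmsRowParser and parse_sms_rows are 
hypothetical names.

```python
from html.parser import HTMLParser


class SmsRowParser(HTMLParser):
    """Collects one dict per SMS row by remembering which field comes next."""

    PREFIX = "gc-message-sms-"          # assumed class prefix in the markup
    FIELDS = {"from", "text", "time"}   # assumed per-message fields

    def __init__(self):
        super().__init__()
        self.messages = []
        self._current = {}
        self._field = None  # state: the field the next non-blank text fills

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls == self.PREFIX + "row":
                self._current = {}          # a new message row begins
            elif cls.startswith(self.PREFIX):
                name = cls[len(self.PREFIX):]
                if name in self.FIELDS:
                    self._field = name      # remember what the next text means

    def handle_data(self, data):
        text = data.strip()
        if self._field is None or not text:
            return
        self._current[self._field] = text
        self._field = None
        if self.FIELDS.issubset(self._current):
            self.messages.append(self._current)
            self._current = {}


def parse_sms_rows(html):
    """Return a list of {'from', 'text', 'time'} dicts found in html."""
    parser = SmsRowParser()
    parser.feed(html)
    return parser.messages
```

Because the pending field stays set until some non-blank text arrives, an empty 
marker tag followed by a classless tag holding the name is picked up without a 
separate 'contact-next' state, at least under the markup assumed here.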

I'm attaching the entire client I wrote for myself, so other users can try it. 
Linux users ought to change "growl" to some other notification daemon. Also, 
mplayer is used to play a notification sound, which is specific to OSX.

In summary, the pygooglevoice developers should look at gvoice.py to see 
what additions can be made to their code.

Original issue reported on code.google.com by jacobgod...@gmail.com on 14 Sep 2009 at 10:46

GoogleCodeExporter commented 9 years ago
Ack. I thought I fixed a small bug from when I was testing, but I didn't. 
Here's what should've been uploaded in 
the first place.

Original comment by jacobgod...@gmail.com on 14 Sep 2009 at 11:22

Attachments:

GoogleCodeExporter commented 9 years ago
Hooray! The first example of an HTML processor; I've been waiting for this. 
There is a bunch of data (and not just for SMS) contained in the HTML payload 
of the gvoice responses. This example is the tack I was trying to take myself 
in v0.4, but then I got really lazy and distracted. Instead of making a 
subclass of Voice, you should patch the existing code with the new 
functionality. Clone the repo, make your changes, then run hg diff and submit 
the diff output. Also, please conform to the coding standards (needs 
docstrings and NO TABS!). Lastly, I would prefer not to have to rely on 
external modules (BeautifulSoup) for parsing the HTML response. Instead, try 
using some built-in parsers (xml.parsers or sgmllib). If it's unavoidable, then 
we can use BeautifulSoup. Thanks again!

Original comment by justquick on 15 Sep 2009 at 1:38

GoogleCodeExporter commented 9 years ago
You're welcome! I am more than willing to patch the source code, but I am 
confused about some of the 
methods it uses to make calling Google's APIs more efficient and the code less 
redundant. Is there a time and 
place we can meet to discuss how best to merge?

Also, BeautifulSoup is a requirement in my code because the XHTML returned by 
Google has some invalid 
syntax (iirc) and it's not true XML anyway. (The root elements are XML, but the 
contents are not, so it never 
gets fully processed by pure XML modules.) I've actually tried at least three 
different XML processors before landing on BeautifulSoup. I think we ought to 
stick with it, especially 
since Google Voice is essentially 
an HTML service right now, and not a straight XML one.

Original comment by jacobgod...@gmail.com on 15 Sep 2009 at 1:57

GoogleCodeExporter commented 9 years ago
This is why the sgmllib exists: http://effbot.org/librarybook/sgmllib.htm

It's really all the same business. Google Voice is actually an XML service, but it
contains JSON and HTML payloads inside of it, which is really screwy. I am 
justquick on gchat and AIM; hit me up and we'll discuss integration.

Original comment by justquick on 15 Sep 2009 at 2:13

GoogleCodeExporter commented 9 years ago
A few points/questions:

1)  If you're parsing the xml from: 
https://www.google.com/voice/inbox/recent/sms/,
will this give you a complete list of sms messages, or only the most recent?

2) As your message queue grows, you still have to parse through the entire XML 
just to get the few most recent SMS messages.  This will be slow if you are
monitoring frequently (e.g. every minute).

3) Would it be better to use IMAP for reading/monitoring sms messages?  

Original comment by steve.r....@gmail.com on 22 Oct 2009 at 8:02

GoogleCodeExporter commented 9 years ago
1) Only the first page, but other pages can be retrieved if desired. I keep my 
inbox
clean, so I don't usually have to worry about this.

2) The parsing that occurs does take a few seconds' worth of 100% CPU time, 
since it
runs through every X(HT)ML field. It's really a stupid way to do it, but 
optimizing
this routine would only mean more work later when it breaks again. In other 
words:
yea, it's CPU-intensive, but it's not that long and the trade-off is 
flexibility to
remain up-to-date.

3) To my knowledge this is not implemented. They haven't even gotten to a proper
language-agnostic API (thus, this library), so we have to do special processing.

Original comment by jacobgod...@gmail.com on 22 Oct 2009 at 8:30

GoogleCodeExporter commented 9 years ago
I can't seem to run "check_for_new.py". It just returns: Error: 'Contact' 
Is there something I'm missing?

Original comment by jwilc...@gmail.com on 24 Oct 2009 at 2:51

GoogleCodeExporter commented 9 years ago
To keep support requests in this bug report to a minimum, please contact me at 
my ID
+ "@gmail.com". Resubmit your question and I'll see what I can do.

Original comment by jacobgod...@gmail.com on 24 Oct 2009 at 4:43

GoogleCodeExporter commented 9 years ago
It's straightforward enough to fetch all SMS messages.  Unambiguously detecting
unread messages in a conversation is the hard part.  You can't mark an 
individual
message as read, archive it, or delete it; only the 

Attached is some brief code to parse and print SMS messages via Google Voice.  
It's
not necessary to modify or override anything in googlevoice to do this.  The 
output
of "extractsms" is an array of dicts, one for each message, with Google's field
names.  This uses some of the features of BeautifulSoup to speed up the 
processing.

It's not clear whether Google will put up with a poll of this type several 
times a minute.  A real API would need a cheap "has anything changed" query 
that can be polled, like RSS.

Original comment by na...@animats.com on 31 Oct 2009 at 5:09

GoogleCodeExporter commented 9 years ago
(Previous attachment didn't attach.)

Original comment by na...@animats.com on 31 Oct 2009 at 5:11

Attachments:

GoogleCodeExporter commented 9 years ago
I've written a poller which polls Google Voice and retrieves new SMS messages.  
New
messages are identified by hashing the conversation, time, and text of each 
message,
and discarding duplicates.  That seems to work, but I want to run it for a few 
days
and watch the behavior of Google Voice.
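
The duplicate-discarding step can be sketched as follows. This is a minimal 
illustration and not the attached poller: the message dict keys ("id" for the 
conversation, "time", "text") and the names message_digest and Deduplicator are 
assumptions made for the example.

```python
import hashlib


def message_digest(msg):
    """Hash the conversation id, time, and text of one message."""
    # The unit-separator character keeps adjacent fields from running together.
    key = "\x1f".join((msg["id"], msg["time"], msg["text"]))
    return hashlib.md5(key.encode("utf-8")).hexdigest()


class Deduplicator:
    """Remembers digests of messages already seen across polls."""

    def __init__(self):
        self._seen = set()

    def new_messages(self, messages):
        """Return only the messages whose digest has not been seen before."""
        fresh = []
        for msg in messages:
            digest = message_digest(msg)
            if digest not in self._seen:
                self._seen.add(digest)
                fresh.append(msg)
        return fresh
```

Each poll passes the full list of parsed messages to new_messages and acts only 
on what comes back. The scheme is only as stable as the hashed fields: if any 
of them changes between polls, the same message hashes differently and is 
reported again.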

This has been used to fully SMS-enable a Model 15 Teletype, a cast-steel 
monster from
1930:

http://brassgoggles.co.uk/forum/index.php/topic,19810.0.html

Original comment by na...@animats.com on 31 Oct 2009 at 6:11

GoogleCodeExporter commented 9 years ago
any chance i can see the source code for the poller too?

Original comment by justquick on 2 Nov 2009 at 7:30

GoogleCodeExporter commented 9 years ago
Here's the poller, in a preliminary form.  This works, but it's not intended as
released software yet. It is a part of a larger program, but it doesn't depend 
on
anything but "smsio.py".  If you use the poller, send using the SMSsend in the
poller, not in "smsio.py".  Otherwise, sending logs out from Google Voice, which
interferes with the next poll.

I poll every 30 seconds, and Google seems to be OK with that.  Don't overdo it. 

Original comment by na...@animats.com on 3 Nov 2009 at 4:58

Attachments:

GoogleCodeExporter commented 9 years ago
The poller has some problems with old messages reappearing after 24 hours or 
so. 
Apparently Google changes some element of the data that affects the MD5 digest 
used
to eliminate previously-received messages.

Google gives you a time and date for the start of a conversation, but the 
messages
within that conversation only have a time, even if the conversation is more 
than 24
hours old.  So you get things like this:

11/1/09 8:01 PM 26 hours ago
(408) 482-1751: blahblah... 8:01 PM

Google's output is thus ambiguous as to the date of each message.  

It's also not clear when Google starts a new "conversation". I currently have 
two
conversations listed with the same phone number.  

Also, Google doesn't reassemble multi-part SMS messages from devices that can 
send them. 

Original comment by na...@animats.com on 3 Nov 2009 at 6:44

GoogleCodeExporter commented 9 years ago
Another problem is that every poll reads every SMS message ever received.  This 
is
because I'm reading "voice.sms_html".  Even reading the inbox still shows items 
not
in the inbox.  Reading "https://www.google.com/voice/inbox/recent" may be more
useful; hopefully Google truncates old stuff at some point.  This needs further 
testing.

"recent" should be added to FEEDS in "settings.py".  I've been patching with

import googlevoice.settings

# Adds a "recent_html" attribute to "Voice" objects,
# so we can read recent messages.
if "recent" not in googlevoice.settings.FEEDS:
    googlevoice.settings.FEEDS = tuple(googlevoice.settings.FEEDS) + ("recent",)

This enables reading "recent_html".  I'll run for a few days with that, 
accumulate
some SMS messages, and find out when Google starts scrolling old stuff out of 
"recent". 

Also, "Received" is spelled wrong throughout "settings.py".

Google's XML files contain an incredible amount of unneeded material.  Each 
poll is
reading about 100KB, of which about 3% is useful.  If you send feedback to 
Google,
ask for the ability to get new messages via RSS.  That's well-understood and 
polls
efficiently. 

Original comment by na...@animats.com on 3 Nov 2009 at 7:42

GoogleCodeExporter commented 9 years ago
When polling for new messages, you might want to read
https://www.google.com/voice/inbox/recent/unread
instead - this keeps the amount of data down some more…

Original comment by andreas.amann@gmail.com on 8 Nov 2009 at 8:05

GoogleCodeExporter commented 9 years ago
The problem with "unread" is that it's on a per-conversation basis, not a 
per-message basis.  You can mark a conversation as "read", but there's a timing 
window: between reading the conversation and marking it, a new message can come 
in for that same conversation.  You might lose a message, especially if someone 
is sending a multi-part SMS message, where the sections come in one after 
another.  

I've heard a rumor that a Google Voice API is coming within two weeks.  

Original comment by na...@animats.com on 11 Nov 2009 at 6:20

GoogleCodeExporter commented 9 years ago
I doubt that Google will make an official API because as it stands now a move 
like
that would be construed as a violation of the terms of service inherited from 
Grand
Central. But if it actually did, that would be really sweet.

Read more in this article: 
http://serotoninstorm.com/2009/sep/16/hacking-away-at-voice/

I have just added an example of parsing sms messages into the repository

http://code.google.com/p/pygooglevoice/source/browse/examples/parse_sms.py

Thanks John

Original comment by justquick on 11 Nov 2009 at 9:20

GoogleCodeExporter commented 9 years ago
There's an easily fixable inefficiency.  If you read "voice.sms()" (which reads 
the JSON part of the document) and "voice.sms_html()" (which reads the HTML 
part),
pygooglevoice queries Google Voice twice for the same document.  That doubles 
the
already-excessive network traffic.  Also, there's the possibility that the two 
may
then be o

I'd suggest that the Folder object returned by "voice.sms()" be given an "html"
attribute, to access the HTML already retrieved.  This applies to all FEED 
objects.  
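
The suggestion above can be sketched like this. It is not pygooglevoice's 
actual Folder class: FeedDocument is a hypothetical name, the fetch step is a 
plain callable, and the document layout (a root element with "json" and "html" 
children) is an assumption based on the description of the feeds in this 
thread.

```python
import json
import xml.etree.ElementTree as ET

# Made-up sample shaped like the assumed feed document.
SAMPLE_XML = (
    "<response>"
    '<json><![CDATA[{"messages": {"abc": {"isRead": false}}}]]></json>'
    '<html><![CDATA[<div id="abc">Hey</div>]]></html>'
    "</response>"
)


class FeedDocument:
    """Fetches a feed once and exposes both of its payloads."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable returning the raw XML text
        self._root = None     # parsed document, filled on first access

    def _load(self):
        # Fetch and parse only on first use; later accesses reuse the result.
        if self._root is None:
            self._root = ET.fromstring(self._fetch())
        return self._root

    @property
    def data(self):
        """The JSON payload, decoded into Python objects."""
        return json.loads(self._load().findtext("json"))

    @property
    def html(self):
        """The HTML payload, as a string from the same fetch."""
        return self._load().findtext("html")
```

Since both properties read from one parsed document, the JSON and HTML always 
come from the same response and the network is hit only once.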

Original comment by na...@animats.com on 18 Nov 2009 at 6:15

GoogleCodeExporter commented 9 years ago
I don't know if it has been fixed, but I noticed a problem with the original 
version of jacobgod...@gmail.com's file: the program would not register 
one-message conversations and would not count the newest message. The program 
updated when it hit an HTML tag that came before the message data, which meant 
that if there was only one message the program wouldn't update and would give 
an "Error: 'message'" error. I switched the original trigger tag to one found 
at the end of the message, and that seems to have fixed the problem. I have 
been modifying the file in other ways, so I don't know if other aspects of the 
program will be broken. Use at your own risk.

If you find something that is broken, feel free to send me an email. More than 
likely it is an out-of-place hash mark. :P

Original comment by droat...@gmail.com on 5 Feb 2012 at 3:16

Attachments:

GoogleCodeExporter commented 9 years ago
Hey thank you!

Original comment by JordanNi...@gmail.com on 18 Dec 2013 at 8:08