CottageLabs / idfind

An identifier identifier

Enable twitter interface #1

Closed markmacgillivray closed 12 years ago

markmacgillivray commented 12 years ago

Users should be able to send IDs to the ID helper via Twitter and get back a suitable response.

markmacgillivray commented 12 years ago

But what should the response be? I suggest it start with the type of identifier (e.g. DOI), followed by a link. The link should probably go to our result page, where users can get more info, but we could also include the direct link to the resource where one is available, perhaps something like this:

REQUEST: @idhelp 10.1234/567890

RESPONSE: @username DOI #moreinfo http://idhelp.org/blah #direct http://dx.doi.org/blah
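The proposed RESPONSE format can be sketched as a small Python helper. `build_reply` and its parameter names are hypothetical, not code from the repository; it simply joins the parts in the order shown above and leaves out `#direct` when no direct link is known.

```python
def build_reply(username, id_type, moreinfo_url, direct_url=None):
    """Format a reply as: @username TYPE #moreinfo URL [#direct URL]."""
    parts = ["@" + username, id_type, "#moreinfo", moreinfo_url]
    # Only append the #direct link when a direct URL for the resource is known.
    if direct_url:
        parts += ["#direct", direct_url]
    return " ".join(parts)


print(build_reply("username", "DOI", "http://idhelp.org/blah", "http://dx.doi.org/blah"))
# prints: @username DOI #moreinfo http://idhelp.org/blah #direct http://dx.doi.org/blah
```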

But this is up for debate. More suggestions welcome!

emanuil-tolev commented 12 years ago

A further note: better handling of the REQUEST might be good at some point. Assuming that all we'll get in the REQUEST is "@idhelp unknown_identifier_value" is fine for the first iteration, but might the request contain anything else? "@idhelp unknown_identifier_value #research_conversation @another_interested_user" is one example of how people could try to use the Twitter interface.

There are essentially three options:

  1. Make no effort whatsoever to parse: ignore @idhelp, expect one space character after it, and treat everything else as the identifier. This might be the best option, since some identifiers might include spaces and Twitter's special @/# characters!
  2. Make a small effort to parse: ignore @idhelp, and copy any @'s and #'s (and any other clearly defined strings) over to the response.
  3. Make an intelligent effort to parse: ignore @idhelp and... well, "cleverly" parse the rest of the string. The level of "clever" can be arbitrary, ranging from option 2 to an AI module. This is obviously not optimal in terms of performance, though. I'd only start working on this without any doubts if it enabled some sort of killer feature that would (e.g.) double the ease of use or usefulness of the service.
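To make the trade-off between options 1 and 2 concrete, here is a rough sketch of both. The function names and the `HANDLE` constant are hypothetical, and the case-insensitive match on the handle is an assumption (Twitter treats handles case-insensitively).

```python
HANDLE = "@idhelp"


def parse_option1(tweet, handle=HANDLE):
    """Option 1: drop the handle and one following space; the rest is the identifier."""
    prefix = handle + " "
    # Twitter handles are case-insensitive, so compare in lower case.
    if not tweet.lower().startswith(prefix.lower()):
        return None
    return tweet[len(prefix):]


def parse_option2(tweet, handle=HANDLE):
    """Option 2: as option 1, but split out @mentions and #hashtags to copy into the reply."""
    rest = parse_option1(tweet, handle)
    if rest is None:
        return None
    tokens = rest.split()
    extras = [t for t in tokens if t.startswith(("@", "#"))]
    identifier = " ".join(t for t in tokens if not t.startswith(("@", "#")))
    return identifier, extras


tweet = "@idhelp 10.1234/567890 #research_conversation @another_interested_user"
print(parse_option1(tweet))
print(parse_option2(tweet))
```

Option 1 hands the whole remainder (hashtags and mentions included) to identification, while option 2 would silently strip any `#` or `@` token that is genuinely part of an identifier and also collapses repeated spaces, which is exactly the ambiguity that makes parsing risky.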

I was going to suggest option 2 for a later iteration of the code, but I think parsing (with the normal Python tools) would be too hard and too ambiguous to do reliably. Richard did suggest option 1 at devXS, so I'm implementing that and nothing more for now.

As for the RESPONSE: good idea, and I think it will fit in 140 characters. One thing to be careful about, though: those identifier names are arbitrary and user-submitted. Restricting what users can submit because of the Twitter interface is obviously not an option, so I may have to truncate the identifier name in the reply (and use something like bit.ly to shorten the links). I'll go with your suggestion, Mark (only including #direct if available, of course).
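Truncating only the identifier name, while keeping the mention and the link intact, could look roughly like this. `fit_reply`, the `"@user name link"` layout and the `"..."` marker are assumptions for illustration; the 140-character limit is the one mentioned above, and raw string length is used as a simplification of how Twitter counts characters.

```python
LIMIT = 140  # tweet length limit assumed in this discussion


def fit_reply(username, id_name, link, limit=LIMIT, ellipsis="..."):
    """Build '@username id_name link', truncating only id_name if the reply is too long."""
    fixed = "@{} {}".format(username, link)  # parts we never truncate
    # Room left for id_name after the fixed parts and the extra separating space.
    room = limit - len(fixed) - 1
    if len(id_name) > room:
        if room > len(ellipsis):
            id_name = id_name[:room - len(ellipsis)] + ellipsis
        else:
            id_name = id_name[:max(room, 0)]
    return "@{} {} {}".format(username, id_name, link)
```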

emanuil-tolev commented 12 years ago

Okay, I've been reading up on this quite heavily. If we use Twitter's RESTful API, we'll have to poll the server regularly (which is OK, I suppose), so responses will be slightly delayed. The library we currently use (python-twitter) caches requests for 1 minute, but I need to check whether we'll exceed Twitter's daily limits (I don't think so).
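A quick back-of-the-envelope check for the polling budget: with a 1-minute cache, at most one real request per endpoint goes out each minute. The sketch below just does that arithmetic; it does not state Twitter's actual limit, which should be taken from Twitter's API documentation for the endpoint being polled, and `calls_per_poll` is a hypothetical knob for polling more than one endpoint.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def api_calls(poll_interval_s, window_s, calls_per_poll=1):
    """API calls made in window_s seconds when polling every poll_interval_s seconds."""
    return (window_s // poll_interval_s) * calls_per_poll


# Polling the mentions endpoint once a minute:
print(api_calls(60, SECONDS_PER_HOUR))  # 60 calls per hour
print(api_calls(60, SECONDS_PER_DAY))   # 1440 calls per day
```

Comparing these numbers with the documented limit (per hour or per day, whichever Twitter uses) answers the "will we exceed it?" question.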

Apparently Twitter made the Streaming API for clients and applications like ours (long-lived connections and quick delivery of new messages, although order is not guaranteed). Unfortunately, it takes more effort to use: we'd need to handle error states, handle disconnections, and "backfill" missed messages with the RESTful API. I've concentrated on getting it working consistently and correctly for now. I'll finish it up and commit it as soon as I understand why Identificator isn't identifying successfully.
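Handling disconnections usually means reconnecting with an increasing delay rather than hammering the server. Below is a minimal sketch of exponential backoff; `connect` is a hypothetical callable that opens the stream and raises `OSError` when the connection drops, and the delay constants are placeholders, not Twitter's documented reconnect policy (check the Streaming API docs for that).

```python
import time


def backoff_delays(initial=1.0, factor=2.0, cap=320.0):
    """Yield reconnect delays that grow exponentially up to a cap."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)


def run_with_reconnect(connect, max_attempts=None, sleep=time.sleep):
    """Call connect() until it returns normally, sleeping between failed attempts.

    connect is a hypothetical callable that opens the stream and processes
    messages; it is expected to raise OSError when the connection drops.
    """
    attempts = 0
    for delay in backoff_delays():
        try:
            return connect()
        except OSError:
            attempts += 1
            if max_attempts is not None and attempts >= max_attempts:
                raise
            sleep(delay)
```

After a successful reconnect, the "backfill" step would fetch anything missed during the outage via the RESTful API (e.g. mentions newer than the last processed id).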

emanuil-tolev commented 12 years ago

I've just turned tweetlisten.py into a continuously running listener. It works well as far as I can tell. I just need to add code that remembers and uses the Twitter API's since_id parameter (so we don't try to send duplicate replies) plus some error handling (at least for Twitter's "duplicate status" error), and then I'll merge it into master.
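Remembering since_id between runs can be as simple as persisting the last processed tweet id to a small JSON file. This is a sketch only: the file format, the function bodies and the mention shape (dicts with an `"id"` key) are assumptions, not what tweetlisten.py actually does. Passing the saved id as since_id in the mentions request filters server-side; `new_mentions` is a local safety net.

```python
import json
import os
import tempfile


def load_lastid(path):
    """Return the last processed tweet id, or None if nothing has been saved yet."""
    try:
        with open(path) as f:
            return json.load(f)["last_id"]
    except (OSError, ValueError, KeyError):
        return None


def save_lastid(path, last_id):
    """Persist last_id by writing a temp file, then renaming it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"last_id": last_id}, f)
    os.replace(tmp, path)


def new_mentions(mentions, since_id):
    """Keep only mentions newer than since_id, oldest first, so replies go out in order."""
    fresh = [m for m in mentions if since_id is None or m["id"] > since_id]
    return sorted(fresh, key=lambda m: m["id"])
```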

Also, a note: it uses much the same code as web.py to try to identify the identifier, but the actions it takes afterwards differ from web.py's (e.g. no redirect(), obviously). I'd like to remove the duplicated code, but I'm not sure of the best approach. Perhaps factor it out to one place? (E.g. into identifier.py, so that all the "retrieve from ES; if not found, try the regular expression tests; if those fail, give up..." logic is encapsulated in one method that just takes an identifier and returns a response.)

How do we usually handle error logging anyway? To file, to ES, not at all (i.e. just output to stdout)?

emanuil-tolev commented 12 years ago

It should be in working order now. There are just a few things left to fix, and then I'll merge into master.

One notable remaining item is enhancing the replies we give to users, so that they read: @user_who_mentioned_us this is a identifier_name; url_prefix + id + url_suffix; read more server_url/identify/id
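The enhanced reply template above could be filled in along these lines. `enhanced_reply` is a hypothetical helper; the example values (a DOI resolved via dx.doi.org, idhelp.org as the server URL) are only for illustration.

```python
def enhanced_reply(user, identifier_name, id_value, url_prefix, url_suffix, server_url):
    """Format: @user this is a <name>; <prefix><id><suffix>; read more <server>/identify/<id>"""
    resolved = url_prefix + id_value + url_suffix
    more = "{}/identify/{}".format(server_url.rstrip("/"), id_value)
    return "@{} this is a {}; {}; read more {}".format(user, identifier_name, resolved, more)


print(enhanced_reply("someone", "DOI", "10.1234/567890",
                     "http://dx.doi.org/", "", "http://idhelp.org"))
```

Identifiers containing characters that are special in URLs (spaces, `#`, `?`) may need percent-encoding before being placed into the read-more link.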

Twitter apparently shortens any URL you post to it (https://dev.twitter.com/discussions/1062). I'll test this once the enhancement above is in place.

Also notable: the code in web.py and tweetlisten.py does essentially the same thing when trying to identify an identifier. It would be nice to centralise that logic in an identifier.py::identify method (which tries the cache first, then tries engine identification, and finally fails to identify, i.e. "unknown id"). This will make future maintenance of this rather important code easier.
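A centralised `identify` could follow the cache, then engine, then "unknown id" order described above. In this sketch the cache is any dict-like object standing in for the ES lookup, and the single DOI regex stands in for the identification engine; both are assumptions rather than the real identifier.py.

```python
import re

# Hypothetical regex tests standing in for the engine identification step.
PATTERNS = {
    "DOI": re.compile(r"^10\.\d{4,9}/\S+$"),
}

UNKNOWN = "unknown id"


def identify(identifier, cache, patterns=PATTERNS):
    """Return the identifier type: cache first, then regex tests, else UNKNOWN.

    cache is any dict-like store (standing in for the ES lookup).
    """
    cached = cache.get(identifier)
    if cached is not None:
        return cached
    for name, pattern in patterns.items():
        if pattern.match(identifier):
            return name
    return UNKNOWN
```

web.py and tweetlisten.py would then both call `identify()` and differ only in what they do with the result (redirect versus tweet a reply).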

emanuil-tolev commented 12 years ago

remove the bad half-second sleep in tweetlisten.py::save_lastid, and move the static settings out to config.json

as for the config (thanks Mark!):

    from whatid.config import config  # or something like that
    thing = config['thing']
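For the import above to work, the config module would need to load config.json once and expose the result. A minimal sketch, assuming the file is plain JSON; `load_config` and its default path are hypothetical.

```python
import json


def load_config(path="config.json"):
    """Read the static settings (API keys, URLs, file paths) from a JSON file."""
    with open(path) as f:
        return json.load(f)

# The config module would then end with a module-level instance, e.g.
#   config = load_config()
# so that callers can do: from whatid.config import config; thing = config['thing']
```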

emanuil-tolev commented 12 years ago

okay: "One notable thing is enhancing the replies we give to users, so: @user_who_mentioned_us this is a identifier_name; url_prefix + id + url_suffix; read more server_url/identify/id"

that's the last thing left, then I'm merging with master

emanuil-tolev commented 12 years ago

enhancements complete, emanuil-twitter branch merged with master

for further enhancements, please open a new issue ("enhancing twitter interface") instead of reopening this one

emanuil-tolev commented 12 years ago

By the way, Twitter shortens URLs automatically: just post the update (tweet) with the URLs in it, and they will be converted to t.co links (which presumably won't count as much towards the character count).
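If t.co wrapping means each URL counts as a fixed length, the reply length can be estimated before posting. In this sketch the URL regex is deliberately simple and `short_url_length=20` is a placeholder; the real t.co length has changed over time and is published in Twitter's API configuration.

```python
import re

URL_RE = re.compile(r"https?://\S+")


def tweet_length(text, short_url_length=20):
    """Approximate the length Twitter counts: each URL is counted as short_url_length."""
    urls = URL_RE.findall(text)
    without_urls = URL_RE.sub("", text)
    return len(without_urls) + short_url_length * len(urls)
```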