markmacgillivray closed 12 years ago
But what should the response be? I suggest it should start with the type of identifier, e.g. DOI, then should follow with a link. The link should go I guess to our result page where they can get more info, but we could also include the direct link to the resource where it is available, perhaps something like this:
REQUEST: @idhelp 10.1234/567890
RESPONSE: @username DOI #moreinfo http://idhelp.org/blah #direct http://dx.doi.org/blah
But this is up for debate. More suggestions welcome!
A further note: better handling of the REQUEST might be good at some point. Assuming that all we'll get in the REQUEST is "@idhelp unknown_identifier_value" is fine for the first iteration, but would we get anything else in the request? "@idhelp unknown_identifier_value #research_conversation @another_interested_user" might be an example of how people could try to use the Twitter interface.
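To make the question concrete, here is a rough sketch of how such a REQUEST could be split up. The function name parse_request and the rule "anything that is not a hashtag or a mention is part of the identifier" are assumptions for illustration only, not what the code currently does. An identifier that itself starts with # or @ would be misread by this.

```python
def parse_request(text, bot_name="idhelp"):
    """Split a mention into (identifier, hashtags, other_mentions).

    Hypothetical helper: treats every token that is neither a hashtag
    nor a mention as part of the identifier.
    """
    identifier_parts = []
    hashtags = []
    mentions = []
    for token in text.split():
        if token.startswith("@"):
            # Drop the mention of the bot itself, keep everyone else.
            if token[1:].lower() != bot_name:
                mentions.append(token)
        elif token.startswith("#"):
            hashtags.append(token)
        else:
            identifier_parts.append(token)
    return " ".join(identifier_parts), hashtags, mentions
```

With this, the simple case and the "conversation" case both yield the same identifier, and the extra hashtags and users are available if we ever want to echo them back.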
There are essentially three options:
I was going to say go for 2. in a later iteration of the code, but I think parsing (with the normal Python tools) would just be too hard and nondeterministic. Richard did suggest option 1. at devXS, so I'm implementing that and no more for now.
As for RESPONSE: good idea, I think it will fit 140 characters. One thing to be careful about though - those identifier names are arbitrary and user-submitted. Obviously limiting what the users can submit because of the Twitter interface is not an option, so I just note that I may have to truncate the identifier name in the reply. (And use something like bit.ly to shorten the links). I'll go for your suggestion Mark (only include #direct if available ofc.).
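A minimal sketch of what that could look like, assuming the RESPONSE layout suggested above. build_reply and its parameters are made-up names, and the sketch counts raw URL lengths against the 140-character limit, which is only an assumption about how the limit is applied.

```python
TWEET_LIMIT = 140  # character limit at the time of writing


def build_reply(username, id_name, info_url, direct_url=None, limit=TWEET_LIMIT):
    """Build '@user NAME #moreinfo URL [#direct URL]', truncating NAME to fit."""
    head = "@" + username + " "
    tail = " #moreinfo " + info_url
    if direct_url:
        # Only include #direct when a direct link is available.
        tail += " #direct " + direct_url
    room = limit - len(head) - len(tail)
    if room < 1:
        raise ValueError("username and links alone exceed the limit")
    if len(id_name) > room:
        # Identifier names are user-submitted, so cut them rather than the links.
        id_name = id_name[:room - 1] + "\u2026"
    return head + id_name + tail
```

For example, build_reply("username", "DOI", "http://idhelp.org/blah", "http://dx.doi.org/blah") gives back the RESPONSE line from the suggestion unchanged, since it already fits.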
Okay, I've been reading up on this rather heavily. If we use the RESTful Twitter API, we're going to have to regularly poll the server (which is OK I suppose) - responses will be slightly delayed. Currently the lib (python-twitter) in use caches the requests for 1 minute, but I have to check whether we're going to exceed Twitter's daily limits (don't think so).
Apparently Twitter made the Streaming API for clients + applications like ours (indefinite connections and quick delivery of new messages, order not guaranteed). Unfortunately, that requires a bit more effort to use (need to handle error states + handle disconnections and "backfill" [with the RESTful API]). I've concentrated on getting it working consistently and correctly for now - I'll finish it up and commit it as soon as I understand why Identificator is not identifying successfully.
Just turned tweetlisten.py into a constant listener. It works well as far as I can tell - just need to add implementation that remembers and uses the Twitter API since_id param (so as to not try to send duplicate replies) and some error handling (at least for Twitter's "duplicate status" error), and I will merge it with master.
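Roughly what I have in mind for the since_id part, as a sketch only. fetch_mentions and send_reply are hypothetical callables standing in for the real Twitter calls, DuplicateStatus is a made-up exception standing in for Twitter's "duplicate status" error, and the JSON state file is just one assumed way to remember the last id between runs.

```python
import json
import os


class DuplicateStatus(Exception):
    """Hypothetical stand-in for Twitter's 'duplicate status' error."""


def load_last_id(path):
    # No state file yet means we have not replied to anything.
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        return json.load(fh).get("last_id")


def save_last_id(path, last_id):
    with open(path, "w") as fh:
        json.dump({"last_id": last_id}, fh)


def process_new_mentions(fetch_mentions, send_reply, state_path):
    """Reply once to every mention newer than the stored since_id."""
    since_id = load_last_id(state_path)
    replied = 0
    # Oldest first, so the stored id only ever moves forward.
    for mention in sorted(fetch_mentions(since_id), key=lambda m: m["id"]):
        try:
            send_reply(mention)
            replied += 1
        except DuplicateStatus:
            pass  # already answered, just move on
        save_last_id(state_path, mention["id"])
    return replied
```

Saving after every mention (rather than once at the end) means a crash halfway through a batch should not cause the earlier mentions to be answered twice.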
Also, a note: it uses very much the same code as in web.py to try to identify the identifier, but the actions it takes are different from web.py (e.g. no redirect(), obviously). I'd like to remove the duplication of code, but am not exactly sure about the best approach - factor it out to one place, perhaps? (E.g. to identifier.py - so that all the "retrieve from ES, if not, try regular expression tests, if not then fail...." is encapsulated in one method that just takes an identifier and returns a response, handling all that inside.)
How do we usually handle error logging anyway? To file, to ES, not at all (i.e. just output to stdout)?
Should be in working order now. Just a few things to fix and will merge into master.
One notable thing is enhancing the replies we give to users, so: @user_who_mentioned_us this is a identifier_name; url_prefix + id + url_suffix; read more server_url/identify/id
Twitter apparently does URL shortening of anything you post to it ( https://dev.twitter.com/discussions/1062 ). Will test when that enhancement ^ is in place.
Also notable is that the code in web.py and tweetlisten.py does essentially the same thing when trying to identify an identifier. It might be nice to centralise the logic in the identifier.py::identify method (so that it tries the cache first, then tries engine identification and finally fails to identify, i.e. "unknown id"). This will make future maintenance of this rather important code easier.
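A sketch of the shape such a centralised method could take. This is an assumption about the design, not the existing identifier.py: a plain dict stands in for the ES-backed cache, PATTERNS is an illustrative list with a single simplified DOI pattern, and the returned dict layout is invented.

```python
import re

# Illustrative only: the real engine would have many more tests.
PATTERNS = [
    ("DOI", re.compile(r"^10\.\d{4,9}/\S+$")),
]


def identify(identifier, cache, patterns=PATTERNS):
    """Try the cache, then the regex tests, then give up ('unknown id')."""
    identifier = identifier.strip()
    cached = cache.get(identifier)
    if cached is not None:
        return {"id": identifier, "type": cached, "source": "cache"}
    for name, pattern in patterns:
        if pattern.match(identifier):
            # Remember the answer so the next lookup hits the cache.
            cache[identifier] = name
            return {"id": identifier, "type": name, "source": "engine"}
    return {"id": identifier, "type": None, "source": None}
```

Both web.py and tweetlisten.py could then call identify() and decide for themselves what to do with the result (redirect vs. reply).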
remove bad half-second sleep in tweetlisten.py::save_lastid and get the static stuff out to config.json
as for the config (thanks Mark!):
from whatid.config import config  # or something like that
thing = config['thing']
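For reference, a sketch of what a loader behind that import might look like. This is a guess, not the actual whatid.config module: load_config, the defaults argument and the fallback to defaults when config.json is missing are all assumptions.

```python
import json
import os


def load_config(path="config.json", defaults=None):
    """Read settings from a JSON file, layered over optional defaults."""
    settings = dict(defaults or {})
    # Assumption: a missing file is not an error, we just keep the defaults.
    if os.path.exists(path):
        with open(path) as fh:
            settings.update(json.load(fh))
    return settings
```

A module could then do config = load_config() once at import time, so that the static values in tweetlisten.py are read from config as in the lines above.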
okay: "One notable thing is enhancing the replies we give to users, so: @user_who_mentioned_us this is a identifier_name; url_prefix + id + url_suffix; read more server_url/identify/id"
that's the last thing left, and I'm merging with master
enhancements complete, emanuil-twitter branch merged with master
for further enhancements, just open an issue "enhancing twitter interface" instead of reopening this one
URL shortening is automatically done by Twitter by the way: just post the update (tweet) with the URLs in it, and they will be converted to t.co links (and won't count as much towards the char count, presumably).
Should be able to send IDs via twitter to the ID helper, and get back a suitable response.