emhart / ImpacTwit

A set of functions to parse a twitter search and track the cumulative potential viewers. Mainly intended for use with scientific publications.

Identify all the papers currently being discussed in my twitter stream #2

Open cboettig opened 12 years ago

cboettig commented 12 years ago

It would be great to pull my morning twitterstream into R (from homeTimeline()), grab all the links, and see which ones correspond to DOIs (ideally also identify links directly to journal websites). Then I could see what papers were being most discussed in my circles, and possibly what people were saying about them too.
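A rough sketch of that workflow, assuming the twitteR calls used later in this thread and a simple DOI regex (the shortened t.co links would still need expanding before the regex can match, which is the question below):

library(twitteR)

# grab recent tweets from the home timeline and flatten to a data frame
tweets <- twListToDF(homeTimeline(n = 200))

# pull every URL out of the tweet text
urls <- unlist(regmatches(tweets$text,
                          gregexpr("http[s]?://[^[:space:]]+", tweets$text)))

# keep the links that contain a DOI-like string
doi_pattern <- "10\\.[0-9]{4,}/[^[:space:]\"<>]+"
urls[grepl(doi_pattern, urls)]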

I could take a whack at this if you like. Know any good way to expand shortened links or to identify whether they correspond to a pub? The totalimpact folks must have that figured out, so this should be pretty straightforward?

emhart commented 12 years ago

Sure, you can give it a try. The problem I encountered is that most journals don't have a uniform link structure, or multiple links go to the same page. Some journals have DOIs in the links, some don't. The links actually come in already expanded, so you don't even have to deal with that problem. Another issue is that people often discuss studies via links to popular news coverage, so it's hard to track those without knowing them a priori. You'll also have to deal with OAuth, which you can do, but there's no easy solution for anyone except yourself. I think you'd have to create a keyword database to search against to identify links. Scott and I met with Heather and talked about the AltMetrics guy, and they were saying that's what he does: he'll search for, say, "sciencedirect" and terms like that. But have a go, I'd be curious to see what you get.
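For what it's worth, a minimal sketch of that keyword-database idea might look like the following; the keyword list and example URLs are only placeholders:

# illustrative keyword list, not exhaustive
journal_keywords <- c("sciencedirect", "nature.com", "plosone",
                      "springer", "wiley", "dx.doi.org")

# expanded_urls stands in for the expanded links pulled from the stream
expanded_urls <- c("http://www.sciencedirect.com/science/article/...",
                   "http://www.zdnet.com/some-news-story/")

# flag links whose expanded form matches any journal keyword
is_journal <- grepl(paste(journal_keywords, collapse = "|"),
                    expanded_urls, ignore.case = TRUE)
expanded_urls[is_journal]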

cboettig commented 12 years ago

Cool.

ROAuth isn't so bad, at least for personal use. But I'm getting non-expanded links:

I authenticate, then run

library(twitteR)

# pull the 500 most recent tweets from my home timeline and flatten to a data frame
me <- homeTimeline(n = 500)
df <- twListToDF(me)
df$text[1:8]

and get all shortened links:

 [1] "Sensation in a Single Neuron Pair Represses Male Behavior in
Hermaphrodites http://t.co/9ZEXEZDt"

 [2] "Neuromodulatory State and Sex Specify Alternative Behaviors through
Antagonistic Synaptic Pathways in C. elegans http://t.co/LWp6M4sY"
 [3] "Thank you @ldignan for your brilliant article
http://t.co/diLL5FLFthat stirred some thoughts for @timgasper
http://t.co/5KCc3jO6"
 [4] "RT @casrai_ed: At #vivo12 learning from speaker that the community is
not sure why research ontologies graphically presented tend to loo ..."
 [5] "Excellent work: A GABAergic Inhibitory Neural Circuit Regulates
Visual Reversal Learning in Drosophila http://t.co/OL5MWUHB"

 [6] "RT @mattyglesias: CHART OF DOOM for Mitt Romney: http://t.co/VJaBy5Bw"

 [7] "RT @msanjayan: What major world city is most at risk from #flooding
(hint its not in Holland nor Bangladesh) http://t.co/86rlJA6I"
 [8] "\"NASA scientist James Hansen first warned the world about...global
warming.\" http://t.co/awT7X5Lq (He corrected me) http://t.co/76igPy6J"

Hints?


emhart commented 12 years ago

Sure, you could use getURL("http://t.co/9ZEXEZDt", followlocation = TRUE) and then scrape the resulting text for a DOI, or you could use the longurl API:

library(RCurl)

# percent-encode the short link, then ask the longurl API to expand it
myURL <- URLencode("http://t.co/awT7X5Lq", reserved = TRUE)
getURL(paste("http://api.longurl.org/v2/expand?url=", myURL, "&all-redirects=1", sep = ""))

The problem I've had with longurl is that it sometimes fails to expand links that work in my browser; I'm not sure why.
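For the first option (following the redirect and scraping the landing page), a minimal sketch, assuming RCurl and a simple DOI regex; whether it finds anything depends on the journal printing the DOI in the page source:

library(RCurl)

# follow the t.co redirect to the landing page, then scan the HTML for a DOI-like string
page <- getURL("http://t.co/9ZEXEZDt", followlocation = TRUE)
doi_pattern <- "10\\.[0-9]{4,}/[^[:space:]\"<>]+"
unique(unlist(regmatches(page, gregexpr(doi_pattern, page))))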