RSV3 / redfly

contact intelligence for Redstar

scrape linkedin #297

Open justinTNT opened 11 years ago

justinTNT commented 11 years ago

there's a lot of good data on linkedin, but only a small subset is available thru the api. once users have given us the IDs of their connections, we should totally go thru and scrape the profiles of those connections. this would be a worker. steady, steady: don't wanna get throttlebanned.
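a rough sketch of what "steady, steady" could look like for that worker: one request at a time, a minimum gap plus some random jitter between requests, and stop at the first failure instead of retrying straight away. every name here (`scrapeSteadily`, `nextDelayMs`, `fetchProfile`, `sleep`) and the default timings are assumptions for illustration only, not anything from the redfly codebase or from LinkedIn.

```typescript
// Hypothetical names and timings throughout; a sketch, not redfly code.
type FetchProfile = (connectionId: string) => Promise<string>;
type Sleep = (ms: number) => Promise<void>;

// Delay before the next request: a fixed minimum gap plus random jitter,
// so requests do not arrive at a perfectly regular rhythm.
function nextDelayMs(
  minGapMs: number,
  jitterMs: number,
  rand: () => number = Math.random,
): number {
  return minGapMs + Math.floor(rand() * jitterMs);
}

// Fetch profiles one at a time, pausing between requests.
// On the first failure, stop and return what we have so far;
// a later run can pick up the remaining IDs.
async function scrapeSteadily(
  ids: string[],
  fetchProfile: FetchProfile,
  sleep: Sleep,
  minGapMs: number = 30000, // assumed default, tune to taste
  jitterMs: number = 15000, // assumed default, tune to taste
): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  for (let i = 0; i < ids.length; i++) {
    try {
      results[ids[i]] = await fetchProfile(ids[i]);
    } catch {
      break; // back off entirely rather than hammering the site
    }
    // No need to wait after the last ID.
    if (i < ids.length - 1) {
      await sleep(nextDelayMs(minGapMs, jitterMs));
    }
  }
  return results;
}
```

`fetchProfile` and `sleep` are passed in so the pacing logic can be tested without touching the network or waiting on real timers; in a real worker `sleep` could be as simple as `(ms) => new Promise((r) => setTimeout(r, ms))`.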

justinTNT commented 11 years ago

ref #267

justinTNT commented 11 years ago

I wanted to use YQL, but it's disallowed by LI. They really don't want scrapers. on the upside, I found how to get the high res picture url. so now all we're missing that scraping could give is 'skills', plus the data for those few connections with twisted privacy settings.

justinTNT commented 11 years ago

OK let's raise this a level: seems FC isn't giving us as much as we hoped, and we've hit the limit with the API, so let's look again at some gentle scraping.

kwantopia commented 11 years ago

Yea, I am looking into node.io to scrape. Btw, why do you say we have hit the limit with the FC API? We have a 5000 cap and we are still at 200.

-kwan

On Thu, May 9, 2013 at 8:09 PM, justinTNT notifications@github.com wrote:

OK let's raise this a level: seems FC isnt giving us as much as we hoped, and we've hit the limit with the API, so let's look again at some gentle scraping.


Kwan Hong Lee, Ph.D. Technology Director Redstar Ventures http://www.redstar.com 617-871-0710

justinTNT commented 11 years ago

yeah, I just mean we're already getting as much value as we can from it, which is very little: just because it reports data doesn't mean we can use it, since the API is restricted to connections, which we had anyway ...