GiveToken / GiftBox

Repository for Sizzle

Follow website link on scrape to get social media links #1101

Open wogsland opened 8 years ago

wogsland commented 8 years ago

From LinkedIn #810 or Glassdoor #1100.

shreydesai commented 8 years ago

Current link retrieval algorithm (feel free to make suggestions):

  1. Get all links (`a` tags) on a web page
  2. If a link has an href attribute, grab it
  3. Keep only the hrefs that contain one of the words in ['facebook', 'twitter', 'linkedin', 'google plus']
  4. Based on the type of link (e.g. facebook.com or twitter.com), parse the username from the link and output it
  5. The output would be a JSON file like this (None if nothing was found):
{
  "Facebook": "gosizzle",
  "Twitter": "go_sizzle",
  "LinkedIn": "None",
  "Google Plus": "None"
}
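
The steps above could be sketched roughly as follows. This is only an illustration using the Python standard library, not the actual scraper code: the names `SITES`, `LinkCollector`, `username_from`, and `extract_social_links` are made up for the example, matching is done on the link's domain (assumed here to be `plus.google.com` for Google Plus) instead of on a bare word, and the username is assumed to be the last path segment of the URL. Missing entries are Python `None`, which `json.dumps` writes as `null` and not as the string "None".

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical mapping from output label to the domain expected in the href.
SITES = {
    "Facebook": "facebook.com",
    "Twitter": "twitter.com",
    "LinkedIn": "linkedin.com",
    "Google Plus": "plus.google.com",
}


class LinkCollector(HTMLParser):
    """Steps 1-2: collect the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def username_from(href):
    """Step 4 (assumption): treat the last non-empty path segment as the username."""
    parts = [p for p in urlparse(href).path.split("/") if p]
    return parts[-1] if parts else None


def extract_social_links(html):
    collector = LinkCollector()
    collector.feed(html)
    result = {label: None for label in SITES}
    for href in collector.hrefs:
        host = urlparse(href).netloc.lower()
        for label, domain in SITES.items():
            # Step 3: keep only hrefs whose host is a known social site.
            matches = host == domain or host.endswith("." + domain)
            if matches and result[label] is None:
                result[label] = username_from(href)
    return result


if __name__ == "__main__":
    page = (
        '<a href="https://www.facebook.com/gosizzle">Facebook</a>'
        '<a href="https://twitter.com/go_sizzle">Twitter</a>'
    )
    # Step 5: emit the result as JSON.
    print(json.dumps(extract_social_links(page), indent=2))
```

Matching on the parsed host avoids false positives from pages that merely mention a site name somewhere in an unrelated URL, and only the first match per site is kept.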