ThinkUpLLC / ThinkUp

ThinkUp gives you insights into your social networking activity on Twitter, Facebook, Instagram, and beyond.
http://thinkup.com
GNU General Public License v3.0

posts.post_text blank because facebook crawler only considers post->message #1480

Open leonatkinson opened 11 years ago

leonatkinson commented 11 years ago

It looks like issue #877 updated the logic so that if a FB post had no "message" property, it used "name" instead. Looking at the code now, though, that change appears to have been reverted. I'm looking at line 290 in class.FacebookCrawler.php:

$post_to_process = array(
  "post_id"=>$post_id,
  "author_username"=>$profile->username,
  "author_fullname"=>$profile->username,
  "author_avatar"=>$profile->avatar,
  "author_user_id"=>$p->from->id,
  "post_text"=>isset($p->message)?$p->message:'',
  "pub_date"=>$p->created_time,
  "favlike_count_cache"=>$likes_count,
  "in_reply_to_user_id"=> isset($p->to->data[0]->id) ? $p->to->data[0]->id : '', // assume only one recipient
  "in_reply_to_post_id"=>'',
  "source"=>'',
  'network'=>$network,
  'is_protected'=>$is_protected,
  'location'=>$profile->location
);

I have many posts created by dlvr.it that match this pattern: no message property, but name and description properties. Here's the JSON for one such post:

{
  "id": "658936572_10151226791686573", 
  "from": {
    "name": "Leon Atkinson", 
    "id": "658936572"
  }, 
  "link": "http://www.leonatkinson.com/microsofts-modern-ie/", 
  "name": "Microsoft’s modern.IE", 
  "caption": "www.leonatkinson.com", 
  "description": "MS released a site dedicated to testing old versions of MSIE. In addition to a tester you can use right in the site, they offer up virtual machines for VirtualBox that cover all versions of MSIE, including MSIE 10. Home | Testing made easier in Internet Explorer | modern.IE modern.IE is a dev...", 
  "icon": "https://fbcdn-photos-a.akamaihd.net/photos-ak-snc7/v43/44/232775914688/app_2_232775914688_6381.gif", 
  "actions": [
    {
      "name": "Comment", 
      "link": "https://www.facebook.com/658936572/posts/10151226791686573"
    }, 
    {
      "name": "Like", 
      "link": "https://www.facebook.com/658936572/posts/10151226791686573"
    }
  ], 
  "privacy": {
    "description": "Networks, Friends", 
    "value": "CUSTOM", 
    "friends": "ALL_FRIENDS", 
    "networks": "1", 
    "allow": "", 
    "deny": ""
  }, 
  "type": "link", 
  "status_type": "app_created_story", 
  "application": {
    "name": "dlvr.it", 
    "namespace": "dlvr_it", 
    "id": "232775914688"
  }, 
  "created_time": "2013-02-02T16:45:09+0000", 
  "updated_time": "2013-02-02T16:45:09+0000", 
  "comments": {
    "count": 0
  }
}

These posts with no text represent about 39% of my FB posts. Falling back to name or description for post_text would make the reports more readable. Thanks!

leonatkinson commented 11 years ago

Perhaps the post_text entry in that array should change to

"post_text"=>self::getBestPostText($p),

...

// Return the first non-empty text field of a post, in order of
// preference: message, name, description, story.
private static function getBestPostText($post) {
    foreach (array('message', 'name', 'description', 'story') as $field) {
        // Compare against '' explicitly so a post whose text is "0" is not skipped.
        if (isset($post->$field) && $post->$field !== '') {
            return $post->$field;
        }
    }
    return '';
}
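
To check the fallback order proposed above against the dlvr.it post quoted earlier, here is a minimal standalone sketch. It assumes the post arrives as an object from json_decode; pickPostText() is a hypothetical free-function stand-in for the proposed private method, not existing ThinkUp code, and the JSON is a trimmed copy of the post above.

```php
<?php
// Hypothetical standalone version of the proposed fallback, for testing
// outside the FacebookCrawler class. Not part of ThinkUp.
function pickPostText($post) {
    // Fields in order of preference.
    foreach (array('message', 'name', 'description', 'story') as $field) {
        if (isset($post->$field) && $post->$field !== '') {
            return $post->$field;
        }
    }
    // None of the candidate fields is present.
    return '';
}

// Trimmed copy of the dlvr.it post from this issue: no "message" property.
$json = '{"id": "658936572_10151226791686573", "name": "Microsoft’s modern.IE", "caption": "www.leonatkinson.com", "type": "link"}';
$p = json_decode($json);

// With no message, the function falls back to the post's name.
echo pickPostText($p), "\n";
```

When a post does have a message, that still wins over name, so posts that already show text should keep the same post_text.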