dfreelon / fb_scrape_public

Scrapes posts and comments from public Facebook pages.
BSD 3-Clause "New" or "Revised" License

fails to collect authors from comments #4

Closed arianpasquali closed 6 years ago

arianpasquali commented 6 years ago

Hey, nice script. Facebook data collection is a hassle, and the API keeps changing all the time.

Unfortunately, it now fails to collect authors from comments. I imagine it was working before. It works fine for posts.

If you make the same request using Facebook Graph Explorer, it works perfectly (using versions <= 2.10). It seems to be related to access token permissions, but I haven't found a way to fix it properly yet.

Even if you specify the fields you want, Facebook doesn't send the "from" field, which carries the author info, and the script breaks.

To make it work, you have to ignore the author info by deleting lines 94 and 95.

  File "fb_scrapper.py", line 15, in <module>
    comments = fsp.scrape_fb(app_id,access_token,"6815841748_10155401589581749",scrape_mode="comments")
  File "./fb_scrape_public/fb_scrape_public.py", line 202, in scrape_fb
    csv_data = make_csv_chunk(next_item,scrape_mode,msg_user,msg_content)
  File "./fb_scrape_public/fb_scrape_public.py", line 107, in make_csv_chunk
    csv_line = [line['from']['name'], '_' + \
KeyError: 'from'
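
As a stopgap, the lookup that raises the KeyError could be made tolerant of a missing "from" field. This is only a minimal sketch and not the script's actual code: the helper name author_name and the sample dictionaries are hypothetical, and it assumes each comment is a plain dict like the ones the API returns.

```python
def author_name(comment, default=""):
    """Return the comment author's name, or `default` when 'from' is absent.

    Hypothetical helper, not part of fb_scrape_public.
    """
    author = comment.get("from") or {}  # 'from' may be missing entirely
    return author.get("name", default)


# Made-up sample comments, one with and one without author info.
with_author = {"from": {"name": "Example User", "id": "1"}, "message": "hi"}
without_author = {"message": "hi"}

print(author_name(with_author))
print(repr(author_name(without_author)))
```

This keeps the scrape running, but the author column stays empty for those rows, so it does not address why the field is missing.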

I managed to make it respond correctly by using the access token generated by Graph Explorer, but it is not the best solution in this case.

I wonder if you have faced this problem already.

arianpasquali commented 6 years ago

I checked all the permissions available for my token and compared them with those of the token generated by Facebook Graph Explorer.

Debugging the request, I get these perms using Facebook Graph Explorer:

==== Access Token Info
  {
    "perms": [
      "user_birthday",
      "user_religion_politics",
      "user_relationships",
      "user_relationship_details",
      "user_hometown",
      "user_location",
      "user_likes",
      "user_education_history",
      "user_work_history",
      "user_website",
      "user_events",
      "user_photos",
      "user_videos",
      "user_friends",
      "user_about_me",
      "user_status",
      "user_games_activity",
      "user_tagged_places",
      "user_posts",
      "rsvp_event",
      "email",
      "read_insights",
      "publish_actions",
      "read_audience_network_insights",
      "read_custom_friendlists",
      "user_actions.books",
      "user_actions.music",
      "user_actions.video",
      "user_actions.news",
      "user_actions.fitness",
      "user_managed_groups",
      "manage_pages",
      "pages_manage_cta",
      "pages_manage_instant_articles",
      "pages_show_list",
      "publish_pages",
      "read_page_mailboxes",
      "ads_management",
      "ads_read",
      "business_management",
      "pages_messaging",
      "pages_messaging_phone_number",
      "pages_messaging_subscriptions",
      "pages_messaging_payments",
      "instagram_basic",
      "instagram_manage_comments",
      "instagram_manage_insights",
      "public_profile",
      "basic_info"
    ],
    "user_id": 635966130,
    "app_id": 145634995501895
  }

Debugging the request I made using my app ID and the access token generated by the script, I get this:

 "perms": [
      "user_birthday",
      "user_religion_politics",
      "user_relationships",
      "user_relationship_details",
      "user_hometown",
      "user_location",
      "user_likes",
      "user_education_history",
      "user_work_history",
      "user_website",
      "user_events",
      "user_photos",
      "user_videos",
      "user_friends",
      "user_about_me",
      "user_status",
      "user_games_activity",
      "user_tagged_places",
      "user_posts",
      "rsvp_event",
      "email",
      "read_insights",
      "publish_actions",
      "read_audience_network_insights",
      "read_custom_friendlists",
      "user_actions.books",
      "user_actions.music",
      "user_actions.video",
      "user_actions.news",
      "user_actions.fitness",
      "user_managed_groups",
      "manage_pages",
      "pages_manage_cta",
      "pages_manage_instant_articles",
      "pages_show_list",
      "publish_pages",
      "read_page_mailboxes",
      "ads_management",
      "ads_read",
      "business_management",
      "pages_messaging",
      "pages_messaging_phone_number",
      "pages_messaging_subscriptions",
      "pages_messaging_payments",
      "instagram_basic",
      "instagram_manage_comments",
      "instagram_manage_insights",
      "public_profile"
    ],

The only difference is "basic_info", which I don't know how to get, because I already have every permission type possible in the Facebook Dev Dashboard.
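
A comparison like this can be done mechanically with set differences rather than by eye. The sketch below is illustrative only: perms_diff is a hypothetical helper, and the two lists are abbreviated stand-ins for the full "perms" arrays above.

```python
def perms_diff(first, second):
    """Return (only in first, only in second) as sorted lists."""
    a, b = set(first), set(second)
    return sorted(a - b), sorted(b - a)


# Abbreviated stand-ins for the two "perms" arrays shown above.
explorer_perms = ["user_posts", "email", "public_profile", "basic_info"]
script_perms = ["user_posts", "email", "public_profile"]

only_explorer, only_script = perms_diff(explorer_perms, script_perms)
print("only in Graph Explorer token:", only_explorer)
print("only in script token:", only_script)
```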

arianpasquali commented 6 years ago

If you copy the access token generated by Graph Explorer and paste it into fb_scrape_public.py at line 117, it works fine. The API responses contain the author info, as they are supposed to. Remember that the version needs to be <= 2.10.

example:

#fb_token = 'access_token=' + json.loads(fb_urlobj.read().decode(encoding="latin1"))['access_token']
fb_token = 'access_token=' + "EAACEdEose0cBAO8vY4yQoiucmoENM6ijm3PgeTllZAJ3tDX2mFQt3ZC4iZCQ1bPwSHWzccFZBDT1ZAFmXVZBaiZC0buGEcW5DRj79L4CMggHQ2QpoFOlkFfV8e8pZBBY0MgLOfgv8aglMGbYN5Hasjvsf0ynYiwJrhdi58lPuRWxYKpSxJbesHeSPwTLbgAoZByvLYREBnAZCLLJtqVvyrx6Vb"

That's why I wonder if this is related to access tokens (app tokens versus user tokens, maybe).

dfreelon commented 6 years ago

I just verified that it works on my end on at least one post. Could you please send the specific post you are trying to get comments from, and also give the Python version you are using?

dfreelon commented 6 years ago

I added a new feature that lets users paste in their own access tokens at runtime. That should fix the problem. I'm going to close this now but feel free to reopen if it still doesn't work.

arianpasquali commented 6 years ago

I see your last update. I did a similar thing. Here is the test scenario:

Let's use the barackobama FB page as an example, post_id 6815841748_10155375836346749.

request url:

curl -i -X GET \
   "https://graph.facebook.com/v2.10/6815841748_10155375836346749/comments?fields=from,message,created_time,like_count&limit=100&access_token=<access token sanitized>"

example of an access token (not a real one): 1524347784314473|-BR9IpptnlLl3B13SfIV0aT9KfK

The result, as you can see, doesn't include the author info (line['from']):

{'message': "Dear Mr. President. This will be the year my husband and I will have to forgo medical insurance.  According to what I am seeing our premiums will be at a minimum, $1500.00 per month.  We simply can not afford that and still pay for our home. Unfortunately we don't qualify for any subsidies, so I guess we go without from now on.  Also unfortunately, I know our government will be charging us a penalty for not being able to afford the excessively high insurance coasts.", 'created_time': '2017-11-01T21:37:45+0000', 'like_count': 1, 'id': '10155375836346749_188471105051392', 'LOVE': 0, 'WOW': 0, 'HAHA': 0, 'SAD': 0, 'ANGRY': 0}
{'message': 'Get America *truly* covered with #UniversalHealthCoverage not crazy-expensive, high-deductible plans, co-pays and other obstacles to health care. Support, lobby for HR676, S1804 or anything better.', 'created_time': '2017-11-01T19:31:33+0000', 'like_count': 1, 'id': '10155375836346749_720144548182909', 'LOVE': 0, 'WOW': 0, 'HAHA': 0, 'SAD': 0, 'ANGRY': 0}
{'message': "Thank you so much, President Obama, for all you did for healthcare while you were in office. You helped my immediate family members and myself through a very rough few years. I honestly don't know what we would've done without the changes you made happen. Again....thank you. Kelly", 'created_time': '2017-11-01T20:16:22+0000', 'like_count': 0, 'id': '10155375836346749_1595621900477225', 'LOVE': 1, 'WOW': 0, 'HAHA': 0, 'SAD': 0, 'ANGRY': 0}

now the same request at the Facebook Graph Explorer:

curl -i -X GET \
   "https://graph.facebook.com/v2.10/6815841748_10155375836346749/comments?fields=from%2Cmessage&access_token=<access token sanitized>"

screenshot 2017-11-21 20 47 46
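
For anyone who wants to reproduce the two requests above from Python instead of curl, here is a rough sketch that only builds the URL. The function name comments_url and its defaults are assumptions for illustration; the endpoint, field names, and version are the ones from the curl commands above. Note that urlencode escapes the commas in fields as %2C, which matches the Graph Explorer form of the request.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"


def comments_url(post_id, access_token, version="v2.10",
                 fields=("from", "message", "created_time", "like_count"),
                 limit=100):
    """Build the comments request URL for one post (hypothetical helper)."""
    query = urlencode({
        "fields": ",".join(fields),
        "limit": limit,
        "access_token": access_token,
    })
    return f"{GRAPH_BASE}/{version}/{post_id}/comments?{query}"


# Placeholder token; substitute a real app or user token to compare results.
url = comments_url("6815841748_10155375836346749", "<access token sanitized>")
print(url)
```

Building the same URL with each kind of token makes it easier to confirm that the token is the only thing that differs between the two requests.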

dfreelon commented 6 years ago

So I guess my advice would be to use the Graph Explorer token, which is now explicitly supported. Not sure why it works for me but not you...

arianpasquali commented 6 years ago

Yes. I'm using the graph explorer token for now. It works, but using it is a workaround.

arianpasquali commented 6 years ago

Yes. Me neither. I will keep investigating.

dfreelon commented 6 years ago

This page says it lets you create 60-day user access tokens: https://www.slickremix.com/facebook-60-day-user-access-token-generator/ I haven't tested it myself so caveat emptor!!

arianpasquali commented 6 years ago

Thanks, Deen. I believe that is the way. I'm using an app token while the explorer is using a user token.

You can see the difference just by looking at the string format.

app token : appid|some hash
user token : huge hash

Thanks for the link. I will test it.
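
The format difference can be checked in code. This is a heuristic sketch based only on the two shapes described above (appid|hash versus one long string); guess_token_kind is a hypothetical name, and Facebook does not guarantee these formats, so the access token debug tool remains the reliable check.

```python
def guess_token_kind(token):
    """Guess 'app' or 'user' from the token's shape (heuristic only)."""
    app_id, sep, rest = token.partition("|")
    # App tokens, as described above, look like '<numeric app id>|<hash>'.
    if sep and app_id.isdigit() and rest:
        return "app"
    return "user"


# The first value is the fake app token from earlier in this thread;
# the second is a made-up string standing in for a long user token.
print(guess_token_kind("1524347784314473|-BR9IpptnlLl3B13SfIV0aT9KfK"))
print(guess_token_kind("EAACEdEose0cBAexampleexampleexample"))
```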

dfreelon commented 6 years ago

Here's what I get from that post ID with my app ID + secret, FWIW...

{
  "data": [
    {
      "from": {
        "name": "Michelle Timothy",
        "id": "10214975805897116"
      },
      "message": "I get insurance through my employer but, I had to listen anyway because I so miss a coherent complete sentence speaking president that doesn\u2019t sound like a tired, cranky toddler.",
      "id": "10155375836346749_10155375912656749"
    },
    {
      "from": {
        "name": "Barbara Ocon Hulsizer",
        "id": "10214511581293991"
      },
      "message": "I love this. The current administration cut the advertising budget by 90\u0025. So the guy with 55,000,000 followers puts this on his Facebook page!!",
      "id": "10155375836346749_10155375844451749"
    },
    {
      "from": {
        "name": "David Bourton",
        "id": "892133337628195"
      },
      "message": "A President able to speak above the level of a toddler. Doesn't seem like much to ask for but you don't realise how valuable it is until it's replaced by an infant orange.",
      "id": "10155375836346749_10155375847121749"
    },
    [etc]

arianpasquali commented 6 years ago

I used the access token debug tool from Facebook (https://developers.facebook.com/tools/debug/accesstoken).

There is a big difference in the scopes, which is odd. The first image shows the debug output for the Graph Explorer app's user token; the second shows the debug output for my app's user token.

I've tried exploring my app dashboard and didn't find where to change that. My app is in dev mode, and I created it just for this test.

screenshot 2017-11-21 21 17 11

screenshot 2017-11-21 21 17 29

arianpasquali commented 6 years ago

I see now how to set extended permissions for the app.

https://developers.facebook.com/apps/#your-app-id#/review-status/

You need to go there and select the permissions/scopes you want.

Quick question. Did you have to do any of that? If you debug your user token do you get more than the default scopes?

dfreelon commented 6 years ago

Here are the scopes for my app:

dfreelon commented 6 years ago

user_friends, user_posts, publish_actions, public_profile