Don't commit data, private info, credentials, etc.
Write your script in a new folder, "scripts/facebook-group-info-scraper".
Use any language you want, preferably Python.
Use conservative rate limiting and a dynamic DOM renderer such as Selenium or Puppeteer.
The script should read FB_USERNAME, FB_PASSWORD, and the MongoDB credentials from a .env file.
The script should get group IDs from #6 (see comments).
The script saves data to a local MongoDB instance (see schema below).
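As a rough sketch of the configuration and rate-limiting guidelines above, assuming Python: the snippet reads the credentials from environment variables (which a tool such as python-dotenv could populate from the .env file) and pauses for a random interval between page loads. The names `MONGO_URI`, `ScraperConfig`, `load_config`, and `polite_sleep`, and the 5 to 15 second range, are illustrative assumptions and not part of this issue.

```python
import os
import random
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ScraperConfig:
    fb_username: str
    fb_password: str
    mongo_uri: str  # assumed name; the issue only says "MongoDB credentials"


def load_config(env=os.environ) -> ScraperConfig:
    """Read credentials from the environment and fail loudly if any are missing."""
    required = ("FB_USERNAME", "FB_PASSWORD", "MONGO_URI")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    return ScraperConfig(env["FB_USERNAME"], env["FB_PASSWORD"], env["MONGO_URI"])


def polite_sleep(min_seconds=5.0, max_seconds=15.0, sleep=time.sleep) -> float:
    """Pause for a random interval so requests are spaced out conservatively."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `polite_sleep()` before every page load in the Selenium or Puppeteer session keeps the request rate low. The randomized interval is one possible choice; a fixed delay would also satisfy the guideline.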
Scraping Facebook Groups General Information
We need data on all the Facebook groups in the community.
The data available on public FB groups (excluding content such as posts, pictures, and events) includes the fields listed below.
Note: I compiled this list by manually going through 2 FB group pages. Please go through a few more pages yourself to see whether some groups expose more, less, or different public data, and we will update our schema accordingly.
id
name
isPublic
description
foundedOn
memberCount
adminCount
moderatorCount
memberCountIncreaseWeekly
postCountIncreaseMonthly
postCountIncreaseDaily
moderatorList, adminList, memberList, pageList (these are lists of IDs; pages can be in a group too)
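The field list above could be modeled roughly as follows. This is only a sketch: the field names come from the list, but every type and default is an assumption, and `GroupInfo` is a hypothetical name. The actual schema may differ once more group pages have been checked.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class GroupInfo:
    # Field names follow the list above; the types and defaults are assumptions.
    id: str
    name: str
    isPublic: bool
    description: str = ""
    foundedOn: Optional[datetime] = None
    memberCount: Optional[int] = None
    adminCount: Optional[int] = None
    moderatorCount: Optional[int] = None
    memberCountIncreaseWeekly: Optional[int] = None
    postCountIncreaseMonthly: Optional[int] = None
    postCountIncreaseDaily: Optional[int] = None
    # Lists hold Facebook IDs only, never other personal data.
    moderatorList: List[str] = field(default_factory=list)
    adminList: List[str] = field(default_factory=list)
    memberList: List[str] = field(default_factory=list)
    pageList: List[str] = field(default_factory=list)


group = GroupInfo(id="123", name="Example group", isPublic=True, memberList=["u1", "u2"])
document = asdict(group)  # plain dict that a MongoDB driver can store
```

Optional fields default to `None` so that a group exposing less public data still produces a valid document.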
We will not collect any information about individuals other than their Facebook ID. We need this data to see how connected groups are (how many individuals they have in common) and to reach out to individuals who belong to a large number of groups. This is very useful for coalition-building.
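The connectedness idea described above can be sketched like this, assuming the member lists have already been scraped into a mapping from group ID to a list of Facebook IDs. The function names and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def shared_member_counts(members_by_group):
    """Count the individuals each pair of groups has in common."""
    return {
        (a, b): len(set(members_by_group[a]) & set(members_by_group[b]))
        for a, b in combinations(sorted(members_by_group), 2)
    }


def most_connected(members_by_group, min_groups=2):
    """List (facebook_id, group_count) for people in at least `min_groups` groups."""
    counts = Counter(
        member for members in members_by_group.values() for member in set(members)
    )
    return [(member, n) for member, n in counts.most_common() if n >= min_groups]


# Made-up sample data for illustration only.
groups = {"g1": ["a", "b", "c"], "g2": ["b", "c"], "g3": ["c"]}
overlaps = shared_member_counts(groups)
connectors = most_connected(groups)
```

With this sample, `g1` and `g2` share two members, and `c` is the most connected individual because it appears in all three groups.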
Note on FB Scraping, Data Privacy, Future Roadmap
See #5
Prerequisite: Seed Data
See #6
Requirements
Scraping Posts
I started a script in scripts/facebook-group-posts-scraper using this library: https://github.com/kevinzg/facebook-scraper
It works well, but we need to collect the timestamp on every post. This does not yet work with 100% consistency, so you will have to troubleshoot. We will use this data to build a news aggregator and to keep an eye out for more data for coalition-building purposes.
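One way to start troubleshooting the timestamps is to separate posts that have a usable timestamp from those that do not. The sketch below assumes each scraped post is a dict whose `time` key holds a `datetime` or `None`. That shape should be verified against the installed version of the library. `split_by_timestamp` and the sample posts are hypothetical.

```python
from datetime import datetime


def split_by_timestamp(posts):
    """Separate posts with a usable 'time' value from posts that lack one."""
    dated, undated = [], []
    for post in posts:
        if isinstance(post.get("time"), datetime):
            dated.append(post)
        else:
            undated.append(post)
    return dated, undated


# Made-up sample posts for illustration only.
sample_posts = [
    {"post_id": "1", "text": "hello", "time": datetime(2020, 6, 1, 12, 0)},
    {"post_id": "2", "text": "no timestamp", "time": None},
    {"post_id": "3", "text": "key missing"},
]
dated, undated = split_by_timestamp(sample_posts)
```

Logging the `post_id` of every undated post gives a concrete list of pages to inspect by hand.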
How your script will store and normalize the data
Database will be MongoDB
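To avoid duplicate documents when a group is scraped more than once, the write could be an upsert keyed on the group ID. This is a sketch under assumptions: using `id` as the key is a guess, `build_upsert` is a hypothetical helper, and the driver call in the trailing comment assumes pymongo.

```python
def build_upsert(group_doc):
    """Build the filter and update documents for an idempotent write."""
    if "id" not in group_doc:
        raise ValueError("group document needs an 'id' field")
    selector = {"id": group_doc["id"]}
    update = {"$set": dict(group_doc)}  # copy so the caller's dict is not shared
    return selector, update


selector, update = build_upsert({"id": "123", "name": "Example group"})

# With pymongo (assumed driver), this would be applied as:
# collection.update_one(selector, update, upsert=True)
```

Because the update uses `$set`, re-running the scraper refreshes the stored fields without creating a second document for the same group.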
Schema
Misc
Random lib I found: https://github.com/ParvJain/Facebook-Group-Scraper (please look through)