aahouzi / Instagram-Scraper-2021

Scrape Instagram content and stories using a new technique based on the HAR file (no token, no public API).
MIT License
111 stars 12 forks
browsermob-proxy data facebook facebook-graph-api graphql-api instagram instagram-api instagram-bot instagram-crawler instagram-feed instagram-scraper instagram-stories meta scraper selenium webscraping

Scrape Instagram content & stories | 2021 version.

:monocle_face: Description

:rocket: Repository Structure

The repository contains the following files & directories:

:scroll: Scraping process

:bulb: Scraping comments

An improvement for this project would be to use the same HAR file technique to scrape all comments from a given publication link. It can be implemented with the same strategy: open the publication (format: https://www.instagram.com/p/***), scroll through the comments, and click the plus button each time to load more. Each click produces a new GraphQL response, and each response carries 12 comments, so the more we click, the more comments we collect. However, scraping comments takes much longer than scraping content: a publication can have thousands of comments, and collecting them 12 at a time is slow.
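As a rough illustration of the idea above, here is a minimal sketch of how comments could be pulled out of a captured HAR file. It is not part of the repository. The `log.entries[].request.url` and `response.content.text` fields come from the HAR format itself, but the `edge_media_to_parent_comment` key and the `id`/`text` fields of each node are assumptions about the shape of Instagram's GraphQL responses and may need adjusting. All function names are hypothetical.

```python
import base64
import json
import math


def graphql_payloads(har):
    """Yield the decoded JSON body of every HAR entry whose URL mentions 'graphql'."""
    for entry in har.get("log", {}).get("entries", []):
        if "graphql" not in entry.get("request", {}).get("url", ""):
            continue
        content = entry.get("response", {}).get("content", {})
        text = content.get("text")
        if not text:
            continue
        # The HAR format allows response bodies to be stored base64-encoded.
        if content.get("encoding") == "base64":
            text = base64.b64decode(text).decode("utf-8", errors="replace")
        try:
            yield json.loads(text)
        except ValueError:
            continue  # body is not JSON, skip it


def find_nodes(obj, edge_key):
    """Recursively yield every node stored under <edge_key>.edges[].node."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == edge_key and isinstance(value, dict):
                for edge in value.get("edges", []):
                    if isinstance(edge, dict) and isinstance(edge.get("node"), dict):
                        yield edge["node"]
            else:
                yield from find_nodes(value, edge_key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_nodes(item, edge_key)


def extract_comments(har, edge_key="edge_media_to_parent_comment"):
    """Collect comment texts from all GraphQL responses in a HAR dictionary.

    `edge_key` is an ASSUMED field name, not confirmed by the project.
    Comments are de-duplicated by their (assumed) `id` field.
    """
    seen, comments = set(), []
    for payload in graphql_payloads(har):
        for node in find_nodes(payload, edge_key):
            comment_id = node.get("id")
            if comment_id is not None and comment_id in seen:
                continue
            seen.add(comment_id)
            comments.append(node.get("text", ""))
    return comments


def responses_needed(total_comments, per_response=12):
    """Number of GraphQL responses (clicks) needed at 12 comments per response."""
    return math.ceil(total_comments / per_response)
```

The `har` argument is a plain dictionary, for example the result of `json.load` on a saved `.har` file. The last helper makes the cost concrete: a publication with 5000 comments would need `responses_needed(5000)`, that is 417, GraphQL responses.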

:mailbox_closed: License & Contact

This code is free to use, share, and modify for any non-commercial purpose. Any commercial use is strictly prohibited without the authors' consent. This project is for educational purposes and is not intended to violate Instagram's policies on data privacy. For any information, feedback, or questions, please contact me.