J4NN0 / linkedin-web-scraper

Python web scraper for LinkedIn that collects company data (e.g. name, description, industry) and stores it in an Excel file
https://youtu.be/2pSjPOuMDhk
GNU General Public License v3.0
openpyxl python-excel python-web-scraper scraper scraping-websites scrapy scrapy-crawler scrapy-demo scrapy-spider scrapy-tutorial selenium selenium-python selenium-webdriver webscraper webscraper-api webscraper-website webscraping webscraping-search

LinkedIn Web Scraper

This is a Python web scraper for LinkedIn company pages. The script simulates human activity (using the Selenium library) to get data from LinkedIn web pages. The goal is to store data about companies in a given area, such as each company's name, description, and industry.

Once collected, this information is stored in an Excel file (companies.xlsx).
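
As a rough illustration of the storage step, the sketch below writes a few company records to a workbook with openpyxl. It is not taken from the project's code: the column names, the sheet title, the sample records, and the `build_workbook` helper are all assumptions made for the example.

```python
from openpyxl import Workbook

# Hypothetical column layout; the project's real columns may differ.
HEADERS = ["Name", "Description", "Industry"]


def build_workbook(companies):
    """Return a workbook with one header row and one row per company dict."""
    workbook = Workbook()
    sheet = workbook.active
    sheet.title = "Companies"  # assumed sheet title
    sheet.append(HEADERS)
    for company in companies:
        # Missing fields are written as empty strings.
        sheet.append([
            company.get("name", ""),
            company.get("description", ""),
            company.get("industry", ""),
        ])
    return workbook


# Made-up sample records, for illustration only.
SAMPLE_COMPANIES = [
    {"name": "Acme", "description": "Makes things", "industry": "Manufacturing"},
    {"name": "Globex", "description": "Ships things", "industry": "Logistics"},
]

if __name__ == "__main__":
    workbook = build_workbook(SAMPLE_COMPANIES)
    print(workbook.active.max_row, "rows prepared")
```

Calling `build_workbook(SAMPLE_COMPANIES).save("companies.xlsx")` would then write the file to disk.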

Disclaimer

Any actions and/or activities related to the material contained within this repo are solely your responsibility. Misuse of the information in this repo can result in criminal charges being brought against the company in question. The author will not be held responsible in the event that any criminal charges are brought against any individuals misusing the information in this repo to break the law.

As stated in the LinkedIn User Agreement: "you agree you will not use [...] any bots or other automated methods to access the Services, add or download contacts, send or redirect messages."

Terms And Conditions

Demo

Watch the video: https://youtu.be/2pSjPOuMDhk

Usage

  1. Clone project

    git clone https://github.com/J4NN0/linkedin-web-scraper.git
    cd linkedin-web-scraper

  2. Install requirements

    pip install -r requirements.txt

  3. Download the web driver you prefer and put it inside the project folder.

  4. Set missing configs in config.ini:

    • LinkedIn credentials i.e., EMAIL and PASSWORD.
    • The WEBDRIVER (downloaded on step 3).
    • The CITY from which companies are to be fetched.

    Note that other parameters can also be set.

  5. Run script

    python3 main.py

    Data will be stored in the companies.xlsx file.
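
To make step 4 more concrete, here is a minimal sketch of what a filled-in config.ini might look like and how it could be checked before running the script. The key names (EMAIL, PASSWORD, WEBDRIVER, CITY) come from the steps above; the `[SETTINGS]` section name, the sample values, and the `load_settings` helper are assumptions for illustration and may not match the project's actual file.

```python
import configparser

# Hypothetical config.ini contents: the section name and values are assumed.
SAMPLE_CONFIG = """
[SETTINGS]
EMAIL = user@example.com
PASSWORD = change-me
WEBDRIVER = chromedriver
CITY = Turin
"""

REQUIRED_KEYS = ("EMAIL", "PASSWORD", "WEBDRIVER", "CITY")


def load_settings(text):
    """Parse config text and return the required settings as a dict.

    Raises ValueError if any required key is missing or empty.
    """
    # Interpolation is disabled so a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["SETTINGS"]
    missing = [key for key in REQUIRED_KEYS if not section.get(key, "").strip()]
    if missing:
        raise ValueError("Missing config values: " + ", ".join(missing))
    return {key: section[key].strip() for key in REQUIRED_KEYS}


if __name__ == "__main__":
    settings = load_settings(SAMPLE_CONFIG)
    print("Fetching companies for city:", settings["CITY"])
```

Checking the values up front gives a clear error message for an unset key instead of a failure partway through a browser session.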

Troubleshooting

Resources