LinkedIn and Glassdoor Data Scraper

This Python script scrapes data from both LinkedIn and Glassdoor: it collects LinkedIn profile links for a list of companies and company details from Glassdoor. It uses Selenium for web automation and Pandas for data handling.

Features

LinkedIn Scraper

Login Functionality: The script allows you to log in to your LinkedIn account automatically using your email and password, or manually if needed.
Company LinkedIn Links: Given a CSV file of company names, the script searches for each company on LinkedIn and retrieves its profile link.

Glassdoor Scraper

Company Data Scraping: You specify how many companies to scrape, and the script fetches details from Glassdoor such as company name, rating, reviews, salaries, jobs, location, industry, and description.

Disclaimer

Use with Caution: This script is intended for educational purposes and should be used responsibly and in compliance with the terms of service of the websites you scrape.
Account Verification: Running this script repeatedly on the same LinkedIn account may trigger LinkedIn's security measures, such as CAPTCHA challenges or other identity checks. Use it sparingly and with care.
Respect Usage Policies: Always respect the websites' robots.txt files and usage policies when scraping data.

Steps to run the application

After cloning the repo, create a virtual environment by running the following command
```
python -m venv venv
```
Activate the virtual environment
- Windows:
```
venv\Scripts\activate
```
- macOS/Linux:
```
source venv/bin/activate
```
Install all required Python packages (make sure the virtual environment is active)
```
pip install -r requirements.txt
```

Create a ".env" file in the project folder and fill in the following.

- NOTE: The .gitignore file already excludes .env files, so your credentials (the .env file) will not be pushed to GitHub and stay on your local machine only.

LINKEDIN_MAIL = "YOUR_EMAIL_FOR_LINKEDIN"
LINKEDIN_PASS = "YOUR_PASSWORD_FOR_LINKEDIN"
AUTO_LOGIN = True
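
One detail worth illustrating: values read from a .env file arrive as strings, so a setting like AUTO_LOGIN = False would still be truthy if tested directly. The sketch below shows one way the three settings could be read safely. The helper names `env_flag` and `load_settings` are hypothetical and not taken from the script, and it assumes the .env values have already been loaded into the process environment (for example by a dotenv loader).

```python
import os

# Spellings treated as "enabled"; anything else counts as False.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean flag (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    # Environment values are always strings, so "False" would be truthy
    # if tested directly; compare against known "true" spellings instead.
    return raw.strip().strip("\"'").lower() in TRUE_VALUES


def load_settings() -> dict:
    """Collect the three settings named in the .env file (hypothetical helper)."""
    # Assumption: the .env file was already loaded into os.environ
    # before this function is called.
    return {
        "mail": os.getenv("LINKEDIN_MAIL", ""),
        "password": os.getenv("LINKEDIN_PASS", ""),
        "auto_login": env_flag("AUTO_LOGIN"),
    }
```

With this approach, AUTO_LOGIN = False, AUTO_LOGIN = "false", and a missing AUTO_LOGIN all disable auto-login.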

Run the main.py file using
```
python main.py
```
To deactivate the virtual environment when you are done, run
```
deactivate
```

Usage

Environment Setup: Before using the script, ensure that you have set the necessary environment variables for LinkedIn login (LINKEDIN_MAIL and LINKEDIN_PASS) in a .env file.
Auto-login: You can enable auto-login by setting the AUTO_LOGIN environment variable to True. This allows the script to log in to your LinkedIn account automatically.
LinkedIn Scraper: Use option (s) to scrape LinkedIn company links. Provide a CSV file with a list of company names, and the script will search for them on LinkedIn and retrieve their profile links.
Glassdoor Scraper: Use option (c) to scrape company data from Glassdoor. Specify how many companies you want to scrape, and the script will fetch relevant details for each company.
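
As a rough illustration of the input side of option (s), the sketch below reads company names from a CSV file and builds one LinkedIn company-search URL per name. It is not the script's actual code: the column name "Company", the helper names, and the exact search URL are assumptions, and it uses the standard csv module rather than Pandas to stay self-contained.

```python
import csv
import io
from urllib.parse import quote_plus

# Assumed search endpoint; the script may build its search differently.
SEARCH_URL = "https://www.linkedin.com/search/results/companies/?keywords="


def read_company_names(csv_file, column="Company"):
    """Return unique, non-empty company names from an open CSV file.

    The column name "Company" is a hypothetical default.
    """
    names = []
    for row in csv.DictReader(csv_file):
        name = (row.get(column) or "").strip()
        if name and name not in names:
            names.append(name)
    return names


def build_search_urls(names):
    """Map each company name to a URL-encoded search link."""
    return {name: SEARCH_URL + quote_plus(name) for name in names}


if __name__ == "__main__":
    sample = io.StringIO("Company\nAcme Corp\nAT&T\n")
    for name, url in build_search_urls(read_company_names(sample)).items():
        print(name, url)
```

Each URL would then be opened with Selenium, and the first matching company result taken as the profile link.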

Dependencies

Selenium: Used for web automation.
Pandas and NumPy: Used for data handling and manipulation.
Colorama: Provides colorful console output for better readability.
Dotenv: Loads environment variables from a .env file.

Output

The script generates two CSV files for Glassdoor data: "CompanyDataFalse.csv" and "CompanyDataTrue.csv". The former is written without the index column, while the latter includes it.
For LinkedIn scraping, the script creates a CSV file named "CompanyLinkedIn.csv" containing the LinkedIn profile links for the scraped companies.
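
To make the difference between the two Glassdoor files concrete, here is a small sketch that writes the same rows with and without a leading index column. It uses the standard csv module to stay dependency-free. With Pandas, the equivalent would presumably be `DataFrame.to_csv(path, index=False)` and `DataFrame.to_csv(path, index=True)`, though that is an inference from the file names rather than something confirmed by the script. The sample row is made up.

```python
import csv
import io


def write_rows(rows, columns, out, with_index):
    """Write rows as CSV, optionally prefixed by a 0-based index column."""
    writer = csv.writer(out, lineterminator="\n")
    # With an index, the header gains an unnamed first column,
    # which is how Pandas labels a default index.
    writer.writerow(([""] if with_index else []) + list(columns))
    for i, row in enumerate(rows):
        values = [row[c] for c in columns]
        writer.writerow(([i] if with_index else []) + values)


if __name__ == "__main__":
    rows = [{"name": "Acme", "rating": 4.1}]  # made-up sample data
    for flag in (False, True):
        buffer = io.StringIO()
        write_rows(rows, ["name", "rating"], buffer, with_index=flag)
        print(f"index={flag}:\n{buffer.getvalue()}")
```

The file without the index is usually the easier one to re-import, since it has no unnamed first column.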

Prerequisites

You need Firefox, Chrome, or Microsoft Edge installed. In the code, set the name of the browser you want to use and, if needed, the path to that browser's binary (this line is commented out by default). Alternatively, you can adapt the script to use other browsers by changing the WebDriver.
Ensure that you have installed all the required Python packages listed in requirements.txt.
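
The browser selection described above could look roughly like the following sketch, assuming Selenium 4. The names `BROWSERS`, `resolve_browser`, and `make_driver` are hypothetical and not taken from the script; only the Selenium classes (`webdriver.Firefox`, `webdriver.Chrome`, `webdriver.Edge`, and their options classes) are real.

```python
# Maps a user-facing browser name to Selenium's driver and options class names.
BROWSERS = {
    "firefox": ("Firefox", "FirefoxOptions"),
    "chrome": ("Chrome", "ChromeOptions"),
    "edge": ("Edge", "EdgeOptions"),
}


def resolve_browser(name):
    """Normalize a browser name and return (driver class, options class) names."""
    key = name.strip().lower()
    if key == "microsoft edge":
        key = "edge"
    if key not in BROWSERS:
        raise ValueError(f"Unsupported browser: {name!r}")
    return BROWSERS[key]


def make_driver(name, binary_path=None):
    """Start a WebDriver for the chosen browser, optionally with a custom binary."""
    # Imported lazily so the name check above works without Selenium installed.
    from selenium import webdriver

    driver_name, options_name = resolve_browser(name)
    options = getattr(webdriver, options_name)()
    if binary_path:
        # Only needed when the browser is not in its default install location.
        options.binary_location = binary_path
    return getattr(webdriver, driver_name)(options=options)
```

For example, `make_driver("firefox")` would start Firefox from its default location, while passing `binary_path` points Selenium at a specific executable.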

Authors

Varun Kamath

License

This project is licensed under the MIT License - see the LICENSE file for details.
