EssamWisam / cmp-docs

A comprehensive guide for prospective, current and past students in the computer engineering department of Cairo university.
https://cmp-docs.pages.dev

⏰ Reminder to Run LinkedIn Scraper #76

Open github-actions[bot] opened 2 days ago

github-actions[bot] commented 2 days ago

It's been two weeks.

It's time to run the LinkedIn script to get the latest titles and current positions of CMP students and graduates.

To run the script, follow the steps below. If this is not your first time running it, you can start from step 4:

Steps for running the LinkedIn script:

  1. Make sure you have Python 3 installed on your device. You can check by running the command below in your bash terminal; it should display the Python version if Python is already installed.
    python --version
  2. Install all the needed Python packages using the requirements.txt present in the scripts/linkedin-scraper directory.
    pip install -r "scripts/linkedin-scraper/requirements.txt"
  3. Download the ChromeDriver build that is compatible with your OS and Chrome version from this link. It should be a zip file of about 10 MB or less. Extract it using WinRAR or a similar archive manager, then copy the chromedriver.exe file to the scripts/linkedin-scraper directory.
  4. Set the environment variables with valid LinkedIn credentials in the bash terminal as follows:
    export LINKEDIN_SCRAPER_EMAIL=<email>
    export LINKEDIN_SCRAPER_PASSWORD=<password>

    Replace <email> and <password> with the actual LinkedIn credentials. Note that you should probably avoid using your main LinkedIn account, since LinkedIn may ban it after repeated scraping.

  5. Finally, you can run the script on all the class yaml files using the command below:
    python "scripts/linkedin-scraper/run.py" 

    To run the script for a certain class only, use the command below, replacing 20XX with the graduation year of that class:

    python "scripts/linkedin-scraper/linkedin-scraper.py" "public/department/Extras/Classes/C20XX.yaml"
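
As a rough aid for steps 3 to 5, the checks above could be sketched as a small preflight helper. This is only an illustration and not part of the repository: the function names `missing_prerequisites` and `class_yaml_path` are hypothetical, and accepting a `chromedriver` file without the `.exe` suffix (for non-Windows systems) is an assumption.

```python
import os
from pathlib import Path

# Variable names and directory are taken from the steps above.
REQUIRED_VARS = ("LINKEDIN_SCRAPER_EMAIL", "LINKEDIN_SCRAPER_PASSWORD")
SCRAPER_DIR = Path("scripts/linkedin-scraper")


def missing_prerequisites(env=None, scraper_dir=SCRAPER_DIR):
    """Return a list of problems; an empty list means the script can be run.

    Hypothetical helper, not part of the repository.
    """
    env = os.environ if env is None else env
    problems = [
        f"environment variable {name} is not set"
        for name in REQUIRED_VARS
        if not env.get(name)
    ]
    # Assumption: the driver is named chromedriver.exe on Windows and
    # chromedriver on other systems.
    driver_names = ("chromedriver.exe", "chromedriver")
    if not any((Path(scraper_dir) / name).is_file() for name in driver_names):
        problems.append(f"no ChromeDriver binary found in {scraper_dir}")
    return problems


def class_yaml_path(year):
    """Build the YAML path for one class, following the C20XX.yaml pattern."""
    return Path("public/department/Extras/Classes") / f"C{year}.yaml"


if __name__ == "__main__":
    issues = missing_prerequisites()
    if issues:
        for issue in issues:
            print("-", issue)
    else:
        print("Ready to run the scraper.")
```

Running it from the repository root before step 5 would list anything that is still missing, without contacting LinkedIn or printing the credentials.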

Last Notes:

EssamWisam commented 2 days ago

@Iten-No-404 This is to mention that it's sincerely appreciated that you are doing this. Thanks a lot!

Do you think we could make this update monthly or every six weeks instead? My impression is that the site is still not popular enough for people to be actively checking who isn't working at the moment in order to help (but inshAllah one day it could be). I feel that, for now, people just check where graduates are working and maybe reach out. For this, I don't think precision in time matters a lot.

Iten-No-404 commented 2 days ago

@EssamWisam, you're very welcome. Unless the mock LinkedIn account gets blocked again, I should have this done on Friday insha'allah.

Yes, I get your point. I can easily change it to a monthly reminder. On the other hand, making it every six weeks might be a little challenging since cron expressions don't directly support weeks (the same issue as with the 2 weeks previously).

Once we believe that more people are using it, we can increase the frequency of the runs again. Thank you for your thoughtfulness and have a good day.
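
One possible workaround for the six-week case, sketched here under assumptions rather than taken from the repository: let a plain weekly cron expression (for example `0 9 * * 5`, every Friday at 09:00) trigger the job, and have a small date check decide whether this particular week is a reminder week. The anchor date, the constant names, and the function `is_reminder_due` are all hypothetical.

```python
from datetime import date

# Hypothetical anchor: the date of the first reminder (a Friday).
ANCHOR = date(2024, 1, 5)
INTERVAL_WEEKS = 6


def is_reminder_due(today, anchor=ANCHOR, interval_weeks=INTERVAL_WEEKS):
    """Return True when `today` falls a whole multiple of the interval after `anchor`."""
    days = (today - anchor).days
    if days < 0:
        return False  # before the first reminder
    weeks, remainder = divmod(days, 7)
    # Due only on the anchor's weekday, and only every `interval_weeks` weeks.
    return remainder == 0 and weeks % interval_weeks == 0


if __name__ == "__main__":
    print("due" if is_reminder_due(date.today()) else "skip")
```

With this approach the schedule stays a simple weekly cron entry, and changing the frequency later only means editing `INTERVAL_WEEKS`.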