github-actions[bot] closed this 4 months ago
My turn!
Good luck! If you face any issues, let me know.
@EssamWisam, a new issue (https://github.com/EssamWisam/cmp-docs/issues/68) has been created since another two weeks have passed. If you are blocked for some reason, let me know. If, however, you are just busy and have the run scheduled for a later date, then no problem; take your time.
I had scheduled this for a particular time but couldn't start because my time went elsewhere; then I stayed busy and forgot. I will schedule it for today inshAllah and write back here if anything blocks me.
@Iten-No-404
May I ask how long the script typically takes to run? I think I waited for about two hours yesterday and it was still going (then I went to sleep and closed the laptop lid, and I think it couldn't continue after that).
I can't scroll up far enough in the output to show the messages it was printing, but they were about many missing profile pictures for 2023 students (including students I thought do have a profile picture).
The main issue in any case remains time. Could you let me know how long it typically takes?
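Since the scrolled-away messages were lost, one option is to save a copy of the output to a file while still watching it live. Below is a minimal Python sketch of that idea; `run_with_log` and the log file name are hypothetical helpers for illustration, not part of the repository, and the function simply wraps whatever command you pass it.

```python
import subprocess
import sys
from pathlib import Path


def run_with_log(cmd: list[str], log_path: Path) -> int:
    """Run cmd, echo every output line to the console, and save a copy to log_path.

    Returns the child process's exit code.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        # Merge stderr into stdout so warnings and tracebacks end up in the log too.
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            encoding="utf-8",
            errors="replace",  # don't crash on undecodable bytes in names or messages
            bufsize=1,  # line-buffered on this side of the pipe
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the file current even if the run is interrupted
        return proc.wait()
```

For example, `run_with_log([sys.executable, "-u", "scripts/linkedin-scraper/linkedin-scraper.py", "public/department/Extras/Classes/C2023.yaml"], Path("C2023.log"))` keeps every message in `C2023.log`; the `-u` flag stops the scraper from buffering its own output, so lines appear promptly.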
@EssamWisam, it typically takes around 3 hours to fully run. However, I believe the problem you're facing could be due to one or more of the following reasons:
… = sign, don't use quotes, like so: LINKEDIN_SCRAPER_PASSWORD="
To try to debug or fix this issue, you can try the following steps:
Run the script on a single YAML file only, for example:
python "scripts/linkedin-scraper/linkedin-scraper.py" "public/department/Extras/Classes/C2023.yaml"
If running the script on a single YAML file was successful, then you can just run it again individually on C2020, C2021, and C2024_Credit (since they are the only ones with LinkedIn links besides C2023).
If you are still facing any issues, let me know. Also, I can run the script this weekend if the issues still persist.
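The one-class-at-a-time approach suggested above can also be scripted. This is a hedged sketch that assumes the scraper takes a single class YAML path as its only argument (as in the command above) and that the four files listed are the classes with LinkedIn links. `build_commands`, `run_one_by_one`, and the constant names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Paths as they appear in this thread; adjust if the repository layout changes.
SCRAPER = Path("scripts/linkedin-scraper/linkedin-scraper.py")
CLASSES_DIR = Path("public/department/Extras/Classes")

# Assumption: the classes with LinkedIn links, per the comment above (C2023 first, as a smoke test).
CLASSES_WITH_LINKEDIN = ["C2023", "C2020", "C2021", "C2024_Credit"]


def build_commands(classes: list[str]) -> list[list[str]]:
    """Return one scraper invocation (argv list) per class YAML file."""
    return [
        [sys.executable, str(SCRAPER), str(CLASSES_DIR / f"{name}.yaml")]
        for name in classes
    ]


def run_one_by_one(classes: list[str]) -> dict[str, int]:
    """Run the scraper on each class sequentially and collect the exit codes."""
    results: dict[str, int] = {}
    for name, cmd in zip(classes, build_commands(classes)):
        print(f"--- scraping {name} ---", flush=True)
        results[name] = subprocess.run(cmd).returncode
        if results[name] != 0:
            # Stop early so a login or driver problem doesn't cost hours on the remaining classes.
            break
    return results
```

Calling `run_one_by_one(CLASSES_WITH_LINKEDIN)` from the repository root would try C2023 first and stop at the first non-zero exit code. Using `sys.executable` instead of a bare `python` makes each sub-run use the same interpreter, and therefore the same installed requirements, as the caller.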
Being abroad, I doubt the reason is a slow connection. I used my main account (was that wrong?), and I think output would have been different if it didn't log in.
I didn't know that it takes three hours. I expected much less and must have fallen asleep at or before the two-hour mark. I will try a run again on 2023 only. With this duration of runtime, I think we should run this every month.
I used my main account (was that wrong?)
Not wrong per se, but definitely not recommended, for two reasons: it notifies everyone whose profile you scraped that you viewed it, and, more importantly, your LinkedIn account runs the risk of being flagged or banned in the long run, which is never good, especially if you have invested a lot of effort into it.
I didn't know that it takes three hours.
With this duration of runtime, I think we should run this every month.
It's not a big deal, it takes almost an hour per class, and it can easily run in the background while you are working on other stuff. Either way, I don't mind changing the run frequency to once a month or leaving it as is.
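To compare the current two-week cadence with a monthly one, the date arithmetic is simple. This is a small sketch that assumes "monthly" means every 30 days. The helper names are hypothetical, and the code does not describe how the reminder workflow is actually implemented.

```python
from datetime import date, timedelta


def next_due(last_run: date, interval_days: int = 14) -> date:
    """Date on which the next scraper run becomes due."""
    return last_run + timedelta(days=interval_days)


def is_due(last_run: date, today: date, interval_days: int = 14) -> bool:
    """True once at least interval_days have passed since last_run."""
    return today >= next_due(last_run, interval_days)


def runs_per_year(interval_days: int) -> int:
    """Whole runs that fit in a 365-day year at the given interval."""
    return 365 // interval_days
```

Under these assumptions, `runs_per_year(14)` is 26 and `runs_per_year(30)` is 12, so a monthly schedule would cut the yearly number of multi-hour runs from 26 to 12.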
I think output would have been different if it didn't log in.
Fair enough. I still recommend observing the first 5 minutes or so to make sure that everything is running smoothly.
I will try a run again on 2023 only.
Alright. Good luck.
Not wrong per se...
My fault for not noticing the note in the original text of the issue. I am now scared and hope inshAllah nothing will happen to my account. Is the dummy account you have been using still alive after using it multiple times?
It's not a big deal, it takes almost an hour per class...
I know it can run in parallel, but some people, like me, get bothered when many tabs or programs are left open and unused (I frequently try to avoid that), and it really is longer than I expected. I have no idea why it is that slow; scraping usually tends to be somewhat faster.
I'll set my expectations better next time I try running it inshAllah.
I will try a run again on 2023 only.
After some time to recover from the minor shock...
I am now scared and hope inshAllah nothing will happen to my account. Is the dummy account you have been using still alive after using it multiple times?
Don't worry; a ban takes effect almost instantly, so if you can log in manually right now, there shouldn't be any problem. As for the dummy account, it got some one-day bans/blocks, but it is still active and usable. It is easy to wait out the bans. So, your account should be fine insha'allah.
I have no idea why it is that slow; scraping usually tends to be somewhat faster.
True, the script could be optimized a bit to run faster, but for now I think it's good enough.
After some time to recover from the minor shock...
No problem, take your time.
@EssamWisam, I have run the script today, so I will close all 3 reminder issues. You don't need to run the script any time soon.
Thank you so much. InshAllah next time I will be aware of the consequences and properly ready when I do it.
Don't mention it. I really didn't do much. I left it running in the background while working. It didn't affect my schedule in any way.
It's been two weeks.
It's time to run the LinkedIn script to get the latest titles and current positions of CMP students and graduates.
To run the script, follow the steps below. If it's not your first time running the script, you can start directly from step 4:
Steps for running the LinkedIn script:
1. … requirements.txt present in the scripts/linkedin-scraper directory.
2. … chromedriver.exe file to the scripts/linkedin-scraper directory.
3. … and replace <email> and <password> with the actual LinkedIn credentials. Note: you should probably avoid using your main LinkedIn account, since it risks being banned by LinkedIn after repeated scraping.
4. … and if you want to run the script for a certain class only, use the command below and replace 20XX with the graduation year of said class:

Last Notes:
… <email> and <password> written correctly in the environment variables.
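Credential mistakes came up several times in this thread. Examples include quoted values after the `=` sign and the `<email>`/`<password>` placeholders left in place. A quick pre-flight check could catch these before a multi-hour run. This is a minimal sketch under two assumptions. First, it assumes the credentials live in a `.env`-style file of `KEY=VALUE` lines. Second, it assumes the email variable is named `LINKEDIN_SCRAPER_EMAIL`. Only `LINKEDIN_SCRAPER_PASSWORD` appears above, so the email variable's name is a guess. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: only LINKEDIN_SCRAPER_PASSWORD is confirmed in the thread; the email name is a guess.
REQUIRED = ("LINKEDIN_SCRAPER_EMAIL", "LINKEDIN_SCRAPER_PASSWORD")
PLACEHOLDERS = {"<email>", "<password>"}
QUOTES = "\"'"


def check_env_text(text: str) -> list[str]:
    """Return human-readable problems found in .env-style KEY=VALUE text."""
    problems: list[str] = []
    values: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            problems.append(f"line {lineno}: missing '='")
            continue
        key, value = key.strip(), value.strip()
        values[key] = value
        # The thread advises against quoting the value after the = sign.
        if value and (value[0] in QUOTES or value[-1] in QUOTES):
            problems.append(f"line {lineno}: {key} value is in quotes (remove them)")
        if value.strip(QUOTES) in PLACEHOLDERS:
            problems.append(f"line {lineno}: {key} still contains the placeholder {value}")
    for key in REQUIRED:
        if not values.get(key):
            problems.append(f"{key} is missing or empty")
    return problems


def check_env_file(path: Path = Path(".env")) -> list[str]:
    """Read a .env-style file and check it with check_env_text."""
    return check_env_text(path.read_text(encoding="utf-8"))
```

An empty result means none of these specific mistakes were found. The check does not verify that LinkedIn actually accepts the credentials.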