Clueless-Community / scrape-up

A web-scraping-based Python package that enables you to scrape data from various platforms like GitHub, Twitter, and Instagram, or any other useful website.
https://pypi.org/project/scrape-up/
MIT License
244 stars 247 forks

LinkedIn scraping #186

Closed DyuthiVivek closed 1 year ago

DyuthiVivek commented 1 year ago

Hi @nikhil25803, I am a GSSOC'23 contributor

A feature that gets the LinkedIn information of a user could be added. Using web scraping, we can fetch information about a user's profile on LinkedIn, such as: bio, education, experiences, activity, connections, followers, etc.

Kindly assign me this issue.

ghost commented 1 year ago

Hi @nikhil25803, I have also used web scraping in a project of mine, a course differentiator built with Beautiful Soup, where I scrape data from course-providing platforms like Coding Ninjas, Coursera, etc. I can scrape the LinkedIn data easily, so please assign this task to me. I am a GSSOC'23 contributor.

nikhil25803 commented 1 year ago

Sure, go ahead @mvpfrever. Make a separate module and class first, as per the project structure. Do not add multiple methods at first; start with the name and bio. So @mvpfrever - create a separate class and a method that gets the name for a provided account ID. E.g. for the URL https://www.linkedin.com/in/nikhil25803/, my ID is nikhil25803. Create a method .get_name() to scrape the name.

And @DyuthiVivek - you work on scraping the user's bio.

You two can connect on this issue so that you do not each create a separate module.
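A minimal sketch of the proposed module, following the one-class-per-platform pattern described above. The class name `LinkedIn`, the URL format, and the HTML structure (an `<h1>` holding the profile name) are assumptions, not the project's final implementation; LinkedIn frequently changes its markup and blocks anonymous requests.

```python
import requests
from bs4 import BeautifulSoup


class LinkedIn:
    """Hypothetical scraper class for a LinkedIn profile, keyed by account ID."""

    def __init__(self, user_id: str):
        self.user_id = user_id
        self.url = f"https://www.linkedin.com/in/{user_id}/"

    @staticmethod
    def _parse_name(html: str):
        """Extract the profile name from raw HTML (assumed to sit in an <h1>)."""
        soup = BeautifulSoup(html, "html.parser")
        heading = soup.find("h1")
        return heading.get_text(strip=True) if heading else None

    def get_name(self):
        """Fetch the profile page and return the user's name, or None on failure."""
        try:
            response = requests.get(self.url, timeout=10)
            response.raise_for_status()
            return self._parse_name(response.text)
        except requests.RequestException:
            return None
```

Usage would be `LinkedIn("nikhil25803").get_name()`. Keeping the parsing in a separate `_parse_name` helper makes the fragile part (the HTML selectors) testable without hitting the network.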

ghost commented 1 year ago

@nikhil25803 sir, do I have to scrape both the name and the bio? And one more thing: do we have to scrape the data of a particular organization's employees, or of any LinkedIn user? There are so many users on LinkedIn.

nikhil25803 commented 1 year ago

Scrape the name only at first @mvpfrever. And we have to scrape the data of a particular user based on the user ID provided. I have shown you the example as well.

ghost commented 1 year ago

@nikhil25803 sir, scraping the name from the link provided by the user is done.

DyuthiVivek commented 1 year ago

@nikhil25803 Since I have created this issue and have asked for it to be assigned to me first, can I have the first shot at it?

nikhil25803 commented 1 year ago

Sure @DyuthiVivek!! Get in touch with @mvpfrever and coordinate on this.

ghost commented 1 year ago

Sir, the name is scraped. What next?

DyuthiVivek commented 1 year ago

@nikhil25803 I am working on scraping the bio, I will be done by the weekend.

BabarRasheed commented 1 year ago

Hi, I'm Babar Rasheed (Contributor, GSSOC'23). Many websites don't offer an API, so to work around this we can use web scraping to access data in an easy and structured manner. Python libraries like Requests, BeautifulSoup (bs4), Scrapy, and Selenium are generally used for web scraping. I'm willing to apply these libraries and use multiprocessing to speed up the scraping. Multiprocessing is very helpful when multiple URLs are scraped: it performs the scraping on several URLs in parallel, saving time.
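A sketch of the concurrency idea, assuming a list of profile URLs to fetch. Since scraping is I/O-bound, a thread pool (rather than full multiprocessing) is usually enough to overlap the network waits. `fetch` here is a stand-in for whatever per-URL scraping function the module ends up with, not an existing function in this project.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(url: str) -> str:
    # Placeholder for the real per-URL work, e.g. requests.get(url).text
    # followed by parsing. Kept pure here so the sketch runs offline.
    return f"fetched {url}"


def fetch_all(urls, max_workers: int = 8):
    """Fetch all URLs concurrently, preserving the input order of results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


results = fetch_all(["https://example.com/a", "https://example.com/b"])
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` would give true multiprocessing, but for network-bound scraping threads avoid the pickling and startup cost of extra processes.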

ghost commented 1 year ago

@DyuthiVivek is bio scraping done?

DyuthiVivek commented 1 year ago

@mvpfrever I am working on it and will raise my PR as soon as I am done. If there is no dependency for you, you can raise your PR.

nikhil25803 commented 1 year ago

Guys @mvpfrever and @DyuthiVivek, any updates?

ghost commented 1 year ago

@nikhil25803 sir, I have already completed my part and am waiting for your call on what I have to do next.

nikhil25803 commented 1 year ago

Make a PR @mvpfrever

DyuthiVivek commented 1 year ago

@nikhil25803 LinkedIn seems to be resisting scraping by throwing a captcha or by forcing the user to enter a verification code sent via e-mail. This breaks the scraping logic; it seems to work for only some profiles. Any tips on how to avoid the verification?
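One way to cope with the problem above is to detect the block before parsing, so the scraper can fail cleanly (return `None`) instead of breaking on unexpected HTML. The `/authwall` redirect and the 999 status code are behaviours LinkedIn has been observed to use, not a documented API, so treat this helper as a heuristic sketch.

```python
def looks_blocked(final_url: str, status_code: int) -> bool:
    """Return True if the response looks like a captcha/login wall.

    `final_url` is the URL after redirects (e.g. requests' response.url)
    and `status_code` is the HTTP status of the final response.
    """
    # 999 is a non-standard code LinkedIn has used for denied requests.
    if status_code in (403, 429, 999):
        return True
    # Anonymous requests are often redirected to an "authwall" login page.
    return "/authwall" in final_url or "/checkpoint/" in final_url
```

A caller would do something like `if looks_blocked(response.url, response.status_code): return None` before handing the HTML to BeautifulSoup.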

nikhil25803 commented 1 year ago

@DyuthiVivek | Maybe that is the issue, I do not have any solution in mind for now. If possible, give it one more try, else close the issue for now.

DyuthiVivek commented 1 year ago

@nikhil25803 closing the issue.