austinoboyle / scrape-linkedin-selenium

`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.
MIT License

Scrape user by e-mail #29

Closed xxsacxx closed 5 years ago

xxsacxx commented 5 years ago

By using `url = 'https://www.linkedin.com/sales/gmail/profile/proxy/' + gmail`, we can go directly to a user's profile without knowing their username.
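A minimal sketch of building that URL, assuming the address only needs to be appended to the prefix quoted above. The helper name `profile_url_from_email` is hypothetical, and percent-encoding the address is an assumption made here for safety, not something confirmed in this thread:

```python
from urllib.parse import quote

# Prefix quoted in the comment above.
PROXY_PREFIX = "https://www.linkedin.com/sales/gmail/profile/proxy/"


def profile_url_from_email(email: str) -> str:
    """Hypothetical helper: append an e-mail address to the proxy prefix."""
    # Assumption: characters that are unsafe in a URL path are
    # percent-encoded, while "@" is kept literal.
    return PROXY_PREFIX + quote(email.strip(), safe="@")


if __name__ == "__main__":
    # The address below is a placeholder, not a real account.
    print(profile_url_from_email("jane.doe@example.com"))
```

Whether the resulting page loads still depends on being logged in through the browser session that the scraper drives.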

xxsacxx commented 5 years ago

Also, in profile.py, this works for me:

    followers_text = text_or_default(self.soup, '.pv-recent-activity-section__follower-count', '').strip()
    personal_info['followers'] = followers_text.split('\n')[0]
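To illustrate what the `split('\n')[0]` step does, here is a small standalone sketch. The function name `first_line` and the sample string are made up for illustration; the real text returned by that selector may look different:

```python
def first_line(text: str) -> str:
    """Return the first line of the stripped text ('' if the text is empty)."""
    # Same operations as in the snippet above: strip, then keep
    # everything before the first newline.
    return text.strip().split("\n")[0]


if __name__ == "__main__":
    # Made-up sample: a count followed by a label on the next line.
    sample = "  1,234\nfollowers  "
    print(repr(first_line(sample)))
    # An empty default (the '' fallback) stays empty instead of raising.
    print(repr(first_line("")))
```

Because `str.split` always returns at least one element, indexing `[0]` is safe even when the selector matches nothing and the default `''` is used.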

austinoboyle commented 5 years ago

I updated the profile scraper: a recent UI change caused the followers selector to fail. It should work now. I had no idea about that email feature. Feel free to submit a pull request if you want to add it as a feature on the ProfileScraper.

xxsacxx commented 5 years ago

Hi Austino, I see that scrape-by-email has already been merged into master. It would also be helpful for the community if you could mention it in readme.md.

xxsacxx commented 5 years ago

Also, since Selenium is quite slow (around 10 seconds per profile), have you tried any alternatives, such as pycurl?

austinoboyle commented 5 years ago

Yes, AFAIK this will not work with anything other than a browser emulator. LinkedIn has very strong anti-scraping measures and will block requests from any suspicious source. It is also almost completely JavaScript-rendered, so you would need to make all of the AJAX calls manually, which would be quite cumbersome.

I will be sure to update the README to include documentation for the new feature.