pylinkedin is a Python package to scrape all details from public LinkedIn profiles.
It can also be used as a parser to transform HTML LinkedIn profiles into structured JSON.
Some precautions you should take if you want to scrape LinkedIn with Python:
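LinkedIn answers requests it refuses with a 999 status code, so pay attention to what you send with the HTTP requests. In particular, LinkedIn has banned most IPs from cloud providers (AWS, DigitalOcean, ...).
To install, run pip install git+https://github.com/ericfourrier/scrape-linkedin.git
Alternatively, clone the repository and install it manually:
git clone https://github.com/ericfourrier/scrape-linkedin.git
cd scrape-linkedin
python setup.py install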
The tests are run against an HTML file saved from a LinkedIn profile. The main reason is that Travis uses AWS machines, whose IPs are banned by LinkedIn.
In particular, a passing test suite is not a good indicator that the package will work for you (your IP can be banned, or LinkedIn's HTML source may have changed).
You can still run the test suite at the root of the package with pytest: py.test test.py
pylinkedin comes with a simple command-line tool, also named pylinkedin.
Options:
Examples:
pylinkedin -u https://www.linkedin.com/in/jeffweiner08
pylinkedin -u https://www.linkedin.com/in/jeffweiner08 -a skills
pylinkedin -f /path/file.html
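Judging from the examples above, -u takes a profile URL, -f a path to a saved HTML file, and -a the name of a single attribute to print. As an illustration only, an interface of this shape could be sketched with argparse. This is not the package's actual parser: the function name, the destination names and the choice to make -u and -f mutually exclusive are all assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the -u / -f / -a flags used above."""
    parser = argparse.ArgumentParser(prog="pylinkedin")
    # Assumption: a profile comes either from a URL or from a saved file.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-u", dest="url", help="URL of a public profile")
    source.add_argument("-f", dest="file", help="path to a saved HTML profile")
    # Optional: restrict the output to one attribute, e.g. "skills".
    parser.add_argument("-a", dest="attribute", default=None,
                        help="single attribute to print")
    return parser


# Parse an explicit argument list instead of sys.argv for the demonstration.
args = build_parser().parse_args(
    ["-u", "https://www.linkedin.com/in/jeffweiner08", "-a", "skills"]
)
print(args.url, args.attribute)
```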
It relies on two classes:
CustomRequest, which is a way to customise your HTTP requests by specifying a list of user agents or proxies.
from pylinkedin.utils import CustomRequest
c = CustomRequest() # default (rotating user-agent)
c = CustomRequest(rotate_ua=False) # without rotating user-agent
c = CustomRequest(list_proxies=[{'https': 'http://186.233.94.106:8080',
                                 'http': 'http://186.233.94.106:8080'}]) # with a list of proxies
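To make the idea of rotating user agents and proxies concrete, here is a minimal sketch of picking one of each at random per request. It is not how CustomRequest is implemented; the helper name pick_request_settings and the user-agent strings are placeholders invented for the illustration.

```python
import random

# Placeholder values for illustration only; replace them with real ones.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) ExampleBrowser/2.0",
]
PROXIES = [{"http": "http://186.233.94.106:8080",
            "https": "http://186.233.94.106:8080"}]


def pick_request_settings(user_agents, proxies=None, rng=random):
    """Return keyword arguments for one request: random UA, random proxy."""
    settings = {"headers": {"User-Agent": rng.choice(user_agents)}}
    if proxies:
        # Each entry is a scheme -> proxy URL mapping, as in list_proxies.
        settings["proxies"] = rng.choice(proxies)
    return settings


settings = pick_request_settings(USER_AGENTS, PROXIES)
print(settings["headers"]["User-Agent"])
```

The returned dictionary uses the headers and proxies keywords, so it could be unpacked into a call such as requests.get(url, **settings).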
LinkedinItem is the main class. You can instantiate it with the URL of a public profile using the url parameter, or with the HTML contents of the profile page using html_string. See test.py for an example of using a saved HTML file as input for the scraper.
from pylinkedin.scraper import LinkedinItem
l = LinkedinItem(url='https://www.linkedin.com/in/kennethreitz')
l = LinkedinItem(html_string=profile_string)
You can customize your requests by passing a CustomRequest instance to LinkedinItem:
from pylinkedin.utils import CustomRequest
from pylinkedin.scraper import LinkedinItem
c = CustomRequest(rotate_ua=True)
url_to_scrape = "https://www.linkedin.com/in/jeffweiner08"
l = LinkedinItem(url=url_to_scrape, crequest=c) # passing requests with rotating user-agent
To use html_string, make sure to browse to the public version of the profile page before saving it, as the private version will not work. The private version is the one showing the edit controls next to each section.
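A minimal sketch of feeding a saved page to html_string follows. The helper names read_profile_html and parse_saved_profile are hypothetical, and the UTF-8 encoding is an assumption about how the page was saved.

```python
from pathlib import Path


def read_profile_html(path, encoding="utf-8"):
    """Return the contents of a saved profile page as a string."""
    return Path(path).read_text(encoding=encoding)


def parse_saved_profile(path):
    """Build a LinkedinItem from a saved public profile page."""
    # Imported here so read_profile_html works without pylinkedin installed.
    from pylinkedin.scraper import LinkedinItem
    return LinkedinItem(html_string=read_profile_html(path))
```

Calling parse_saved_profile("/path/file.html") would then return an item you can query in the same way as one built from a URL.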
LinkedinItem exposes the following attributes to get the info:
l.name # to get the name
l.skills # to get the skills
l.publications # to get the publications
...
# the most important
l.to_dict() # to get all the info
[volunteerings, last_name, number_recommendations, number_connections, current_location, honors, first_name, current_title, test_scores, current_industry, languages, similar_profiles, interests, profile_img_url, current_education, educations, experiences, groups, organizations, certifications, name, skills, websites, summary, project, courses, publications, recommendations]
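Since to_dict() returns all fields at once, it is the natural starting point for producing JSON. The sketch below assumes the dictionary contains only values the json module can encode, with default=str as a fallback. The helper profile_to_json and the FakeItem stand-in (with made-up data) are hypothetical and exist only so the example runs without network access.

```python
import json


def profile_to_json(item, indent=2):
    """Serialise the dictionary returned by item.to_dict() to a JSON string."""
    # default=str is a fallback for values json cannot encode natively.
    return json.dumps(item.to_dict(), indent=indent, sort_keys=True,
                      default=str)


class FakeItem:
    """Stand-in for LinkedinItem, returning made-up data."""

    def to_dict(self):
        return {"name": "Jane Doe", "skills": ["Python", "SQL"]}


print(profile_to_json(FakeItem()))
```

With a real profile, you would pass the LinkedinItem instance l in place of FakeItem().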
This package is not actively maintained.
You can post bugs and issues on the GitHub repository: https://github.com/ericfourrier/scrape-linkedin/issues