Python script to find researchers and scrape their data.
- Several output file formats: JSON, CSV, or XLS. Support for Firebase Realtime Database is in progress.
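The export code itself is not shown in this README. As a hedged sketch of what JSON and CSV output of scraped records could look like with only the standard library, the hypothetical helpers to_json and to_csv below serialize a list of dicts. Joining list values with "; " for CSV is an assumption made for illustration. XLS output would need a third-party spreadsheet library and is left out.

```python
import csv
import io
import json


def to_json(records: list) -> str:
    """Hypothetical helper: serialize scraped records as pretty-printed JSON."""
    # ensure_ascii=False keeps accented names such as "Física" readable
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records: list) -> str:
    """Hypothetical helper: serialize scraped records as CSV text."""
    buf = io.StringIO()
    # Union of all keys so records with missing fields still fit one header
    fieldnames = sorted({key for record in records for key in record})
    writer = csv.DictWriter(buf, fieldnames=fieldnames)  # missing keys -> ""
    writer.writeheader()
    for record in records:
        # Assumed flattening: list values (e.g. study_fields) joined with "; "
        writer.writerow(
            {k: "; ".join(v) if isinstance(v, list) else v for k, v in record.items()}
        )
    return buf.getvalue()


records = [{"name": "A", "study_fields": ["plasma physics", "laboratory astrophysics"]}]
print(to_csv(records))
```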
TODO - Find:
Scrapes:
If you prefer to use only the scraper module in your project, you only need to install urllib and beautifulsoup4.
get_personal_data(url, [force_refresh]): should return a dictionary containing general personal information such as study fields, organization, department, etc.
get_stats(url, [force_refresh]): should return a dictionary with specific bibliometric data such as publications, citations, indexes, etc.
Two implementations of this abstract class are provided: ScholarScraper and ResearchGateScraper.
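The README does not name the abstract class or show its code, so the following is only a sketch, assuming Python's abc module and a hypothetical base class called Scraper. It also assumes that force_refresh means "ignore any cached result and scrape again". FakeScraper is a made-up subclass that counts simulated fetches instead of making HTTP requests.

```python
from abc import ABC, abstractmethod


class Scraper(ABC):
    """Hypothetical name for the abstract scraper class."""

    @abstractmethod
    def get_personal_data(self, url: str, force_refresh: bool = False) -> dict:
        """Return general personal information (study fields, organization, ...)."""

    @abstractmethod
    def get_stats(self, url: str, force_refresh: bool = False) -> dict:
        """Return bibliometric data (publications, citations, indexes, ...)."""


class FakeScraper(Scraper):
    """Made-up subclass: caches results and counts simulated fetches."""

    def __init__(self) -> None:
        self._cache: dict = {}
        self.fetches = 0

    def _get(self, kind: str, url: str, force_refresh: bool) -> dict:
        key = (kind, url)
        # Assumed semantics: force_refresh bypasses the cached result.
        if force_refresh or key not in self._cache:
            self.fetches += 1  # a real scraper would download and parse here
            self._cache[key] = {"kind": kind, "url": url}
        return self._cache[key]

    def get_personal_data(self, url: str, force_refresh: bool = False) -> dict:
        return self._get("personal", url, force_refresh)

    def get_stats(self, url: str, force_refresh: bool = False) -> dict:
        return self._get("stats", url, force_refresh)


scraper = FakeScraper()
scraper.get_personal_data("https://example.org/profile")
scraper.get_personal_data("https://example.org/profile")  # served from cache
scraper.get_personal_data("https://example.org/profile", force_refresh=True)
print(scraper.fetches)  # 2
```

Because both methods are marked abstract, instantiating Scraper directly raises TypeError; only subclasses that implement both methods can be created.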
The URL can be validated with the is_valid_scholar_profile_url function from ./utils/validation_utils.py, although checking it beforehand is not strictly necessary.
get_personal_data returns a Python dict with 2 keys:
- personal_info: the first string under the researcher's name in the Scholar profile; typically it contains the organization name, department, job, etc.
- study_fields: an array of strings with all study fields listed in the profile.
Example: { personal_info: "Profesor de Física (ULPGC)", study_fields: ["plasma physics", "laboratory astrophysics"] }
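The rules inside is_valid_scholar_profile_url are not shown in this README. A minimal hypothetical stand-in, looks_like_scholar_profile_url, is sketched below under the assumption that a Scholar profile URL has the form https://scholar.google.<tld>/citations?user=<id> with optional extra query parameters. It uses only urllib.parse from the standard library.

```python
from urllib.parse import parse_qs, urlparse


def looks_like_scholar_profile_url(url: str) -> bool:
    """Hypothetical stand-in for is_valid_scholar_profile_url.

    Assumes profile URLs look like
    https://scholar.google.<tld>/citations?user=<id>[&...]
    """
    parts = urlparse(url)
    host = parts.hostname or ""  # .hostname is already lowercased
    if parts.scheme not in ("http", "https") or not host.startswith("scholar.google."):
        return False
    if parts.path != "/citations":
        return False
    # parse_qs drops blank values by default, so "user=" counts as missing
    user = parse_qs(parts.query).get("user", [""])[0]
    return bool(user)


print(looks_like_scholar_profile_url("https://scholar.google.com/citations?user=AbC123&hl=en"))
```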
get_stats takes the same parameters as get_personal_data.
The returned Python dict has the following keys:
- citations: a dict of 2 keys with integer values:
  - total: total number of citations
  - last5Years: citations from the last 5 years
- hIndex: a dict of 2 keys with integer values:
  - total: current h-index
  - last5Years: h-index calculated with citations from the last 5 years
- i10: a dict of 2 keys with integer values:
  - total: current i10 index
  - last5Years: i10 index calculated with citations from the last 5 years
- citationsPerYear: a dict mapping each year to its number of citations.
Example: { citations: {total: 13432, last5Years: 6224}, hIndex: {total: 66, last5Years: 47}, i10: {total: 169, last5Years: 147}, citationsPerYear: {1999: 7, 2000: 0, 2001: 36, ..., 2018: 1160, 2019: 253} }
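The scraper reads hIndex and i10 directly from the profile rather than computing them, but the two metrics follow standard definitions: the h-index is the largest h such that h papers each have at least h citations, and the i10 index counts papers with at least 10 citations. The small functions below (hypothetical names, not part of the project) illustrate those definitions from a list of per-paper citation counts. Passing per-paper counts restricted to the last 5 years would give the last5Years variants described above.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so h cannot grow
    return h


def i10_index(citations: Sequence[int]) -> int:
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


papers = [25, 8, 5, 3, 3]
print(h_index(papers), i10_index(papers))  # 3 1
```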
TODO...
Check the examples folder.