dimitryzub / scrape-google-scholar-py

Extract data from all Google Scholar pages from a single Python module. NOTE: I'm no longer maintaining this repo. The Chrome driver/selectors might need an update.
MIT License

no results returned #10

Closed · yudeng2022 closed this issue 1 year ago

yudeng2022 commented 1 year ago

Hi, thank you for providing this package! I ran the following script but got nothing returned. Any idea what caused it?

[screenshot: the script and its empty output]
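The script itself survives only as a screenshot; judging from the maintainer's reply below, it was presumably a Profiles search along these lines (a sketch only: the query string, method name, and arguments are assumptions based on the package's naming pattern):

```python
from scrape_google_scholar_py import CustomGoogleScholarProfiles
import json

# Hypothetical reconstruction of the call from the screenshot:
# a Google Scholar *Profiles* search, which is what the maintainer
# points out below. The query string is an assumption.
parser = CustomGoogleScholarProfiles()
profiles = parser.scrape_google_scholar_profiles(
    query='dynamic risk prediction',  # assumed; taken from a later comment
    pagination=False,
)
print(json.dumps(profiles, indent=2))
```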

dimitryzub commented 1 year ago

@yudeng2022 thank you for your trust and for opening this issue 👍

Update: the answer is simple 😁 Google itself doesn't return any results:

[screenshot: Google Scholar returning no results for the query]

Are you sure you meant to do a Profiles search rather than an Organic results search? That is, did you mean to use CustomGoogleScholarOrganic instead of CustomGoogleScholarProfiles?

[screenshot: Profiles vs. Organic results pages]
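For reference, a minimal Organic-results sketch; the method name mirrors the `scrape_google_scholar_organic_results` call posted later in this thread, while the query and remaining arguments are assumptions:

```python
from scrape_google_scholar_py import CustomGoogleScholarOrganic
import json

# Organic results (regular search listings) rather than author Profiles.
parser = CustomGoogleScholarOrganic()
organic = parser.scrape_google_scholar_organic_results(
    query='dynamic risk prediction',  # assumed example query
    pagination=False,  # True would walk through all result pages
)
print(json.dumps(organic, indent=2))
```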

Let me know if we can close this issue 🙂


Initial answer

Could you try running `%pip install scrape-google-scholar-py` before the imports? I'm not sure it makes a difference, though.
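As a sketch, that notebook cell would look like this (assuming a Jupyter kernel, and that the import name is the underscore form of the package name):

```python
# Jupyter cell: %pip installs into the environment of the running kernel,
# unlike a plain `pip install` run in a separate shell.
%pip install scrape-google-scholar-py

from scrape_google_scholar_py import CustomGoogleScholarProfiles
```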

Running locally on Python 3.11 via the VSCode Jupyter plugin in a fresh virtual environment:

[screenshots: the code and its output from the local run]

It worked without `%pip install scrape-google-scholar-py` in a fresh environment:

[screenshot of the working run]

CLI process:

```bash
/c/Workspace/Programming/SerpApi/projects/scrape-google-scholar-py › python -m venv env
/c/Workspace/Programming/SerpApi/projects/scrape-google-scholar-py › source env/Scripts/activate
(env) /c/Workspace/Programming/SerpApi/projects/scrape-google-scholar-py › pip install scrape-google-scholar-py
Collecting scrape-google-scholar-py
  Using cached scrape_google_scholar_py-0.3.3-py3-none-any.whl
Collecting google-search-results>=2.4.2
  Using cached google_search_results-2.4.2.tar.gz (18 kB)
  Preparing metadata (setup.py) ... done
Collecting selectolax>=0.3.12
  Using cached selectolax-0.3.13-cp311-cp311-win_amd64.whl (2.0 MB)
Collecting parsel>=1.7.0
  Using cached parsel-1.8.1-py2.py3-none-any.whl (17 kB)
Collecting selenium-stealth>=1.0.6
  Using cached selenium_stealth-1.0.6-py3-none-any.whl (32 kB)
Collecting pandas>=1.5.3
  Using cached pandas-2.0.1-cp311-cp311-win_amd64.whl (10.6 MB)
Collecting webdriver-manager>=3.8.5
  Using cached webdriver_manager-3.8.6-py2.py3-none-any.whl (27 kB)
Collecting requests
  Using cached requests-2.29.0-py3-none-any.whl (62 kB)
Collecting python-dateutil>=2.8.2
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pytz>=2020.1
  Using cached pytz-2023.3-py2.py3-none-any.whl (502 kB)
Collecting tzdata>=2022.1
  Using cached tzdata-2023.3-py2.py3-none-any.whl (341 kB)
Collecting numpy>=1.21.0
  Using cached numpy-1.24.3-cp311-cp311-win_amd64.whl (14.8 MB)
Collecting cssselect>=0.9
  Using cached cssselect-1.2.0-py2.py3-none-any.whl (18 kB)
Collecting jmespath
  Using cached jmespath-1.0.1-py3-none-any.whl (20 kB)
Collecting lxml
  Using cached lxml-4.9.2-cp311-cp311-win_amd64.whl (3.8 MB)
Collecting packaging
  Using cached packaging-23.1-py3-none-any.whl (48 kB)
Collecting w3lib>=1.19.0
  Using cached w3lib-2.1.1-py3-none-any.whl (21 kB)
Collecting Cython>=0.29.23
  Using cached Cython-0.29.34-py2.py3-none-any.whl (988 kB)
Collecting selenium
  Using cached selenium-4.9.0-py3-none-any.whl (6.5 MB)
Collecting python-dotenv
  Using cached python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Collecting tqdm
  Using cached tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting six>=1.5
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting charset-normalizer<4,>=2
  Using cached charset_normalizer-3.1.0-cp311-cp311-win_amd64.whl (96 kB)
Collecting idna<4,>=2.5
  Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached urllib3-1.26.15-py2.py3-none-any.whl (140 kB)
Collecting certifi>=2017.4.17
  Using cached certifi-2022.12.7-py3-none-any.whl (155 kB)
Collecting trio~=0.17
  Using cached trio-0.22.0-py3-none-any.whl (384 kB)
Collecting trio-websocket~=0.9
  Using cached trio_websocket-0.10.2-py3-none-any.whl (17 kB)
Collecting colorama
  Using cached colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Collecting attrs>=19.2.0
  Using cached attrs-23.1.0-py3-none-any.whl (61 kB)
Collecting sortedcontainers
  Using cached sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Collecting async-generator>=1.9
  Using cached async_generator-1.10-py3-none-any.whl (18 kB)
Collecting outcome
  Using cached outcome-1.2.0-py2.py3-none-any.whl (9.7 kB)
Collecting sniffio
  Using cached sniffio-1.3.0-py3-none-any.whl (10 kB)
Collecting cffi>=1.14
  Using cached cffi-1.15.1-cp311-cp311-win_amd64.whl (179 kB)
Collecting exceptiongroup
  Using cached exceptiongroup-1.1.1-py3-none-any.whl (14 kB)
Collecting wsproto>=0.14
  Using cached wsproto-1.2.0-py3-none-any.whl (24 kB)
Collecting PySocks!=1.5.7,<2.0,>=1.5.6
  Using cached PySocks-1.7.1-py3-none-any.whl (16 kB)
Collecting pycparser
  Using cached pycparser-2.21-py2.py3-none-any.whl (118 kB)
Collecting h11<1,>=0.9.0
  Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: sortedcontainers, pytz, w3lib, urllib3, tzdata, sniffio, six, python-dotenv, PySocks, pycparser, packaging, numpy, lxml, jmespath, idna, h11, exceptiongroup, Cython, cssselect, colorama, charset-normalizer, certifi, attrs, async-generator, wsproto, tqdm, selectolax, requests, python-dateutil, parsel, outcome, cffi, webdriver-manager, trio, pandas, google-search-results, trio-websocket, selenium, selenium-stealth, scrape-google-scholar-py
DEPRECATION: google-search-results is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559
  Running setup.py install for google-search-results ... done
Successfully installed Cython-0.29.34 PySocks-1.7.1 async-generator-1.10 attrs-23.1.0 certifi-2022.12.7 cffi-1.15.1 charset-normalizer-3.1.0 colorama-0.4.6 cssselect-1.2.0 exceptiongroup-1.1.1 google-search-results-2.4.2 h11-0.14.0 idna-3.4 jmespath-1.0.1 lxml-4.9.2 numpy-1.24.3 outcome-1.2.0 packaging-23.1 pandas-2.0.1 parsel-1.8.1 pycparser-2.21 python-dateutil-2.8.2 python-dotenv-1.0.0 pytz-2023.3 requests-2.29.0 scrape-google-scholar-py-0.3.3 selectolax-0.3.13 selenium-4.9.0 selenium-stealth-1.0.6 six-1.16.0 sniffio-1.3.0 sortedcontainers-2.4.0 tqdm-4.65.0 trio-0.22.0 trio-websocket-0.10.2 tzdata-2023.3 urllib3-1.26.15 w3lib-2.1.1 webdriver-manager-3.8.6 wsproto-1.2.0

[notice] A new release of pip available: 22.3 -> 23.1.2
[notice] To update, run: python.exe -m pip install --upgrade pip
```
dimitryzub commented 1 year ago

Closing, as Google itself doesn't return any results. Feel free to add additional info.

yudeng2022 commented 1 year ago

Hi dimitryzub, thank you for your reply! I meant to get all the results containing the keyword 'dynamic risk prediction'. Also, it seems the query only returns 10 results (the first page). How can I get results from all returned pages?
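For context: the library exposes a `pagination` flag for exactly this; a sketch, assuming the SerpApi backend and the call signature shown in the comment further down:

```python
from scrape_google_scholar_py import SerpApiGoogleScholarOrganic

# pagination=True asks the scraper to iterate over all result pages
# instead of returning only the first page of ~10 organic results.
results = SerpApiGoogleScholarOrganic().scrape_google_scholar_organic_results(
    query='dynamic risk prediction',
    api_key='...',  # https://serpapi.com/manage-api-key
    pagination=True,
)
```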

dimitryzub commented 1 year ago

@yudeng2022

> I meant to get all the results containing the keyword 'dynamic risk prediction'

What do you mean by "all results containing the keyword 'dynamic risk prediction'"? All results from searching Google Scholar profiles? Could you please clarify or give an example Google Scholar link?

Thank you 👍

yudeng2022 commented 1 year ago

Hi, thank you for your reply! I tried to use the following code to get results:


```python
from scrape_google_scholar_py import SerpApiGoogleScholarOrganic

serpapi_parser_get_organic_results = SerpApiGoogleScholarOrganic().scrape_google_scholar_organic_results(
    # query='("longitudinal data" OR "repeated measurements" OR "longitudinal measurements" OR "time series data") AND ("cardiovascular disease prediction" OR "cardiovascular disease risk prediction" OR "prediction of cardiovascular disease")',
    query='blizzard',
    api_key='##myapikey##',  # https://serpapi.com/manage-api-key
    lang='en',
    pagination=True,
)
```

However, I got the following error message: [screenshot of the error]

Do you know what might have caused it, by any chance?