dimitryzub / scrape-google-scholar-py

Extract data from all Google Scholar pages from a single Python module. NOTE: I'm no longer maintaining this repo. Chrome driver/selectors might need an update.
MIT License
87 stars, 16 forks

cannot import #8

Closed: abubelinha closed this issue 12 months ago

abubelinha commented 1 year ago

I am using Windows 10, Python 3.9:

First I installed like this:

c:\python39\scripts\pip install scrape-google-scholar-py

Apparently it installed with no problems. Then I tried to import:

c:\>c:\python39\python
Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from google_scholar_py import CustomGoogleScholarProfiles
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'CustomGoogleScholarProfiles' from 'google_scholar_py' (c:\python39\lib\site-packages\google_scholar_py\__init__.py)
>>> from google_scholar_py import SerpApiGoogleScholarOrganic
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'SerpApiGoogleScholarOrganic' from 'google_scholar_py' (c:\python39\lib\site-packages\google_scholar_py\__init__.py)
>>>

Same error trying either CustomGoogleScholarProfiles or SerpApiGoogleScholarOrganic
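A quick way to narrow down an ImportError like this is to check which interpreter is actually running, which version of the distribution pip recorded, and which public names the installed google_scholar_py module defines. A minimal standard-library sketch; describe_install is a hypothetical helper, not part of scrape-google-scholar-py:

```python
import importlib
import importlib.metadata
import sys


def describe_install(dist_name, module_name):
    """Collect facts that usually explain an ImportError for an installed package."""
    info = {
        "python": sys.version.split()[0],
        "executable": sys.executable,  # which interpreter is actually running
    }
    try:
        # Version of the distribution as recorded by pip (e.g. 0.2.27)
        info["version"] = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        info["version"] = None
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        info["location"] = None
        info["exports"] = []
        info["error"] = repr(exc)
    else:
        # Where the module was loaded from and which public names it defines
        info["location"] = getattr(module, "__file__", None)
        info["exports"] = sorted(n for n in dir(module) if not n.startswith("_"))
    return info


if __name__ == "__main__":
    report = describe_install("scrape-google-scholar-py", "google_scholar_py")
    for key, value in report.items():
        print(f"{key}: {value}")
```

If "exports" does not list CustomGoogleScholarProfiles, the installed __init__.py simply does not define it, which points at the installed version or environment rather than at the import statement.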

dimitryzub commented 1 year ago

@abubelinha thank you for opening this issue 🙂

My assumption is that this is somehow related to Python 3.9.

> Apparently it installed with no problems.

How did you conclude that? Just curious and trying to understand 🙂 Could you share the full installation output?


Here are examples on my end: Windows 10, Python 3.11, using a virtual environment (not installed globally).

https://user-images.githubusercontent.com/78694043/235424002-f4d8a520-82ad-425b-809d-2f6bb37bb0e0.mp4

Your example (no ImportError):

› python
Python 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from google_scholar_py import CustomGoogleScholarProfiles
>>> from google_scholar_py import SerpApiGoogleScholarOrganic
>>>
>>> exit()

Another command with the output:

› python
Python 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from google_scholar_py import CustomGoogleScholarProfiles
>>> import json
>>>
>>> parser = CustomGoogleScholarProfiles()
>>> data = parser.scrape_google_scholar_profiles(
...     query='blizzard',
...     pagination=False,
...     save_to_csv=False,
...     save_to_json=False
... )
[WDM] - Downloading: 100%|████████████████████████████████████████████████████████| 6.80M/6.80M [00:01<00:00, 6.64MB/s]
>>> print(json.dumps(data, indent=2))
[
  {
    "name": "Adam Lobel",
    "link": "https://scholar.google.com/citations?hl=en&user=_xwYD2sAAAAJ",
    "affiliations": "Blizzard Entertainment",
    "interests": [
      "Gaming",
      "Emotion regulation"
    ],
    "email": "Verified email at AdamLobel.com",
    "cited_by_count": 3791
  },
  {
    "name": "Catherine A Blizzard",
    "link": "https://scholar.google.com/citations?hl=en&user=vfPEiVUAAAAJ",
    "affiliations": "",
    "interests": null,
    "email": null,
    "cited_by_count": 1408
  },
  {
    "name": "Daniel Blizzard",
    "link": "https://scholar.google.com/citations?hl=en&user=dk4LWEgAAAAJ",
    "affiliations": "",
    "interests": null,
    "email": null,
    "cited_by_count": 1102
  },
  {
    "name": "Shuo Chen",
    "link": "https://scholar.google.com/citations?hl=en&user=OBf4YnkAAAAJ",
    "affiliations": "Senior Data Scientist, Blizzard Entertainment",
    "interests": [
      "Machine Learning",
      "Data Mining",
      "Artificial Intelligence"
    ],
    "email": "Verified email at cs.cornell.edu",
    "cited_by_count": 744
  },
  {
    "name": "Ian Livingston",
    "link": "https://scholar.google.com/citations?hl=en&user=xBHVqNIAAAAJ",
    "affiliations": "Blizzard Entertainment",
    "interests": [
      "Human-computer interaction",
      "User Experience",
      "Player Experience",
      "User Research",
      "Games"
    ],
    "email": "Verified email at usask.ca",
    "cited_by_count": 659
  },
  {
    "name": "Minli Xu",
    "link": "https://scholar.google.com/citations?hl=en&user=QST5iogAAAAJ",
    "affiliations": "Blizzard Entertainment",
    "interests": [
      "Game",
      "Machine Learning",
      "Data Science",
      "Bioinformatics"
    ],
    "email": "Verified email at blizzard.com",
    "cited_by_count": 557
  },
  {
    "name": "Je Seok Lee",
    "link": "https://scholar.google.com/citations?hl=en&user=vuvtlzQAAAAJ",
    "affiliations": "Blizzard Entertainment",
    "interests": [
      "HCI",
      "Player Experience",
      "Games",
      "Esports"
    ],
    "email": "Verified email at uci.edu",
    "cited_by_count": 434
  },
  {
    "name": "Alisha Ness",
    "link": "https://scholar.google.com/citations?hl=en&user=xQuwVfkAAAAJ",
    "affiliations": "Activision Blizzard",
    "interests": null,
    "email": null,
    "cited_by_count": 351
  },
  {
    "name": "Xingyu (Alfred) Liu",
    "link": "https://scholar.google.com/citations?hl=en&user=VW9ukOwAAAAJ",
    "affiliations": "Blizzard Entertainment",
    "interests": [
      "Machine Learning in Game Development"
    ],
    "email": "Verified email at andrew.cmu.edu",
    "cited_by_count": 278
  },
  {
    "name": "Amanda LL Cullen",
    "link": "https://scholar.google.com/citations?hl=en&user=oqna6OgAAAAJ",
    "affiliations": "Blizzard Entertainment",
    "interests": [
      "Games Studies",
      "Fan Studies",
      "Live Streaming"
    ],
    "email": null,
    "cited_by_count": 270
  }
]
>>>
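The call above returns a plain list of dicts with the keys name, link, affiliations, interests, email and cited_by_count, as in the output. A minimal post-processing sketch that relies only on those keys; top_profiles is a hypothetical helper, and note that interests and email can be null (None in Python):

```python
def top_profiles(profiles, keyword, min_citations=0):
    """Filter and rank profile dicts shaped like the JSON output above.

    Matches `keyword` case-insensitively against affiliations and interests;
    `interests` may be None, as in several of the records above.
    """
    keyword = keyword.lower()
    matches = []
    for profile in profiles:
        affiliation = (profile.get("affiliations") or "").lower()
        interests = [i.lower() for i in (profile.get("interests") or [])]
        citations = profile.get("cited_by_count") or 0
        if citations < min_citations:
            continue
        if keyword in affiliation or any(keyword in i for i in interests):
            matches.append(profile)
    # Highest cited first
    return sorted(matches, key=lambda p: p.get("cited_by_count") or 0, reverse=True)
```

For example, top_profiles(data, "machine learning") keeps only profiles whose affiliation or interests mention machine learning, most cited first.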
abubelinha commented 1 year ago

Thanks for your prompt reply.

As for the installation output:

c:>c:\python39\scripts\pip install scrape-google-scholar-py
Collecting scrape-google-scholar-py
  Downloading scrape-google-scholar-py-0.2.27.tar.gz (35 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting selectolax==0.3.12
  Downloading selectolax-0.3.12-cp39-cp39-win_amd64.whl (1.9 MB)
     ---------------------------------------- 1.9/1.9 MB 5.2 MB/s eta 0:00:00
Collecting selenium-stealth==1.0.6
  Downloading selenium_stealth-1.0.6-py3-none-any.whl (32 kB)
Collecting google-search-results>=2.4
  Downloading google_search_results-2.4.2.tar.gz (18 kB)
  Preparing metadata (setup.py) ... done
Collecting pandas>=1.5.3
  Downloading pandas-2.0.1-cp39-cp39-win_amd64.whl (10.7 MB)
     ---------------------------------------- 10.7/10.7 MB 7.4 MB/s eta 0:00:00
Collecting parsel==1.7.0
  Downloading parsel-1.7.0-py2.py3-none-any.whl (14 kB)
Requirement already satisfied: packaging in c:\python39\lib\site-packages (from parsel==1.7.0->scrape-google-scholar-py) (21.3)
Requirement already satisfied: lxml in c:\python39\lib\site-packages (from parsel==1.7.0->scrape-google-scholar-py) (4.7.1)
Collecting w3lib>=1.19.0
  Downloading w3lib-2.1.1-py3-none-any.whl (21 kB)
Requirement already satisfied: cssselect>=0.9 in c:\python39\lib\site-packages (from parsel==1.7.0->scrape-google-scholar-py) (1.1.0)
Requirement already satisfied: Cython>=0.29.23 in c:\python39\lib\site-packages (from selectolax==0.3.12->scrape-google-scholar-py) (0.29.27)
Requirement already satisfied: selenium in c:\python39\lib\site-packages (from selenium-stealth==1.0.6->scrape-google-scholar-py) (3.141.0)
Requirement already satisfied: requests in c:\python39\lib\site-packages (from google-search-results>=2.4->scrape-google-scholar-py) (2.18.4)
Collecting tzdata>=2022.1
  Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB)
     ---------------------------------------- 341.8/341.8 kB 10.7 MB/s eta 0:00:00
Requirement already satisfied: pytz>=2020.1 in c:\python39\lib\site-packages (from pandas>=1.5.3->scrape-google-scholar-py) (2021.3)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\python39\lib\site-packages (from pandas>=1.5.3->scrape-google-scholar-py) (2.8.2)
Requirement already satisfied: numpy>=1.20.3 in c:\python39\lib\site-packages (from pandas>=1.5.3->scrape-google-scholar-py) (1.21.5+vanilla)
Requirement already satisfied: six>=1.5 in c:\python39\lib\site-packages (from python-dateutil>=2.8.2->pandas>=1.5.3->scrape-google-scholar-py) (1.16.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in c:\python39\lib\site-packages (from packaging->parsel==1.7.0->scrape-google-scholar-py) (3.0.6)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\python39\lib\site-packages (from requests->google-search-results>=2.4->scrape-google-scholar-py) (3.0.4)
Requirement already satisfied: idna<2.7,>=2.5 in c:\python39\lib\site-packages (from requests->google-search-results>=2.4->scrape-google-scholar-py) (2.6)
Requirement already satisfied: certifi>=2017.4.17 in c:\python39\lib\site-packages (from requests->google-search-results>=2.4->scrape-google-scholar-py) (2021.10.8)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in c:\python39\lib\site-packages (from requests->google-search-results>=2.4->scrape-google-scholar-py) (1.22)
Building wheels for collected packages: scrape-google-scholar-py, google-search-results
  Building wheel for scrape-google-scholar-py (pyproject.toml) ... done
  Created wheel for scrape-google-scholar-py: filename=scrape_google_scholar_py-0.2.27-py3-none-any.whl size=29185 sha256=40eaf39f199cc1d19e7882c9d88b52116f53bc3c64c6455f0ddf84fc44728df3
  Stored in directory: c:\users\abu\appdata\local\pip\cache\wheels\64\2a\60\e0fb0cf78bc2dad32cf92494b208d15bc4e7e584d6f8088c69
  Building wheel for google-search-results (setup.py) ... done
  Created wheel for google-search-results: filename=google_search_results-2.4.2-py3-none-any.whl size=32017 sha256=cdafe96383fa594ca3f90f60e2220e1bd4562407e5801d8d9e9df0a590525979
  Stored in directory: c:\users\abu\appdata\local\pip\cache\wheels\68\8e\73\744b7d9d7ac618849d93081a20e1c0deccd2aef90901c9f5a9
Successfully built scrape-google-scholar-py google-search-results
Installing collected packages: w3lib, tzdata, selectolax, selenium-stealth, parsel, pandas, google-search-results, scrape-google-scholar-py
  Attempting uninstall: tzdata
    Found existing installation: tzdata 2021.5
    Uninstalling tzdata-2021.5:
      Successfully uninstalled tzdata-2021.5
  Attempting uninstall: pandas
    Found existing installation: pandas 1.4.0
    Uninstalling pandas-1.4.0:
      Successfully uninstalled pandas-1.4.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pycirclize 0.1.1 requires pandas<2.0.0,>=1.3.5, but you have pandas 2.0.1 which is incompatible.
Successfully installed google-search-results-2.4.2 pandas-2.0.1 parsel-1.7.0 scrape-google-scholar-py-0.2.27 selectolax-0.3.12 selenium-stealth-1.0.6 tzdata-2023.3 w3lib-2.1.1

[notice] A new release of pip is available: 23.0.1 -> 23.1.2
[notice] To update, run: c:\python39\python.exe -m pip install --upgrade pip

I know there is some kind of conflict with my pycirclize package, but that's an old installation I just never removed. I doubt it affects your package's installation: the final message is "Successfully installed google-search-results-2.4.2 pandas-2.0.1 parsel-1.7.0 scrape-google-scholar-py-0.2.27 selectolax-0.3.12 selenium-stealth-1.0.6 tzdata-2023.3 w3lib-2.1.1".

I cannot try a higher Python version on that machine right now, but I will come back to this, maybe in a couple of months. I am definitely interested in grabbing Google Scholar info from Python. Not that much, though (just running a query twice a year or so).

I'll take this opportunity to ask about what the SerpApi free tier allows. I am only interested (2-3 times a year) in finding new papers that mention some specific words (related to my institution) within the article text. On average, when paginating through the Google Scholar web interface, I find about 30 papers a year. Would it be possible to do that automatically with the free SerpApi plan? It says "100 searches/month", but I wonder what "one search" means. Would something like this (scholarly example call) count as just one search?

Thanks!

dimitryzub commented 1 year ago

@abubelinha thank you for the additional details 🙂

> ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. pycirclize 0.1.1 requires pandas<2.0.0,>=1.3.5, but you have pandas 2.0.1 which is incompatible.

Just to add to this: you can downgrade pandas to a version below 2.0.0 that satisfies both pycirclize (pandas<2.0.0,>=1.3.5) and scrape-google-scholar-py (pandas>=1.5.3):

pip install "pandas>=1.5.3,<2.0.0"

Also, it's best to test in an isolated virtual environment so you don't get conflicts with other packages that are already installed on your machine:

python -m venv myenv
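The same isolated setup can also be scripted from Python with the standard venv module. A minimal sketch, assuming you then run your scripts with the environment's own interpreter; create_env is a hypothetical helper:

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_env(env_dir, packages=(), with_pip=True):
    """Create an isolated virtual environment and optionally pip-install packages into it.

    Returns the path of the environment's own Python interpreter.
    """
    env_path = Path(env_dir)
    # clear=True wipes an existing environment at this path first;
    # symlinks mirror what `python -m venv` does on non-Windows systems
    builder = venv.EnvBuilder(with_pip=with_pip, clear=True, symlinks=sys.platform != "win32")
    builder.create(env_path)
    if sys.platform == "win32":
        python = env_path / "Scripts" / "python.exe"
    else:
        python = env_path / "bin" / "python"
    if packages:
        # Use the environment's interpreter so packages land inside it, not globally
        subprocess.run([str(python), "-m", "pip", "install", *packages], check=True)
    return python
```

For example, create_env("myenv", ["scrape-google-scholar-py"]) returns myenv\Scripts\python.exe on Windows; running scripts with that interpreter keeps the package and its pandas requirement away from the global c:\python39 installation.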

> I cannot try a higher Python version right now in that machine.

πŸ‘ I think isolated environment + Python 3.10 should solve this issue. I can set up a Github reminder (via Octo Reminder) for you to have a look one more time. Let me know.


> On average, when paginating Google Scholar web interface I use to find about 30 papers a year. Would that be possible to do automatically with a Free Serpapi Plan?

> There it says "100 searches/month", but I wonder what "one search" means.

Yes, it is possible.

One search = one request. For example, when you do a regular Google Scholar search by hand in your browser, you type a query, hit Enter, and Google returns results. That's one request with returned results. Same with SerpApi. Hope this makes sense 🙂

> Would something like https://github.com/scholarly-python-package/scholarly/issues/208#issuecomment-718190945 (scholarly example call) account for just one search?

Yes, it would be one search 👍

For example, each time I change the start parameter (paginating to the next page), that takes 1 search (which is 1 request), so going through 10 pages equals 10 searches (10 requests):

https://user-images.githubusercontent.com/78694043/235599519-225c290f-d334-4b4d-abe9-40bda761637d.mp4

And these are the search parameters that match scholarly.search_pubs(query='Ring Resonator', patents=True, citations=True, year_low=2010, year_high=2015):

# This is using SerpApi's Python wrapper instead of scrape-google-scholar-py

from serpapi import GoogleSearch

params = {
  "api_key": "...",            # your SerpApi key: https://serpapi.com/manage-api-key
  "engine": "google_scholar",  # SerpApi's Google Scholar engine
  "q": "Ring Resonator",       # search query
  "hl": "en",                  # interface language
  "as_sdt": "7",               # include patents
  "as_ylo": "2010",            # from year
  "as_yhi": "2015",            # to year
  "start": "0"                 # result offset (0 = first page, 10 = second page, ...)
}

search = GoogleSearch(params)
results = search.get_dict()

publications = []

for result in results.get("organic_results", []):  # empty list if the key is missing
    publications.append(result)  # keeps the full result dict for each publication
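Building on the snippet above, pagination can be sketched as a loop over start offsets, where every page costs one search. A minimal sketch; page_params and collect_organic are hypothetical helpers, and fetch is passed in (e.g. lambda p: GoogleSearch(p).get_dict() with SerpApi's Python wrapper) so the loop itself does not depend on an API key:

```python
import math


def page_params(base_params, max_results, page_size=10):
    """One params dict per results page; each dict is one SerpApi search."""
    pages = math.ceil(max_results / page_size)
    # start is the result offset: 0, 10, 20, ... for 10 results per page
    return [dict(base_params, start=str(i * page_size)) for i in range(pages)]


def collect_organic(base_params, max_results, fetch, page_size=10):
    """Fetch pages until max_results is reached or a page has no organic results.

    fetch takes a params dict and returns the parsed JSON response as a dict.
    Returns (results, searches_used).
    """
    results, searches = [], 0
    for params in page_params(base_params, max_results, page_size):
        data = fetch(params)
        searches += 1  # every request counts against the monthly quota
        organic = data.get("organic_results") or []
        if not organic:
            break  # no more results, stop spending searches
        results.extend(organic)
    return results[:max_results], searches
```

With the params dict above, collect_organic(params, 30, lambda p: GoogleSearch(p).get_dict()) would use at most 3 searches.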

Let me know if it doesn't make sense 🙂

abubelinha commented 1 year ago

Thanks for your detailed explanations!! I'm in no rush to set this up now. I'd prefer to wait until our IT staff upgrades the machines, probably next autumn. But I will definitely try all this.

A couple of questions I have though:

As I couldn't run your package, I made some tests with scholarly. But I noticed the GS output is truncated (see discussion here). So you actually need to send several additional requests to get publication details for each of the 20 papers returned in "one request". I guess the same constraints apply when using SerpApi. Correct?

All this in order to get a simple "references list" with the typical structure: full authors + year + title + journal/volume/issue/pages + url

So in order to produce that list for a single "original GS request" that returns 20 references, we would need at least one additional request per reference? (or maybe more than one, if we need one request for the full title, another for the journal, another for the authors... I don't really know, just guessing)

So a simple list of 20 references would need:
1 request to get the original GS page with 20 references
+20 requests to ensure you get full titles?
+20 requests to ensure you get full journal details?
+20 requests to ensure you get full authors?

So at least 21 requests, but probably 61 to get full details (or more, if I am missing something)? And that's just for the first page of GS results.
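The counts above follow from simple arithmetic: one search per results page plus k follow-up searches per result. A tiny sketch (searches_needed is hypothetical); whether any follow-up requests are actually needed is the open question here, so extra_per_result is just a parameter:

```python
import math


def searches_needed(n_results, page_size=20, extra_per_result=0):
    """Searches for one listing: result pages plus per-result follow-up requests."""
    pages = math.ceil(n_results / page_size)
    return pages + n_results * extra_per_result
```

For example, 20 results at 20 per page is 1 search with no follow-ups, 21 with one follow-up each, and 61 with three; 30 papers a year is 2 page searches, or 32 with one follow-up per paper, against a free quota of 100 searches/month.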

Is this correct, or would scrape-google-scholar-py / SerpApi somehow include more detailed info in the original GS request, so that we might not need to send additional requests? (as per your sentence "scholarly only extract first 3 points")

Thanks a lot for your help
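Judging from the sample SerpApi response later in this thread, each organic result carries a one-line byline in publication_info.summary (e.g. "P Esquivel, VM Jimenez - Food research international, 2012 - Elsevier"): author initials only, and no volume/issue/pages. A heuristic sketch for splitting that byline, assuming the " - " separators seen in the sample hold in general; parse_summary is hypothetical, and anything Scholar elides with "…" cannot be recovered this way. The same response also includes a per-result serpapi_cite_link (engine=google_scholar_cite), which would presumably cost one extra search per result.

```python
import re

# Plausible publication years; the last match in the venue part is taken as the year
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def parse_summary(summary):
    """Split a Scholar byline into authors, venue, year and source (heuristic)."""
    parts = [p.strip() for p in summary.split(" - ")]
    authors = [a.strip() for a in parts[0].split(",") if a.strip()]
    middle = parts[1] if len(parts) > 1 else ""
    source = parts[-1] if len(parts) > 2 else ""
    years = list(YEAR.finditer(middle))
    match = years[-1] if years else None
    year = int(match.group(1)) if match else None
    venue = middle[: match.start()].rstrip(", ") if match else middle
    return {
        "authors": authors,
        "venue": venue or None,
        "year": year,
        "source": source,
        "truncated": "…" in summary,  # Scholar elides long author lists/venues
    }
```

For "M Sivetz, NW Desrosier - 1979 - agris.fao.org" this yields no venue, year 1979 and source agris.fao.org, which illustrates how little of a full reference a single result contains.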

dimitryzub commented 1 year ago

@abubelinha of course 🙂👍

> In my browser GS returns 20 papers/page, although I think that is configurable (you can choose 10 or 20, but not more). So I understand that is what we are getting with one serpapi search too.

"but not more" - it's indeed the case. It's a Google Scholar restriction and neither I nor SerpApi can bypass it.

> But what do we get in one of those requests?

If you're talking about SerpApi, then you get this response (a big JSON object, taken from the playground):
https://serpapi.com/playground?engine=google_scholar&q=Coffee&hl=en
```json
{ "search_metadata": { "id": "6450d2ea2236e2171ef9ea48", "status": "Success", "json_endpoint": "https://serpapi.com/searches/3b86fc7132759833/6450d2ea2236e2171ef9ea48.json", "created_at": "2023-05-02 09:07:54 UTC", "processed_at": "2023-05-02 09:07:54 UTC", "google_scholar_url": "https://scholar.google.com/scholar?q=Coffee&hl=en", "raw_html_file": "https://serpapi.com/searches/3b86fc7132759833/6450d2ea2236e2171ef9ea48.html", "total_time_taken": 0.82 }, "search_parameters": { "engine": "google_scholar", "q": "Coffee", "hl": "en" }, "search_information": { "organic_results_state": "Results for exact spelling", "total_results": 3790000, "time_taken_displayed": 0.03, "query_displayed": "Coffee" }, "organic_results": [ { "position": 0, "title": "The impact of coffee on health", "result_id": "sWzmct-yYzgJ", "link": "https://www.sciencedirect.com/science/article/pii/S0378512213000479", "snippet": "… , where pressed ground coffee beans are used directly, in either espresso or coffee pot so … espresso or coffee pots, where boiled water is directly passed through pressed coffee powder …", "publication_info": { "summary": "A Cano-Marquina, JJ Tarín, A Cano - Maturitas, 2013 - Elsevier", "authors": [ { "name": "A Cano-Marquina", "link": "https://scholar.google.com/citations?user=nMS5yswAAAAJ&hl=en&oi=sra", "serpapi_scholar_link": "https://serpapi.com/search.json?author_id=nMS5yswAAAAJ&engine=google_scholar_author&hl=en", "author_id": "nMS5yswAAAAJ" } ] }, "inline_links": { "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=sWzmct-yYzgJ", "cited_by": { "total": 443, "link": "https://scholar.google.com/scholar?cites=4063287961593474225&as_sdt=5,43&sciodt=0,43&hl=en", "cites_id": "4063287961593474225", "serpapi_scholar_link": 
"https://serpapi.com/search.json?as_sdt=5%2C43&cites=4063287961593474225&engine=google_scholar&hl=en" }, "related_pages_link": "https://scholar.google.com/scholar?q=related:sWzmct-yYzgJ:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,43", "serpapi_related_pages_link": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=related%3AsWzmct-yYzgJ%3Ascholar.google.com%2F", "versions": { "total": 8, "link": "https://scholar.google.com/scholar?cluster=4063287961593474225&hl=en&as_sdt=0,43", "cluster_id": "4063287961593474225", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=0%2C43&cluster=4063287961593474225&engine=google_scholar&hl=en" } } }, { "position": 1, "title": "Functional properties of coffee and coffee by-products", "result_id": "9WouRiFbIK4J", "link": "https://www.sciencedirect.com/science/article/pii/S0963996911003449", "snippet": "… roasted coffee … coffee, coffee beans and by-products in terms of the associated potential health benefits. The data in this review have been organized in sections according to the coffee …", "publication_info": { "summary": "P Esquivel, VM Jimenez - Food research international, 2012 - Elsevier", "authors": [ { "name": "P Esquivel", "link": "https://scholar.google.com/citations?user=EpwJXskAAAAJ&hl=en&oi=sra", "serpapi_scholar_link": "https://serpapi.com/search.json?author_id=EpwJXskAAAAJ&engine=google_scholar_author&hl=en", "author_id": "EpwJXskAAAAJ" }, { "name": "VM Jimenez", "link": "https://scholar.google.com/citations?user=_P0h0B8AAAAJ&hl=en&oi=sra", "serpapi_scholar_link": "https://serpapi.com/search.json?author_id=_P0h0B8AAAAJ&engine=google_scholar_author&hl=en", "author_id": "_P0h0B8AAAAJ" } ] }, "resources": [ { "title": "uoregon.edu", "file_format": "PDF", "link": "https://pages.uoregon.edu/chendon/coffee_literature/2012%20Food%20Res.%20Int.,%20Uses%20for%20coffee%20waste.pdf" } ], "inline_links": { "serpapi_cite_link": 
"https://serpapi.com/search.json?engine=google_scholar_cite&q=9WouRiFbIK4J", "cited_by": { "total": 1004, "link": "https://scholar.google.com/scholar?cites=12547128760323697397&as_sdt=5,43&sciodt=0,43&hl=en", "cites_id": "12547128760323697397", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=5%2C43&cites=12547128760323697397&engine=google_scholar&hl=en" }, "related_pages_link": "https://scholar.google.com/scholar?q=related:9WouRiFbIK4J:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,43", "serpapi_related_pages_link": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=related%3A9WouRiFbIK4J%3Ascholar.google.com%2F", "versions": { "total": 7, "link": "https://scholar.google.com/scholar?cluster=12547128760323697397&hl=en&as_sdt=0,43", "cluster_id": "12547128760323697397", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=0%2C43&cluster=12547128760323697397&engine=google_scholar&hl=en" } } }, { "position": 2, "title": "Coffee technology", "result_id": "-0fOFoq7wJ8J", "type": "Book", "link": "https://agris.fao.org/agris-search/search.do?recordID=US8224181", "snippet": "… development of coffee and its uses; green, roast, and instant coffee technologies; and coffee and … aspects of coffee; physiological effects of coffee and caffeine; brewing technology). 
(wz) …", "publication_info": { "summary": "M Sivetz, NW Desrosier - 1979 - agris.fao.org" }, "inline_links": { "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=-0fOFoq7wJ8J", "cited_by": { "total": 398, "link": "https://scholar.google.com/scholar?cites=11511406849321486331&as_sdt=5,43&sciodt=0,43&hl=en", "cites_id": "11511406849321486331", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=5%2C43&cites=11511406849321486331&engine=google_scholar&hl=en" }, "related_pages_link": "https://scholar.google.com/scholar?q=related:-0fOFoq7wJ8J:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,43", "serpapi_related_pages_link": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=related%3A-0fOFoq7wJ8J%3Ascholar.google.com%2F", "versions": { "total": 4, "link": "https://scholar.google.com/scholar?cluster=11511406849321486331&hl=en&as_sdt=0,43", "cluster_id": "11511406849321486331", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=0%2C43&cluster=11511406849321486331&engine=google_scholar&hl=en" }, "cached_page_link": "https://scholar.googleusercontent.com/scholar?q=cache:-0fOFoq7wJ8J:scholar.google.com/+Coffee&hl=en&as_sdt=0,43" } }, { "position": 3, "title": "All about coffee", "result_id": "fGeQlvu-2_IJ", "type": "Book", "link": "https://books.google.com/books?hl=en&lr=&id=oJxpQX4ko7cC&oi=fnd&pg=PT1&dq=Coffee&ots=OjhYxY4cZX&sig=aV_mShXj96QUd6t4FK1kUdkp-wI", "snippet": "… S EVENTEEN years ago the author of this work made his first trip abroad to gather material for a book on coffee. 
Subsequently he spent a year in travel among the coffee-producing …", "publication_info": { "summary": "WH Ukers - 1935 - books.google.com" }, "inline_links": { "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=fGeQlvu-2_IJ", "cited_by": { "total": 487, "link": "https://scholar.google.com/scholar?cites=17499790764850308988&as_sdt=5,43&sciodt=0,43&hl=en", "cites_id": "17499790764850308988", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=5%2C43&cites=17499790764850308988&engine=google_scholar&hl=en" }, "related_pages_link": "https://scholar.google.com/scholar?q=related:fGeQlvu-2_IJ:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,43", "serpapi_related_pages_link": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=related%3AfGeQlvu-2_IJ%3Ascholar.google.com%2F", "versions": { "total": 7, "link": "https://scholar.google.com/scholar?cluster=17499790764850308988&hl=en&as_sdt=0,43", "cluster_id": "17499790764850308988", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=0%2C43&cluster=17499790764850308988&engine=google_scholar&hl=en" } } }, { "position": 4, "title": "Espresso coffee: the science of quality", "result_id": "CZSAb_VNDkkJ", "type": "Book", "link": "https://books.google.com/books?hl=en&lr=&id=AJdlfSFCmVIC&oi=fnd&pg=PP2&dq=Coffee&ots=mm8oZ8CZjQ&sig=8TA6fSdZKCJiCY55u7l5aIaOKJ8", "snippet": "Written by leading coffee technology specialists in consultation with some of the world's biggest coffee manufacturers, the second edition of the successful Espresso Coffee will once …", "publication_info": { "summary": "A Illy, R Viani - 2005 - books.google.com" }, "inline_links": { "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=CZSAb_VNDkkJ", "cited_by": { "total": 432, "link": "https://scholar.google.com/scholar?cites=5264230730975712265&as_sdt=5,43&sciodt=0,43&hl=en", "cites_id": "5264230730975712265", "serpapi_scholar_link": 
"https://serpapi.com/search.json?as_sdt=5%2C43&cites=5264230730975712265&engine=google_scholar&hl=en" }, "related_pages_link": "https://scholar.google.com/scholar?q=related:CZSAb_VNDkkJ:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,43", "serpapi_related_pages_link": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=related%3ACZSAb_VNDkkJ%3Ascholar.google.com%2F", "versions": { "total": 3, "link": "https://scholar.google.com/scholar?cluster=5264230730975712265&hl=en&as_sdt=0,43", "cluster_id": "5264230730975712265", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=0%2C43&cluster=5264230730975712265&engine=google_scholar&hl=en" } } }, { "position": 5, "title": "Coffee: Volume 2: Technology", "result_id": "Jt15QwxlEw0J", "type": "Book", "link": "https://books.google.com/books?hl=en&lr=&id=NlcGCAAAQBAJ&oi=fnd&pg=PR14&dq=Coffee&ots=udC1nyFfNl&sig=zkd5mdNw6ZOKlC0W5j99KjWW0qc", "snippet": "… The choice of green coffees from an almost … the coffee cherry to the green coffee bean, needs understanding and guidance. 
Furthermore, various forms of pre-treatment of green coffee …", "publication_info": { "summary": "RJ Clarke - 2012 - books.google.com" }, "inline_links": { "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=Jt15QwxlEw0J", "cited_by": { "total": 322, "link": "https://scholar.google.com/scholar?cites=942207850396638502&as_sdt=5,43&sciodt=0,43&hl=en", "cites_id": "942207850396638502", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=5%2C43&cites=942207850396638502&engine=google_scholar&hl=en" }, "related_pages_link": "https://scholar.google.com/scholar?q=related:Jt15QwxlEw0J:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,43", "serpapi_related_pages_link": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=related%3AJt15QwxlEw0J%3Ascholar.google.com%2F", "versions": { "total": 5, "link": "https://scholar.google.com/scholar?cluster=942207850396638502&hl=en&as_sdt=0,43", "cluster_id": "942207850396638502", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=0%2C43&cluster=942207850396638502&engine=google_scholar&hl=en" } } }, { "position": 6, "title": "The impact of coffee on health", "result_id": "KVT-hW9IrDoJ", "type": "Html", "link": "https://www.thieme-connect.com/products/ejournals/html/10.1055/s-0043-115007", "snippet": "… coffee, indicating that besides caffeine other components contribute to the health protecting effects. 
For adults consuming moderate amounts of coffee (… information about coffee on health…", "publication_info": { "summary": "K Nieber - Planta medica, 2017 - thieme-connect.com" }, "resources": [ { "title": "thieme-connect.com", "file_format": "HTML", "link": "https://www.thieme-connect.com/products/ejournals/html/10.1055/s-0043-115007" } ], "inline_links": { "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=KVT-hW9IrDoJ", "html_version": "https://www.thieme-connect.com/products/ejournals/html/10.1055/s-0043-115007", "cited_by": { "total": 188, "link": "https://scholar.google.com/scholar?cites=4227833794020660265&as_sdt=5,43&sciodt=0,43&hl=en", "cites_id": "4227833794020660265", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=5%2C43&cites=4227833794020660265&engine=google_scholar&hl=en" }, "related_pages_link": "https://scholar.google.com/scholar?q=related:KVT-hW9IrDoJ:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,43", "serpapi_related_pages_link": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=related%3AKVT-hW9IrDoJ%3Ascholar.google.com%2F", "versions": { "total": 7, "link": "https://scholar.google.com/scholar?cluster=4227833794020660265&hl=en&as_sdt=0,43", "cluster_id": "4227833794020660265", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=0%2C43&cluster=4227833794020660265&engine=google_scholar&hl=en" } } }, { "position": 7, "title": "Coffee: recent developments", "result_id": "31GOrHWBl_AJ", "type": "Book", "link": "https://books.google.com/books?hl=en&lr=&id=jIFY_Pz8LH0C&oi=fnd&pg=PR3&dq=Coffee&ots=OCMwRhmx-G&sig=zlt7LhnnOcNOXW_Q7qWW_bprw6c", "snippet": "… The physiological effects of coffee drinking are considered in a fascinating chapter on coffee and health. 
Agronomic aspects of coffee breeding and growing are covered specifically in …", "publication_info": { "summary": "R Clarke, OG Vitzthum - 2008 - books.google.com", "authors": [ { "name": "R Clarke", "link": "https://scholar.google.com/citations?user=emScaM4AAAAJ&hl=en&oi=sra", "serpapi_scholar_link": "https://serpapi.com/search.json?author_id=emScaM4AAAAJ&engine=google_scholar_author&hl=en", "author_id": "emScaM4AAAAJ" } ] }, "inline_links": { "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=31GOrHWBl_AJ", "cited_by": { "total": 291, "link": "https://scholar.google.com/scholar?cites=17336467632992178655&as_sdt=5,43&sciodt=0,43&hl=en", "cites_id": "17336467632992178655", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=5%2C43&cites=17336467632992178655&engine=google_scholar&hl=en" }, "related_pages_link": "https://scholar.google.com/scholar?q=related:31GOrHWBl_AJ:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,43", "serpapi_related_pages_link": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=related%3A31GOrHWBl_AJ%3Ascholar.google.com%2F", "versions": { "total": 4, "link": "https://scholar.google.com/scholar?cluster=17336467632992178655&hl=en&as_sdt=0,43", "cluster_id": "17336467632992178655", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=0%2C43&cluster=17336467632992178655&engine=google_scholar&hl=en" } } }, { "position": 8, "title": "Components of coffee", "result_id": "QwF9cuvhnCoJ", "type": "Pdf", "link": "https://www.researchgate.net/profile/Maurice-Arnaud/publication/311253694_Metabolism_of_caffeine_and_other_components_of_coffee/links/58f1d0cfa6fdcc11e569e8e8/Metabolism-of-caffeine-and-other-components-of-coffee.pdf", "snippet": "The metabolism of coffee constituents such as trigonelline, chlorogenic acid with its two components, caffeic and quinic acids, are also presented. 
Other aromatic compounds present in …", "publication_info": { "summary": "MJ Arnaud - Caffeine, coffee, and health, 1993 - researchgate.net" }, "resources": [ { "title": "researchgate.net", "file_format": "PDF", "link": "https://www.researchgate.net/profile/Maurice-Arnaud/publication/311253694_Metabolism_of_caffeine_and_other_components_of_coffee/links/58f1d0cfa6fdcc11e569e8e8/Metabolism-of-caffeine-and-other-components-of-coffee.pdf" } ], "inline_links": { "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=QwF9cuvhnCoJ", "cited_by": { "total": 224, "link": "https://scholar.google.com/scholar?cites=3070577447314194755&as_sdt=5,43&sciodt=0,43&hl=en", "cites_id": "3070577447314194755", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=5%2C43&cites=3070577447314194755&engine=google_scholar&hl=en" }, "related_pages_link": "https://scholar.google.com/scholar?q=related:QwF9cuvhnCoJ:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,43", "serpapi_related_pages_link": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=related%3AQwF9cuvhnCoJ%3Ascholar.google.com%2F", "versions": { "total": 3, "link": "https://scholar.google.com/scholar?cluster=3070577447314194755&hl=en&as_sdt=0,43", "cluster_id": "3070577447314194755", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=0%2C43&cluster=3070577447314194755&engine=google_scholar&hl=en" } } }, { "position": 9, "title": "Coffee", "result_id": "F6FR6YN9nBsJ", "type": "Book", "link": "https://link.springer.com/chapter/10.1007/978-3-030-24733-1_23", "snippet": "… with coffee plants. Coffee crop management plays an important role in the settlement and maintenance of their populations. 
This chapter covers aspects related to the main coffee crop …", "publication_info": { "summary": "CF Carvalho, SM Carvalho, B Souza - 2019 - Springer", "authors": [ { "name": "SM Carvalho", "link": "https://scholar.google.com/citations?user=70EjKtAAAAAJ&hl=en&oi=sra", "serpapi_scholar_link": "https://serpapi.com/search.json?author_id=70EjKtAAAAAJ&engine=google_scholar_author&hl=en", "author_id": "70EjKtAAAAAJ" }, { "name": "B Souza", "link": "https://scholar.google.com/citations?user=GFIwzIsAAAAJ&hl=en&oi=sra", "serpapi_scholar_link": "https://serpapi.com/search.json?author_id=GFIwzIsAAAAJ&engine=google_scholar_author&hl=en", "author_id": "GFIwzIsAAAAJ" } ] }, "inline_links": { "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=F6FR6YN9nBsJ", "cited_by": { "total": 84, "link": "https://scholar.google.com/scholar?cites=1989603140899545367&as_sdt=5,43&sciodt=0,43&hl=en", "cites_id": "1989603140899545367", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=5%2C43&cites=1989603140899545367&engine=google_scholar&hl=en" }, "related_pages_link": "https://scholar.google.com/scholar?q=related:F6FR6YN9nBsJ:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,43", "serpapi_related_pages_link": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=related%3AF6FR6YN9nBsJ%3Ascholar.google.com%2F", "versions": { "total": 4, "link": "https://scholar.google.com/scholar?cluster=1989603140899545367&hl=en&as_sdt=0,43", "cluster_id": "1989603140899545367", "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=0%2C43&cluster=1989603140899545367&engine=google_scholar&hl=en" } } } ], "related_searches": [ { "query": "coffee consumption", "link": "https://scholar.google.com/scholar?hl=en&as_sdt=0,43&qsp=1&q=coffee+consumption&qst=ib" }, { "query": "coffee shop", "link": "https://scholar.google.com/scholar?hl=en&as_sdt=0,43&qsp=2&q=coffee+shop&qst=ib" }, { "query": "coffee caffeine", "link": 
"https://scholar.google.com/scholar?hl=en&as_sdt=0,43&qsp=3&q=coffee+caffeine&qst=ib" }, { "query": "coffee grounds", "link": "https://scholar.google.com/scholar?hl=en&as_sdt=0,43&qsp=4&q=coffee+grounds&qst=ib" }, { "query": "coffee bean", "link": "https://scholar.google.com/scholar?hl=en&as_sdt=0,43&qsp=5&q=coffee+bean&qst=ib" }, { "query": "starbucks coffee", "link": "https://scholar.google.com/scholar?hl=en&as_sdt=0,43&qsp=6&q=starbucks+coffee&qst=ib" }, { "query": "coffee arabica", "link": "https://scholar.google.com/scholar?hl=en&as_sdt=0,43&qsp=7&q=coffee+arabica&qst=ib" }, { "query": "fair trade coffee", "link": "https://scholar.google.com/scholar?hl=en&as_sdt=0,43&qsp=8&q=fair+trade+coffee&qst=ib" } ], "pagination": { "current": 1, "next": "https://scholar.google.com/scholar?start=10&q=Coffee&hl=en&as_sdt=0,43", "other_pages": { "2": "https://scholar.google.com/scholar?start=10&q=Coffee&hl=en&as_sdt=0,43", "3": "https://scholar.google.com/scholar?start=20&q=Coffee&hl=en&as_sdt=0,43", "4": "https://scholar.google.com/scholar?start=30&q=Coffee&hl=en&as_sdt=0,43", "5": "https://scholar.google.com/scholar?start=40&q=Coffee&hl=en&as_sdt=0,43", "6": "https://scholar.google.com/scholar?start=50&q=Coffee&hl=en&as_sdt=0,43", "7": "https://scholar.google.com/scholar?start=60&q=Coffee&hl=en&as_sdt=0,43", "8": "https://scholar.google.com/scholar?start=70&q=Coffee&hl=en&as_sdt=0,43", "9": "https://scholar.google.com/scholar?start=80&q=Coffee&hl=en&as_sdt=0,43", "10": "https://scholar.google.com/scholar?start=90&q=Coffee&hl=en&as_sdt=0,43" } }, "serpapi_pagination": { "current": 1, "next_link": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=Coffee&start=10", "next": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=Coffee&start=10", "other_pages": { "2": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=Coffee&start=10", "3": 
"https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=Coffee&start=20", "4": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=Coffee&start=30", "5": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=Coffee&start=40", "6": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=Coffee&start=50", "7": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=Coffee&start=60", "8": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=Coffee&start=70", "9": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=Coffee&start=80", "10": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=Coffee&start=90" } } } ```

"However, it only extracts first 3 points below". But I don't understand what you mean by that last sentence about 3 points.

"below" is a typo. I meant "above" 🙂 Here's a visual example of what I meant (I'll update the README, thank you 😀):


So you actually need to launch several additional requests to get publication details for each of the 20 papers returned in "one request". I guess the same constraints apply using serpapi. Correct?

Yes, because every package (including third-party APIs) has to make additional requests to Google, which in turn return the data to you. In other words, if no additional requests are made, there is no data to extract. Not sure how to explain it better for now 🙂
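To make the "additional requests" idea concrete, here is a minimal sketch that collects the follow-up SerpApi URLs from a single organic result in the JSON output posted earlier. It only reads keys that appear in that output (`inline_links.serpapi_cite_link`, `inline_links.cited_by.serpapi_scholar_link`, `inline_links.serpapi_related_pages_link`, `inline_links.versions.serpapi_scholar_link`, and each author's `serpapi_scholar_link`) and deliberately skips `link`, which points to an outside website. The helper name `follow_up_links` is hypothetical and not part of scrape-google-scholar-py or SerpApi; each URL it returns would be one more request.

```python
def follow_up_links(result):
    """Return the SerpApi URLs that would each need one extra request.

    Hypothetical helper: it reads only keys seen in the sample
    google_scholar JSON and ignores "link" (an outside website).
    """
    inline = result.get("inline_links", {})
    urls = []

    # Cite popup for this result
    if "serpapi_cite_link" in inline:
        urls.append(inline["serpapi_cite_link"])

    # "Cited by" and "All versions" pages
    for key in ("cited_by", "versions"):
        nested = inline.get(key, {})
        if "serpapi_scholar_link" in nested:
            urls.append(nested["serpapi_scholar_link"])

    # "Related articles" page
    if "serpapi_related_pages_link" in inline:
        urls.append(inline["serpapi_related_pages_link"])

    # One author profile request per linked author
    for author in result.get("publication_info", {}).get("authors", []):
        if "serpapi_scholar_link" in author:
            urls.append(author["serpapi_scholar_link"])

    return urls


# Trimmed copy of result "position": 7 from the output above
result = {
    "position": 7,
    "title": "Coffee: recent developments",
    "link": "https://books.google.com/books?hl=en&lr=&id=jIFY_Pz8LH0C&oi=fnd&pg=PR3&dq=Coffee&ots=OCMwRhmx-G&sig=zlt7LhnnOcNOXW_Q7qWW_bprw6c",
    "publication_info": {
        "authors": [
            {
                "name": "R Clarke",
                "serpapi_scholar_link": "https://serpapi.com/search.json?author_id=emScaM4AAAAJ&engine=google_scholar_author&hl=en",
            }
        ]
    },
    "inline_links": {
        "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=31GOrHWBl_AJ",
        "cited_by": {
            "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=5%2C43&cites=17336467632992178655&engine=google_scholar&hl=en"
        },
        "serpapi_related_pages_link": "https://serpapi.com/search.json?as_sdt=0%2C43&engine=google_scholar&hl=en&q=related%3A31GOrHWBl_AJ%3Ascholar.google.com%2F",
        "versions": {
            "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=0%2C43&cluster=17336467632992178655&engine=google_scholar&hl=en"
        },
    },
}

for url in follow_up_links(result):
    print(url)
```

For this result the sketch yields five URLs. A result without linked authors (like "position": 8 above) would yield four, and one with two linked authors ("position": 9) would yield six, so the exact count varies per result.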

Just to illustrate what you've already written, in visual form:


How many requests need to be made

The "link" JSON key is excluded because it leads to an outside website (which SerpApi can't scrape), which leaves 5 additional requests per result.

For 1 page with 20 results:

For 10 pages with 20 results per page:

For 20 pages with 20 results per page:

I think these calculations are accurate, though I'm not really good at math 🙂
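As a rough sketch of this arithmetic, assuming one request per results page plus 5 follow-up requests for each of its 20 results (both figures from the comment above), the total grows linearly with the number of pages. Whether the results-page request itself is counted is an assumption here, and `total_requests` is a hypothetical helper.

```python
def total_requests(pages, results_per_page=20, extra_per_result=5):
    """Estimate requests needed for `pages` result pages plus follow-ups.

    Assumption: each results page costs one request, and each result on it
    needs `extra_per_result` additional requests ("link" key excluded).
    """
    page_requests = pages
    follow_up_requests = pages * results_per_page * extra_per_result
    return page_requests + follow_up_requests


for pages in (1, 10, 20):
    print(f"{pages} page(s) x 20 results: {total_requests(pages)} requests")
# 1 -> 101, 10 -> 1010, 20 -> 2020
```

If only the follow-up requests are counted (dropping `page_requests`), the totals are 100, 1000 and 2000 respectively.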

Is this correct or would scrape-google-scholar-py / serpapi somehow include more detailed info in the original GS request, so we might not need to launch additional requests?

Not 100% sure what you meant 🙂

Both SerpApi and scrape-google-scholar-py (and scholarly or other modules) extract exactly the information that Google Scholar shows, 1:1, so the original request contains nothing beyond what is visible on the results page.

abubelinha commented 1 year ago

Perfectly explained and understood. I'll probably come back to this in a few months, but no more questions for now. Thanks a lot!!

dimitryzub commented 1 year ago

I'll set a reminder for both of us. It will tag us in this thread on August 1. It also serves to close the issue if it stays inactive for too long 🙂

@set-reminder August 1 5am @abubelinha if you have time, could you please have another look at this issue and try running on Python 3.10+ to see if you get the same error: ImportError: cannot import name 'SerpApiGoogleScholarOrganic' from 'google_scholar_py'? Thank you.
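For anyone re-testing this, a small diagnostic sketch can report the running Python version, which `google_scholar_py` file is actually being imported, and which of the two names from the original report are missing from it. `diagnose_import` is a hypothetical helper, not part of the package; it uses only standard-library calls.

```python
import importlib
import sys


def diagnose_import(module_name, names):
    """Report the Python version, module location and any names the module lacks."""
    report = {
        "python": sys.version.split()[0],
        "module_file": None,
        "missing": list(names),
    }
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError
        report["error"] = str(exc)
        return report

    # Which copy of the package was picked up (e.g. site-packages vs. a local folder)
    report["module_file"] = getattr(module, "__file__", None)
    report["missing"] = [name for name in names if not hasattr(module, name)]
    return report


if __name__ == "__main__":
    print(diagnose_import(
        "google_scholar_py",
        ["CustomGoogleScholarProfiles", "SerpApiGoogleScholarOrganic"],
    ))
```

If `missing` still lists both names while `module_file` points into `site-packages`, that would suggest the installed copy doesn't expose them at the package's top level, and comparing that `__init__.py` with the repository's version would be a reasonable next step.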

octo-reminder[bot] commented 1 year ago

⏰ Reminder Tuesday, August 1, 2023 5:00 AM (GMT+01:00)

@abubelinha if you have time, could you please have another look at this issue and try running on Python 3.10+ to see if you get the same error: ImportError: cannot import name 'SerpApiGoogleScholarOrganic' from 'google_scholar_py'? Thank you.

abubelinha commented 1 year ago

OK, don't worry. I don't expect to be around at that time (holidays?). Once my OS is upgraded, I will get back to you if I can start trying this package again (probably not before Christmas, because I'm usually too busy at work until the end of autumn).

octo-reminder[bot] commented 1 year ago

🔔 @dimitryzub

@abubelinha if you have time, could you please have another look at this issue and try running on Python 3.10+ to see if you get the same error: ImportError: cannot import name 'SerpApiGoogleScholarOrganic' from 'google_scholar_py'? Thank you.