dimitryzub / scrape-google-scholar-py

Extract data from all Google Scholar pages from a single Python module. NOTE: I'm no longer maintaining this repo. Chrome driver/selectors might need an update.
MIT License

Google Scholar advanced search examples #9

Closed abubelinha closed 1 year ago

abubelinha commented 1 year ago

I cannot try this myself yet (see #8).

I hope it is an easy question to answer. Does this package support running an advanced search, like the one you get from the "advanced search" form in the Google Scholar web interface?

Find articles

Filling in that form produces this text in the top text box: zoology aves "New York" OR Paris "Natural History Museum" (and it also limits the years in the resulting GS URL).

It would be nice to see an example (here or in the documentation) of how to do the same with this package.

Thanks a lot in advance @abubelinha

yudeng2022 commented 1 year ago

I have the exact same question :)

dimitryzub commented 1 year ago

@abubelinha @yudeng2022 thank you guys for helping to improve this module.

I'll come back to this question in a couple of days and tag both of you. Thank you for your interest 👍🙂

@abubelinha if possible, could you provide an actual Google Scholar example link with applied filters? It would help (speed up the process) a lot.

abubelinha commented 1 year ago

Sure. I edited my example above to include the URL there too. I was confused about how GS avoids mixing up AND / OR statements (in the text box above there are NO visible parentheses around the "New York" OR Paris part, as I would expect). But it looks like the OR terms are kept apart inside the as_oq parameter of the URL.
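
A minimal sketch of how the advanced-search fields could be assembled into a Google Scholar URL by hand. Only as_oq is mentioned in this thread; as_q, as_epq, as_ylo and as_yhi are assumed to be the parameter names for the "all of the words", "exact phrase" and year-range fields, and the years below are placeholders because the original URL is not reproduced here. The function name is hypothetical.

```python
from urllib.parse import urlencode


def build_advanced_search_url(all_words, exact_phrase, any_words,
                              year_low=None, year_high=None):
    """Assemble a Google Scholar search URL from advanced-search fields."""
    params = {
        "as_q": all_words,       # assumed: "with all of the words"
        "as_epq": exact_phrase,  # assumed: "with the exact phrase"
        "as_oq": any_words,      # "with at least one of the words"
    }
    # Year limits are optional, so only add them when given.
    if year_low is not None:
        params["as_ylo"] = year_low
    if year_high is not None:
        params["as_yhi"] = year_high
    # urlencode percent-encodes the quotes and turns spaces into "+".
    return "https://scholar.google.com/scholar?" + urlencode(params)


url = build_advanced_search_url(
    all_words="zoology aves",
    exact_phrase="Natural History Museum",
    any_words='"New York" Paris',
    year_low=2017,   # placeholder year range
    year_high=2018,
)
print(url)
```

Keeping each field in its own parameter avoids having to decide where parentheses go in a single query string.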


abubelinha commented 1 year ago

BTW, this is more of a GS question, but I am not sure whether "New York" is treated as a single word. Do you know if the approach in my example is correct? I want to search for publications related to several different "Natural History Museum"s (so I use a phrase), but I want to keep only those publications citing the "Paris" or "New York" museums (so I use the "any of the words" option).

The point is: how should I enter "New York", "Los Angeles", "Rio de Janeiro" ... so that each is treated as a single word? Do you know if the double-quotes approach is correct, or is this simply impossible to control when searching GS?

dimitryzub commented 1 year ago

@abubelinha @yudeng2022

I think OR needs to be in parentheses (not 100% sure though): zoology aves "Natural History Museum" ("New York" OR "Paris"). In any case, this query and the URL you provided return the same number of articles on my end (with the other parameters applied).
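
A small sketch of building that kind of query string programmatically, so that multi-word names such as "Rio de Janeiro" are always quoted. The helper names are hypothetical, and whether Google Scholar actually requires the parentheses is not confirmed in this thread.

```python
def quote_term(term):
    """Wrap a term in double quotes when it contains a space."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def build_query(all_words, exact_phrases, any_of):
    """Join required words, exact phrases and one parenthesized OR group."""
    parts = list(all_words)
    # Exact phrases are always quoted, even if they are a single word.
    parts += [f'"{phrase}"' for phrase in exact_phrases]
    if any_of:
        parts.append("(" + " OR ".join(quote_term(t) for t in any_of) + ")")
    return " ".join(parts)


query = build_query(
    all_words=["zoology", "aves"],
    exact_phrases=["Natural History Museum"],
    any_of=["New York", "Paris", "Rio de Janeiro"],
)
print(query)
# zoology aves "Natural History Museum" ("New York" OR Paris OR "Rio de Janeiro")
```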

Code example for your request (using filtered URL from Google Scholar):

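from google_scholar_py import CustomGoogleScholarOrganic
import json

parser = CustomGoogleScholarOrganic()
data = parser.scrape_google_scholar_organic_results(
    # The search argument was lost from this post. The text below is the query
    # discussed above; the parameter name `query` is an assumption.
    query='zoology aves "Natural History Museum" ("New York" OR "Paris")',
    pagination=False,  # set to True to go through all possible pages
)

print(json.dumps(data, indent=2, ensure_ascii=False))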

Output from the first page:

[
  {
    "title": "[HTML][HTML] Repositories for mite and tick specimens: acronyms and their nomenclature",
    "title_link": "https://bioone.org/journals/Systematic-and-Applied-Acarology/volume-23/issue-12/saa.23.12.12/Repositories-for-mite-and-tick-specimens--acronyms-and-their/10.11158/saa.23.12.12.full",
    "publication_info": "ZQ Zhang - Systematic and Applied Acarology, 2018 - BioOne",
    "snippet": "… ) in London is now known as The Natural History Museum, London and some authors used \n… whereas BM is also used by botanists (not zoologists) for British Museum (Natural History) in …",
    "cited_by_link": "https://scholar.google.com/scholar?cites=15722832665772663017&as_sdt=2005&sciodt=0,5&hl=en",
    "cited_by_count": 179,
    "pdf_file": "https://bioone.org/journals/Systematic-and-Applied-Acarology/volume-23/issue-12/saa.23.12.12/Repositories-for-mite-and-tick-specimens--acronyms-and-their/10.11158/saa.23.12.12.full"
  },
  {
    "title": "A space of one's own: Barbosa du Bocage, the foundation of the National Museum of Lisbon, and the construction of a career in zoology (1851–1907)",
    "title_link": "https://link.springer.com/article/10.1007/s10739-017-9487-6",
    "publication_info": "D Gamito-Marques - Journal of the History of Biology, 2018 - Springer",
    "snippet": "… Bocage transferred a natural history museum formerly located at … a more prestigious place \nfor zoology. Although successive … the Zoological Section taking as a model the Paris Museum. …",
    "cited_by_link": "https://scholar.google.com/scholar?cites=13368754249756812814&as_sdt=2005&sciodt=0,5&hl=en",
    "cited_by_count": 6,
    "pdf_file": null
  },
  {
    "title": "[PDF][PDF] DONATION OF MONSIGNOR GABRIEL FOUCHER FROM THE ORNITHOLOGICAL COLLECTION OF THE NATIONAL MUSEUM OF NATURAL HISTORY\" …",
    "title_link": "https://www.researchgate.net/profile/Ana-Maria-Petrescu-2/publication/322525982_DONATION_OF_MONSIGNOR_GABRIEL_FOUCHER_FROM_THE_ORNITHOLOGICAL_COLLECTION_OF_THE_NATIONAL_MUSEUM_OF_NATURAL_HISTORY_GRIGORE_ANTIPA_IN_BUCHAREST_ROMANIA/links/5a5e0710a6fdcc68fa9909ab/DONATION-OF-MONSIGNOR-GABRIEL-FOUCHER-FROM-THE-ORNITHOLOGICAL-COLLECTION-OF-THE-NATIONAL-MUSEUM-OF-NATURAL-HISTORY-GRIGORE-ANTIPA-IN-BUCHAREST-ROMANIA.pdf",
    "publication_info": "A PETRESCU, MS RIDICHE, AM PETRESCU - researchgate.net",
    "snippet": "… and Public Instructions announced the Zoology Museum that a … of the Catholic Institute \nMuseum from Paris. ‹Parmi … deputy director of the Natural History Museum from Paris. For many …",
    "cited_by_link": null,
    "cited_by_count": null,
    "pdf_file": "https://www.researchgate.net/profile/Ana-Maria-Petrescu-2/publication/322525982_DONATION_OF_MONSIGNOR_GABRIEL_FOUCHER_FROM_THE_ORNITHOLOGICAL_COLLECTION_OF_THE_NATIONAL_MUSEUM_OF_NATURAL_HISTORY_GRIGORE_ANTIPA_IN_BUCHAREST_ROMANIA/links/5a5e0710a6fdcc68fa9909ab/DONATION-OF-MONSIGNOR-GABRIEL-FOUCHER-FROM-THE-ORNITHOLOGICAL-COLLECTION-OF-THE-NATIONAL-MUSEUM-OF-NATURAL-HISTORY-GRIGORE-ANTIPA-IN-BUCHAREST-ROMANIA.pdf"
  },
  {
    "title": "Comprehensive phylogeny of the laughingthrushes and allies (Aves, Leiothrichidae) and a proposal for a revised taxonomy",
    "title_link": "https://onlinelibrary.wiley.com/doi/abs/10.1111/zsc.12296",
    "publication_info": "A Cibois, M Gelang, P Alström, E Pasquet… - Zoologica …, 2018 - Wiley Online Library",
    "snippet": "DNA phylogenies have gradually shed light on the phylogenetic relationships of the large \nbabbler group. We focus in this study on the family Leiothrichidae (laughingthrushes and “…",
    "cited_by_link": "https://scholar.google.com/scholar?cites=13244702554796976053&as_sdt=2005&sciodt=0,5&hl=en",
    "cited_by_count": 19,
    "pdf_file": "http://macroecointern.dk/pdf-reprints/Cibois_ZS_2018.pdf"
  },
  {
    "title": "Case 3754–Circus assimilis Jardine & Selby, 1828 and Circus approximans Peale, 1848 (Aves, Accipitriformes): conservation of usage by designation of a neotype for …",
    "title_link": "https://bioone.org/journals/The-Bulletin-of-Zoological-Nomenclature/volume-75/issue-1/bzn.v75.a044/Case-3754--Circus-assimilis-Jardine--Selby-1828-and/10.21805/bzn.v75.a044.short",
    "publication_info": "SJS Debus, IAW McAllan, R Schodde - The Bulletin of Zoological …, 2018 - BioOne",
    "snippet": "… , which maintains very large zoological research collections, … a neotype in the Natural History \nMuseum, Tring, England, … leading Australian governmentsupported zoological collection, of …",
    "cited_by_link": "https://scholar.google.com/scholar?cites=10790008245466086412&as_sdt=2005&sciodt=0,5&hl=en",
    "cited_by_count": 1,
    "pdf_file": "https://www.biotaxa.org/bzn/article/view/44158/38032"
  },
  {
    "title": "Opinion 2383 (Case 3640) — Touit GR Gray, 1855 and Prosopeia Bonaparte, 1854 (Aves, psittacidae): usage of names conserved",
    "title_link": "https://bioone.org/journals/the-bulletin-of-zoological-nomenclature/volume-73/issue-2-4/bzn.v73i2.a22/Opinion-2383-Case-3640--Touit-GR-Gray-1855-and/10.21805/bzn.v73i2.a22.short",
    "publication_info": "International Commission on Zoological … - … Bulletin of Zoological …, 2017 - BioOne",
    "snippet": "… Under the plenary power the International Commission on Zoological Nomenclature has \nruled to maintain current usage of the names Touit GR Gray, 1855 and Prosopeia Bonaparte, …",
    "cited_by_link": null,
    "cited_by_count": null,
    "pdf_file": "https://www.biotaxa.org/bzn/article/download/37757/32425"
  },
  {
    "title": "[HTML][HTML] The Palaearctic types of Chrysididae (Insecta, Hymenoptera) deposited in the Hungarian Natural History Museum, Budapest",
    "title_link": "https://www.mapress.com/zt/article/view/zootaxa.4252.1.1",
    "publication_info": "P Rosa, Z Vas, ZF Xu - Zootaxa, 2017 - mapress.com",
    "snippet": "A critical and annotated catalogue of the Palaearctic types of chrysidid species, subspecies \nand varieties deposited in the Magyar Természettudományi Múzeum is given. The lectotype …",
    "cited_by_link": "https://scholar.google.com/scholar?cites=15881474893913524708&as_sdt=2005&sciodt=0,5&hl=en",
    "cited_by_count": 19,
    "pdf_file": "https://www.mapress.com/zt/article/view/zootaxa.4252.1.1"
  },
  {
    "title": "[HTML][HTML] Morphological evidence for the taxonomic status of the Bridge's Guan, Penelope bridgesi, with comments on the validity of P. obscura bronzina (Aves …",
    "title_link": "https://www.scielo.br/j/zool/a/XPZfgj3bjLX4Nx9BqCQHdxF/?format=html&lang=en",
    "publication_info": "OD Evangelista-Vargas, LF Silveira - Zoologia (Curitiba), 2018 - SciELO Brasil",
    "snippet": "… obscura bronzina</i></i> (Aves: Cracidae) Morphological evidence for the taxonomic \nstatus of the Bridge’s Guan, <i>Penelope bridgesi</i>, with comments on the validity of <i><i>P. …",
    "cited_by_link": "https://scholar.google.com/scholar?cites=2753278180435218647&as_sdt=2005&sciodt=0,5&hl=en",
    "cited_by_count": 4,
    "pdf_file": "https://www.scielo.br/j/zool/a/XPZfgj3bjLX4Nx9BqCQHdxF/?format=html&lang=en"
  },
  {
    "title": "[BOOK][B] The future of natural history museums",
    "title_link": "https://books.google.com/books?hl=en&lr=&id=MqU5DwAAQBAJ&oi=fnd&pg=PT23&dq=zoology+aves+%22Natural+History+Museum%22+(%22New+York%22+OR+%22Paris%22)&ots=NmomaRUgvY&sig=SbVLQPibGNu_6jnO_paKiJ1D1aA",
    "publication_info": "E Dorfman - 2017 - books.google.com",
    "snippet": "… of the trajectory of the natural history museum sector in the next 20 … a newspaper reporter at \nNewsday in New York. She earned her … She has experience in Zoology with an emphasis in …",
    "cited_by_link": "https://scholar.google.com/scholar?cites=12024103019083755514&as_sdt=2005&sciodt=0,5&hl=en",
    "cited_by_count": 12,
    "pdf_file": null
  },
  {
    "title": "[BOOK][B] Museums in motion: An introduction to the history and functions of museums",
    "title_link": "https://books.google.com/books?hl=en&lr=&id=iw4TDgAAQBAJ&oi=fnd&pg=PR5&dq=zoology+aves+%22Natural+History+Museum%22+(%22New+York%22+OR+%22Paris%22)&ots=DEZplQY2ph&sig=uFPXKO3seuTFkxQ-raYct44y2_Q",
    "publication_info": "EP Alexander, M Alexander, J Decker - 2017 - books.google.com",
    "snippet": "… hides, and a botanical and zoological park, but it was chiefly a … studies, and botanical and \nzoological gardens.16 Bearing in … and had branches in Baltimore and New York. He mounted …",
    "cited_by_link": "https://scholar.google.com/scholar?cites=14537271977882170055&as_sdt=2005&sciodt=0,5&hl=en",
    "cited_by_count": 1420,
    "pdf_file": null
  }
]

Results from outputted JSON file:

[{"title":"[HTML][HTML] Repositories for mite and tick specimens: acronyms and their nomenclature","title_link":"https:\/\/bioone.org\/journals\/Systematic-and-Applied-Acarology\/volume-23\/issue-12\/saa.23.12.12\/Repositories-for-mite-and-tick-specimens--acronyms-and-their\/10.11158\/saa.23.12.12.full","publication_info":"ZQ Zhang\u00a0- Systematic and Applied Acarology, 2018 - BioOne","snippet":"\u2026 ) in London is now known as The Natural History Museum, London and some authors used \n\u2026 whereas BM is also used by botanists (not zoologists) for British Museum (Natural History) in \u2026","cited_by_link":"https:\/\/scholar.google.com\/scholar?cites=15722832665772663017&as_sdt=2005&sciodt=0,5&hl=en","cited_by_count":179.0,"pdf_file":"https:\/\/bioone.org\/journals\/Systematic-and-Applied-Acarology\/volume-23\/issue-12\/saa.23.12.12\/Repositories-for-mite-and-tick-specimens--acronyms-and-their\/10.11158\/saa.23.12.12.full"},{"title":"A space of one's own: Barbosa du Bocage, the foundation of the National Museum of Lisbon, and the construction of a career in zoology (1851\u20131907)","title_link":"https:\/\/link.springer.com\/article\/10.1007\/s10739-017-9487-6","publication_info":"D Gamito-Marques\u00a0- Journal of the History of Biology, 2018 - Springer","snippet":"\u2026 Bocage transferred a natural history museum formerly located at \u2026 a more prestigious place \nfor zoology. Although successive \u2026 the Zoological Section taking as a model the Paris Museum. 
\u2026","cited_by_link":"https:\/\/scholar.google.com\/scholar?cites=13368754249756812814&as_sdt=2005&sciodt=0,5&hl=en","cited_by_count":6.0,"pdf_file":null},{"title":"[PDF][PDF] DONATION OF MONSIGNOR GABRIEL FOUCHER FROM THE ORNITHOLOGICAL COLLECTION OF THE NATIONAL MUSEUM OF NATURAL HISTORY\"\u00a0\u2026","title_link":"https:\/\/www.researchgate.net\/profile\/Ana-Maria-Petrescu-2\/publication\/322525982_DONATION_OF_MONSIGNOR_GABRIEL_FOUCHER_FROM_THE_ORNITHOLOGICAL_COLLECTION_OF_THE_NATIONAL_MUSEUM_OF_NATURAL_HISTORY_GRIGORE_ANTIPA_IN_BUCHAREST_ROMANIA\/links\/5a5e0710a6fdcc68fa9909ab\/DONATION-OF-MONSIGNOR-GABRIEL-FOUCHER-FROM-THE-ORNITHOLOGICAL-COLLECTION-OF-THE-NATIONAL-MUSEUM-OF-NATURAL-HISTORY-GRIGORE-ANTIPA-IN-BUCHAREST-ROMANIA.pdf","publication_info":"A PETRESCU, MS RIDICHE, AM PETRESCU - researchgate.net","snippet":"\u2026 and Public Instructions announced the Zoology Museum that a \u2026 of the Catholic Institute \nMuseum from Paris. \u2039Parmi \u2026 deputy director of the Natural History Museum from Paris. For many \u2026","cited_by_link":null,"cited_by_count":null,"pdf_file":"https:\/\/www.researchgate.net\/profile\/Ana-Maria-Petrescu-2\/publication\/322525982_DONATION_OF_MONSIGNOR_GABRIEL_FOUCHER_FROM_THE_ORNITHOLOGICAL_COLLECTION_OF_THE_NATIONAL_MUSEUM_OF_NATURAL_HISTORY_GRIGORE_ANTIPA_IN_BUCHAREST_ROMANIA\/links\/5a5e0710a6fdcc68fa9909ab\/DONATION-OF-MONSIGNOR-GABRIEL-FOUCHER-FROM-THE-ORNITHOLOGICAL-COLLECTION-OF-THE-NATIONAL-MUSEUM-OF-NATURAL-HISTORY-GRIGORE-ANTIPA-IN-BUCHAREST-ROMANIA.pdf"},{"title":"Comprehensive phylogeny of the laughingthrushes and allies (Aves, Leiothrichidae) and a proposal for a revised taxonomy","title_link":"https:\/\/onlinelibrary.wiley.com\/doi\/abs\/10.1111\/zsc.12296","publication_info":"A Cibois, M Gelang, P Alstr\u00f6m, E Pasquet\u2026\u00a0- Zoologica\u00a0\u2026, 2018 - Wiley Online Library","snippet":"DNA phylogenies have gradually shed light on the phylogenetic relationships of the large \nbabbler group. 
We focus in this study on the family Leiothrichidae (laughingthrushes and \u201c\u2026","cited_by_link":"https:\/\/scholar.google.com\/scholar?cites=13244702554796976053&as_sdt=2005&sciodt=0,5&hl=en","cited_by_count":19.0,"pdf_file":"http:\/\/macroecointern.dk\/pdf-reprints\/Cibois_ZS_2018.pdf"},{"title":"Case 3754\u2013Circus assimilis Jardine & Selby, 1828 and Circus approximans Peale, 1848 (Aves, Accipitriformes): conservation of usage by designation of a neotype for\u00a0\u2026","title_link":"https:\/\/bioone.org\/journals\/The-Bulletin-of-Zoological-Nomenclature\/volume-75\/issue-1\/bzn.v75.a044\/Case-3754--Circus-assimilis-Jardine--Selby-1828-and\/10.21805\/bzn.v75.a044.short","publication_info":"SJS Debus, IAW McAllan, R Schodde\u00a0- The Bulletin of Zoological\u00a0\u2026, 2018 - BioOne","snippet":"\u2026 , which maintains very large zoological research collections, \u2026 a neotype in the Natural History \nMuseum, Tring, England, \u2026 leading Australian governmentsupported zoological collection, of \u2026","cited_by_link":"https:\/\/scholar.google.com\/scholar?cites=10790008245466086412&as_sdt=2005&sciodt=0,5&hl=en","cited_by_count":1.0,"pdf_file":"https:\/\/www.biotaxa.org\/bzn\/article\/view\/44158\/38032"},{"title":"Opinion 2383 (Case 3640) \u2014 Touit GR Gray, 1855 and Prosopeia Bonaparte, 1854 (Aves, psittacidae): usage of names conserved","title_link":"https:\/\/bioone.org\/journals\/the-bulletin-of-zoological-nomenclature\/volume-73\/issue-2-4\/bzn.v73i2.a22\/Opinion-2383-Case-3640--Touit-GR-Gray-1855-and\/10.21805\/bzn.v73i2.a22.short","publication_info":"International Commission on Zoological\u00a0\u2026\u00a0- \u2026\u00a0Bulletin of Zoological\u00a0\u2026, 2017 - BioOne","snippet":"\u2026 Under the plenary power the International Commission on Zoological Nomenclature has \nruled to maintain current usage of the names Touit GR Gray, 1855 and Prosopeia Bonaparte, 
\u2026","cited_by_link":null,"cited_by_count":null,"pdf_file":"https:\/\/www.biotaxa.org\/bzn\/article\/download\/37757\/32425"},{"title":"[HTML][HTML] The Palaearctic types of Chrysididae (Insecta, Hymenoptera) deposited in the Hungarian Natural History Museum, Budapest","title_link":"https:\/\/www.mapress.com\/zt\/article\/view\/zootaxa.4252.1.1","publication_info":"P Rosa, Z Vas, ZF Xu\u00a0- Zootaxa, 2017 - mapress.com","snippet":"A critical and annotated catalogue of the Palaearctic types of chrysidid species, subspecies \nand varieties deposited in the Magyar Term\u00e9szettudom\u00e1nyi M\u00fazeum is given. The lectotype \u2026","cited_by_link":"https:\/\/scholar.google.com\/scholar?cites=15881474893913524708&as_sdt=2005&sciodt=0,5&hl=en","cited_by_count":19.0,"pdf_file":"https:\/\/www.mapress.com\/zt\/article\/view\/zootaxa.4252.1.1"},{"title":"[HTML][HTML] Morphological evidence for the taxonomic status of the Bridge's Guan, Penelope bridgesi, with comments on the validity of P. obscura bronzina (Aves\u00a0\u2026","title_link":"https:\/\/www.scielo.br\/j\/zool\/a\/XPZfgj3bjLX4Nx9BqCQHdxF\/?format=html&lang=en","publication_info":"OD Evangelista-Vargas, LF Silveira\u00a0- Zoologia (Curitiba), 2018 - SciELO Brasil","snippet":"\u2026 obscura bronzina<\/i><\/i> (Aves: Cracidae) Morphological evidence for the taxonomic \nstatus of the Bridge\u2019s Guan, <i>Penelope bridgesi<\/i>, with comments on the validity of <i><i>P. 
\u2026","cited_by_link":"https:\/\/scholar.google.com\/scholar?cites=2753278180435218647&as_sdt=2005&sciodt=0,5&hl=en","cited_by_count":4.0,"pdf_file":"https:\/\/www.scielo.br\/j\/zool\/a\/XPZfgj3bjLX4Nx9BqCQHdxF\/?format=html&lang=en"},{"title":"[BOOK][B] The future of natural history museums","title_link":"https:\/\/books.google.com\/books?hl=en&lr=&id=MqU5DwAAQBAJ&oi=fnd&pg=PT23&dq=zoology+aves+%22Natural+History+Museum%22+(%22New+York%22+OR+%22Paris%22)&ots=NmomaRUgvY&sig=SbVLQPibGNu_6jnO_paKiJ1D1aA","publication_info":"E Dorfman - 2017 - books.google.com","snippet":"\u2026 of the trajectory of the natural history museum sector in the next 20 \u2026 a newspaper reporter at \nNewsday in New York. She earned her \u2026 She has experience in Zoology with an emphasis in \u2026","cited_by_link":"https:\/\/scholar.google.com\/scholar?cites=12024103019083755514&as_sdt=2005&sciodt=0,5&hl=en","cited_by_count":12.0,"pdf_file":null},{"title":"[BOOK][B] Museums in motion: An introduction to the history and functions of museums","title_link":"https:\/\/books.google.com\/books?hl=en&lr=&id=iw4TDgAAQBAJ&oi=fnd&pg=PR5&dq=zoology+aves+%22Natural+History+Museum%22+(%22New+York%22+OR+%22Paris%22)&ots=DEZplQY2ph&sig=uFPXKO3seuTFkxQ-raYct44y2_Q","publication_info":"EP Alexander, M Alexander, J Decker - 2017 - books.google.com","snippet":"\u2026 hides, and a botanical and zoological park, but it was chiefly a \u2026 studies, and botanical and \nzoological gardens.16 Bearing in \u2026 and had branches in Baltimore and New York. He mounted \u2026","cited_by_link":"https:\/\/scholar.google.com\/scholar?cites=14537271977882170055&as_sdt=2005&sciodt=0,5&hl=en","cited_by_count":1420.0,"pdf_file":null}]
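
One thing worth noting when reading the saved file back: cited_by_count appears there as a float (179.0) or null. A minimal sketch of normalizing and ranking such records follows. The three records are trimmed copies from the output above, and the helper name is hypothetical.

```python
import json

# Three records from the results above, trimmed to three fields each.
raw = """[
  {"title_link": "https://link.springer.com/article/10.1007/s10739-017-9487-6",
   "cited_by_count": 6.0, "pdf_file": null},
  {"title_link": "https://www.mapress.com/zt/article/view/zootaxa.4252.1.1",
   "cited_by_count": 19.0,
   "pdf_file": "https://www.mapress.com/zt/article/view/zootaxa.4252.1.1"},
  {"title_link": "https://bioone.org/journals/the-bulletin-of-zoological-nomenclature/volume-73/issue-2-4/bzn.v73i2.a22/Opinion-2383-Case-3640--Touit-GR-Gray-1855-and/10.21805/bzn.v73i2.a22.short",
   "cited_by_count": null,
   "pdf_file": "https://www.biotaxa.org/bzn/article/download/37757/32425"}
]"""


def citation_count(record):
    """Return the citation count as an int, treating a missing count as 0."""
    count = record.get("cited_by_count")
    return int(count) if count is not None else 0


records = json.loads(raw)

# Most-cited first; records without a count sort last.
ranked = sorted(records, key=citation_count, reverse=True)

# Keep only the records that expose a direct file link.
with_pdf = [r for r in records if r.get("pdf_file")]

for record in ranked:
    print(citation_count(record), record["title_link"])
```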


Let me know if it helped or not 👍

abubelinha commented 1 year ago

Indeed it helps. Thanks!

dimitryzub commented 1 year ago

@abubelinha thank you for clarifying 🙂 I'll add this example to the README (or wiki) also.

Closing this issue as resolved.