ckreibich / scholar.py

A parser for Google Scholar, written in Python

Allow paging #43

Open norro opened 9 years ago

norro commented 9 years ago

Allow paging to retrieve more than 20 results. This can be done with Google Scholar's 'start' search parameter.

andreas-wilm commented 9 years ago

I thought that after this patch, iteration would be as easy as repeatedly calling querier.send_query(query) and updating query.start while len(querier.articles) == query.num_results, breaking otherwise. But that somehow lists the same results repeatedly, up to a seemingly arbitrary number...

Is there still a plan to implement iteration?
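
For reference, the loop described above could be sketched roughly as follows. This is only an illustration, not code from scholar.py: `fetch_page` is a hypothetical callable that would set the query's start offset, call querier.send_query(query) and return that page's articles, and the default page size of 20 is taken from this thread. It also stops when a page contains nothing new, as a guard against the repeated results mentioned above.

```python
from typing import Callable, Hashable, Iterator, List


def iter_results(fetch_page: Callable[[int], List[Hashable]],
                 page_size: int = 20,
                 max_results: int = 100) -> Iterator[Hashable]:
    """Yield results page by page until a short or repeated page appears.

    fetch_page is a hypothetical stand-in: given a start offset, it returns
    the list of results for that page.
    """
    seen = set()
    start = 0
    # Stop once the start offset reaches max_results.
    while start < max_results:
        page = fetch_page(start)
        yielded_any = False
        for item in page:
            if item in seen:
                continue  # skip results already returned by an earlier page
            seen.add(item)
            yielded_any = True
            yield item
        # A page with nothing new, or a short page, means we are done.
        if not yielded_any or len(page) < page_size:
            break
        start += page_size
```

Stopping on a page without new items is a design choice for this sketch; it trades completeness for not looping forever if the server keeps returning the same page.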

willwhitney commented 8 years ago

:+1:

vext01 commented 8 years ago

I'd also like this feature

floweri commented 8 years ago

How should the start parameter be set? I am not familiar with Python. I would appreciate it if you could help.

eknoes commented 8 years ago

With PR #44 this is not possible via the command line, only from your own code. After initializing your query, you have to call the set_start method.

query = SearchScholarQuery()
query.set_start(20)

floweri commented 8 years ago

Thanks. This is a part of my code:

if options.cluster_id:
    query = ClusterScholarQuery(cluster=options.cluster_id)
else:
    query = SearchScholarQuery()
    query.set_start(20)
    if options.author:
        query.set_author(options.author)
    if options.allw:
        query.set_words(options.allw)
    if options.some:

But it gives this error:

SearchScholarQuery doesn't have any attribute 'set_start'

eknoes commented 8 years ago

Yes, that is because this pull request has not been merged yet, so it is not included in the main branch. You have to get the code from #44.

norro commented 8 years ago

Concerning the usage, @eknoes is right: SearchScholarQuery got the additional set_start() method. I usually first check whether paging is necessary:

if len(querier.articles) >= ScholarConf.MAX_PAGE_RESULTS:
  do_paging = True

and then, if paging is due:

query = SearchScholarQuery()
...
query.set_start(paging * ScholarConf.MAX_PAGE_RESULTS)

floweri commented 8 years ago

I changed the code according to the commit and then set these values: query.set_start(100) and MAX_PAGE_RESULTS = 100

But it finds only 20 articles again.

eknoes commented 8 years ago

It will always find a maximum of 20 articles per page. You got articles 100 to 120!
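
To make the arithmetic concrete, here is a small sketch of how start offsets map to result ranges. It assumes 20 results per page (the figure from this thread) and that 'start' is a zero-based offset, which is an assumption rather than something confirmed here; the helper names are made up for illustration and are not part of scholar.py.

```python
def start_offsets(total_wanted, page_size=20):
    """Return the start offsets needed to cover total_wanted results."""
    return list(range(0, total_wanted, page_size))


def page_span(start, page_size=20):
    """Return the 1-based (first, last) result positions a page covers,
    assuming 'start' is a zero-based offset."""
    return start + 1, start + page_size


# Covering the first 100 results takes five requests.
offsets = start_offsets(100)

# A single request with start=100 covers one page only.
span = page_span(100)
```

So a single call with set_start(100) returns one page near position 100, not 100 results; getting more requires one request per offset.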

floweri commented 8 years ago

No, I find only 20 articles in total.