VT-CHCI / google-scholar

nodejs module for searching google scholar
MIT License

Retrieve Full Google scholar Result #5

Closed chaosmaker closed 7 years ago

chaosmaker commented 7 years ago

Hi, I am fairly new to nodejs and I can't seem to figure out how I can cycle through all of the results that are being parsed by the search function.

What I tend to do is something like this:

```js
scholar.search('author:"someone"').then(res => { dosomethingwithres(res); });
```

I understand that doing `res.next().then(res => {})` gives me the next iteration, but I can't figure out how to keep cycling through the results and get every paper published by a given author.

Any help would be really appreciated.

Thanks

hcientist commented 7 years ago

I will try to update the docs to answer this in the future, but please see http://jsbin.com/ceduhugefa/edit?js for a quick response I emailed to someone who asked a similar question. The main idea is to write a handler function that you call with the response from Google Scholar; in that function you check whether there is a `next`, and if so, you call `next` with this same handler function as the callback.
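The pattern described above can be sketched like this. `fakeSearch` is a hypothetical stand-in for `scholar.search(...)` so the sketch is self-contained; the assumption here is that each response exposes a `results` array and a `next` that resolves to the following page (or is absent on the last page):

```javascript
// Fake three "pages" of results; the last page has no next().
// In real use, scholar.search(...) would produce the first response.
function fakeSearch() {
  const page3 = { results: ['paper5'], next: null };
  const page2 = { results: ['paper3', 'paper4'], next: () => Promise.resolve(page3) };
  const page1 = { results: ['paper1', 'paper2'], next: () => Promise.resolve(page2) };
  return Promise.resolve(page1);
}

// Build a handler that accumulates each page's results, then recurses:
// if the response has a next, call it with this same handler as the callback.
function collectAll(resultsSoFar) {
  return function handle(res) {
    resultsSoFar.push(...res.results);
    if (res.next) {
      return res.next().then(handle); // same handler, next page
    }
    return resultsSoFar; // last page: resolve with everything collected
  };
}

fakeSearch()
  .then(collectAll([]))
  .then(all => console.log(all)); // every paper across every page
```

Because each step returns the promise from `next()`, the final `.then` only fires once every page has been folded into the accumulator, so no global variable is needed.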

IMPORTANT: I have not yet had the time to implement rate-limiting. If you ask Google Scholar for too many things too quickly, you will be temporarily blocked.

chaosmaker commented 7 years ago

Thanks for getting back so quickly. I was hoping it could be solved a little differently as all the rest of the code is fairly asynchronous. This way I will need a global variable before I have all of the results. Hmm... maybe I'll just process each block separately first. The code did help a lot! Any timeframe for including the limiter?

hcientist commented 7 years ago

I am very interested in contributions to this open source project (-; This is not on my radar right now, and I don't expect to have time for it before at least September /-: It really shouldn't be much harder than testing an existing rate limiter and adding it to the mix. (I have commented in the issue to remind myself which libraries to try next.)
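Until a proper rate limiter lands, one minimal stop-gap is to wait a fixed delay before following each `next` link. This is only a sketch under assumptions: the page shape (`results` plus an optional `next`) mirrors the handler pattern above, and the delay value is a guess to tune against whatever Google Scholar tolerates:

```javascript
// Resolve after ms milliseconds; used to space out page requests.
function delay(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Collect all pages sequentially, pausing between requests.
// `ms` is an assumed delay; there is no documented safe rate.
function collectSlowly(res, acc = [], ms = 2000) {
  acc.push(...res.results);
  if (!res.next) return Promise.resolve(acc);
  return delay(ms)
    .then(() => res.next())
    .then(next => collectSlowly(next, acc, ms));
}
```

Because the pages are fetched one at a time, this trades speed for a lower chance of being blocked; a real limiter library could replace `delay` without changing the shape of the loop.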

I suppose if we can assume 10 results per page, and the first response tells you there are N results total (e.g. N=111 for the query "chairmouse"), then you could follow up with N/10 - 1 parallel requests (the nextUrl is a predictably structured URL). I will mark this as a feature request. For my original purposes we didn't need all the results at once, but it looks like it'd be useful to you and others.
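The parallel-fetch idea above can be sketched as follows. `fetchPage` is a hypothetical stand-in for requesting the predictably structured nextUrl at a given start offset; the 10-results-per-page figure and the `count`/`results` response fields are assumptions from the comment above, not the library's documented API:

```javascript
const RESULTS_PER_PAGE = 10; // assumed Google Scholar page size

// Pages remaining after the first one,
// e.g. 111 results -> 12 pages -> 11 follow-up requests.
function followUpCount(totalResults) {
  return Math.ceil(totalResults / RESULTS_PER_PAGE) - 1;
}

// Fire all follow-up requests at once and flatten the pages in order.
// firstPage is assumed to carry the total count and its own results;
// fetchPage(offset) is a hypothetical request for the page at that offset.
function fetchAllPages(firstPage, fetchPage) {
  const followUps = [];
  for (let i = 1; i <= followUpCount(firstPage.count); i++) {
    followUps.push(fetchPage(i * RESULTS_PER_PAGE));
  }
  return Promise.all(followUps).then(pages =>
    [firstPage.results, ...pages.map(p => p.results)].flat()
  );
}
```

Note the tension with the earlier warning: firing all follow-ups in parallel is exactly the "too many things too quickly" pattern that gets you blocked, so this would want rate-limiting layered on top.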