SarthakJShetty / pyResearchInsights

End-to-end NLP tool to analyze research publications. Published in Ecology & Evolution 2021.
MIT License

Abstract continuation #4

Closed by RichardScottOZ 2 years ago

RichardScottOZ commented 2 years ago

I didn't think I saw this - let's say you had 2K abstracts to get and your internet connection dies after 1200.

How much work would it be to change it to 'pick up where it left off'? Or to add that as an option?

Rather than start doing it again.

SarthakJShetty commented 2 years ago

Right, we're working on this at the moment. Should be a fairly simple modification to the code. IIRC we had this functionality earlier, but decided to get rid of it since new papers kept messing up the index that we were using to keep track of the number of papers.

If someone has a fix for this, I will gladly review and accept a PR. If not, I'll work on implementing this fix soon, since a lot of users have asked for this.

RichardScottOZ commented 2 years ago

Yeah, I guess that is tricky, but for any one 'run' can it have a 'static' index list as it was at that minute, and the 'continue' just works on the person's list at the time?
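
For illustration, the 'static index' idea might look roughly like the sketch below. This is not code from pyResearchInsights: `resume_scrape`, `load_or_create_snapshot`, the two file paths, and the `fetch_url_list` / `fetch_abstract` callables are all hypothetical names standing in for whatever the scraper actually uses. The assumption is that the list of abstract URLs is frozen to disk on the first run, and every finished URL is appended to a log, so a restarted run only fetches what is missing from that frozen list.

```python
import json
from pathlib import Path


def load_or_create_snapshot(snapshot_path, fetch_url_list):
    """Return the frozen URL list for this run, creating it on first use."""
    path = Path(snapshot_path)
    if path.exists():
        # A snapshot exists, so ignore whatever the live listing says now.
        return json.loads(path.read_text())
    urls = list(fetch_url_list())
    path.write_text(json.dumps(urls))
    return urls


def resume_scrape(snapshot_path, done_path, fetch_url_list, fetch_abstract):
    """Fetch every URL in the snapshot that is not yet in the done log."""
    urls = load_or_create_snapshot(snapshot_path, fetch_url_list)
    done_file = Path(done_path)
    done = set(done_file.read_text().splitlines()) if done_file.exists() else set()
    results = {}
    with done_file.open("a") as log:
        for url in urls:
            if url in done:
                continue
            # If this raises (e.g. the connection drops), earlier URLs stay logged.
            results[url] = fetch_abstract(url)
            log.write(url + "\n")
            log.flush()
    return results
```

Because completed work is tracked by URL rather than by position, papers added to the live listing after the snapshot was taken cannot shift anything; they are simply not part of this run.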

SarthakJShetty commented 2 years ago

Hmm, that does sound interesting. I was thinking of maybe a pre-processing step that checks the index of the last retrieved abstract and skips that many papers/abstracts.
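
A minimal sketch of that pre-processing step, under some assumptions: abstracts are appended one per line to a single output file, and `count_retrieved`, `scrape_remaining`, and `fetch_abstract` are hypothetical names, not functions from pyResearchInsights. The number of lines already on disk is taken as the count of retrieved abstracts, and that many entries are skipped before fetching resumes.

```python
from itertools import islice
from pathlib import Path


def count_retrieved(output_path):
    """Count abstracts already saved, assuming one abstract per line."""
    path = Path(output_path)
    if not path.exists():
        return 0
    with path.open() as handle:
        return sum(1 for _ in handle)


def scrape_remaining(urls, output_path, fetch_abstract):
    """Skip as many URLs as there are saved abstracts, then fetch the rest."""
    already = count_retrieved(output_path)
    fetched = 0
    with open(output_path, "a") as out:
        for url in islice(urls, already, None):
            abstract = fetch_abstract(url)
            # Flatten newlines so each abstract stays on exactly one line.
            out.write(abstract.replace("\n", " ") + "\n")
            out.flush()
            fetched += 1
    return already, fetched
```

Note that skipping by count only lines up if `urls` comes back in the same order on every run. If new papers are inserted ahead of the resume point, the skipped range shifts, which is the index problem mentioned earlier in this thread.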