emillykkejensen closed this 5 years ago.
Thanks. I've incorporated it into the package. If you can test it, I'll upload the changes to CRAN.
Just tried it out - and it works fine :)
Ok. Great. I'll upload to CRAN tomorrow.
I've updated the code: with fewer than 200 sentences combn is used, and with more than 200 sentences data.table is used, as that seems to be the tipping point where one approach becomes faster than the other. Pushed to CRAN now.
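A minimal sketch of the size-based switch described above, not the package's actual source. The function names `candidate_pairs()`, `pairs_combn()` and `pairs_vectorised()` are hypothetical, the output columns `textrank_id_1`/`textrank_id_2` are an assumption, and the large-input branch swaps in base-R index arithmetic in place of data.table so the sketch has no dependencies. It also assumes that exactly 200 ids counts as "large".

```r
# Hypothetical sketch: build all unordered id pairs, switching strategy at a
# size threshold (200, per the thread). Column names are assumed.

# Small inputs: utils::combn() returns a 2 x choose(n, 2) matrix of pairs.
pairs_combn <- function(x) {
  m <- utils::combn(x, 2)
  data.frame(textrank_id_1 = m[1, ], textrank_id_2 = m[2, ],
             stringsAsFactors = FALSE)
}

# Large inputs: vectorised index arithmetic (a base-R stand-in for the
# data.table approach). Pairs come out in the same order as combn().
pairs_vectorised <- function(x) {
  n <- length(x)
  # i = 1 (n - 1 times), 2 (n - 2 times), ..., n - 1 (once)
  i <- rep.int(seq_len(n - 1), times = (n - 1):1)
  # for each i, the partner j runs over i + 1, ..., n
  j <- i + sequence((n - 1):1)
  data.frame(textrank_id_1 = x[i], textrank_id_2 = x[j],
             stringsAsFactors = FALSE)
}

candidate_pairs <- function(x, threshold = 200) {
  x <- unique(x)
  if (length(x) < 2) stop("need at least two distinct ids")
  if (length(x) < threshold) pairs_combn(x) else pairs_vectorised(x)
}

ids <- sprintf("s%03d", 1:50)
head(candidate_pairs(ids))
```

Because both branches emit the pairs in the same order, their outputs compare equal with `identical()`, so the result does not depend on which side of the threshold an input falls.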
Thanks for a great package
When running textrank_sentences() on very large datasets, textrank_candidates_all() (in particular the utils::combn() function it calls) can't really cope and throws an error. Therefore I have built a simpler alternative, textrank_candidates_all2(), which I believe does the same job but is faster and more memory efficient.
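The root of the problem is that the number of candidate pairs grows quadratically: n sentences give choose(n, 2) = n(n - 1)/2 unordered pairs, and any approach that enumerates every pair has to produce that many rows. The sentence counts below are illustrative only, not taken from the thread.

```r
# Illustrative sentence counts (not from the thread).
n <- c(200, 1000, 10000, 100000)
# Each unordered pair of sentences is one candidate: n * (n - 1) / 2 of them.
counts <- data.frame(sentences = n, pairs = choose(n, 2))
print(counts, digits = 12)
```

Going from 1,000 to 10,000 sentences multiplies the number of pairs by roughly 100, so memory use, not just run time, becomes the limiting factor.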
To compare with the old, try running:
Here I get an error using textrank_candidates_all(), but not using textrank_candidates_all2().
If you lower the number of IDs and run it again, you will see a big performance difference between the two functions:
which gives me:
For textrank_candidates_all()
…and textrank_candidates_all2()
Finally, the two functions seem to output the same values:
identical(textrank_candidates, textrank_candidates2) is TRUE
So you could consider adopting this function, if you like.