PipzCorreiaz closed this 2 years ago
Just to clarify, the requested changes apply to the provider, not the collector.
The list of papers returned by `get_paper_by_terms` does not contain duplicates.
Also, the assumption that the response from `/paper/search?query=...` in the Semantic Scholar API does not contain repeated papers is valid.
I tested the API and never came across a case where it returned duplicates of the same paper. @PipzCorreiaz @RicardoEPRodrigues are you sure this actually happens?
Indeed, the Semantic Scholar API does return duplicate results on `/paper/search?query=`.
During the process I also realized that there is another constraint on the search process that needs to be taken into account (reported in #16). It should be handled in a different PR.
We were assuming that the results from `collect_by_terms` were unique, which is not true if the collector finds repeated papers. Instead of counting the length of the response data, we now count the unique papers actually added to the resulting list. This guarantees that the number of results matches the `max_papers` parameter whenever enough unique papers are available.
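For anyone reading this later, here is a minimal sketch of the idea, not the code in this PR. It assumes a hypothetical `fetch_page(offset, limit)` callable standing in for the paginated search request, and assumes each paper is a dict carrying a `paperId` key. The loop stops on the count of unique papers collected rather than on the raw response length.

```python
from typing import Callable, Dict, List


def collect_unique_papers(
    fetch_page: Callable[[int, int], List[Dict]],  # hypothetical page fetcher
    max_papers: int,
    page_size: int = 10,
) -> List[Dict]:
    """Gather up to max_papers unique papers from a paginated source."""
    unique: Dict[str, Dict] = {}  # keyed by paperId, keeps insertion order
    offset = 0
    while len(unique) < max_papers:
        page = fetch_page(offset, page_size)
        if not page:
            break  # source exhausted before reaching max_papers
        for paper in page:
            # setdefault keeps the first occurrence and ignores repeats
            unique.setdefault(paper["paperId"], paper)
            if len(unique) == max_papers:
                break
        # advance by the raw page length, duplicates included
        offset += len(page)
    return list(unique.values())


def fake_fetch(offset: int, limit: int) -> List[Dict]:
    """Stand-in for the API call; the data deliberately contains repeats."""
    ids = ["a", "b", "a", "c", "b", "d", "e"]
    return [{"paperId": pid} for pid in ids[offset:offset + limit]]


papers = collect_unique_papers(fake_fetch, max_papers=4, page_size=3)
print([p["paperId"] for p in papers])  # ['a', 'b', 'c', 'd']
```

Counting on the raw response would have stopped after six items with only four distinct papers by luck, or fewer with a different ordering. Counting on `unique` keeps requesting pages until the target is met or the source runs dry.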