allenai / s2-folks

Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.

Bug: Paper Search Bulk matches all words regardless of logic #162

Closed rwst closed 1 year ago

rwst commented 1 year ago

Describe the Bug
Searching for papers using the expression "(software | application | systems ) + (fault | defect | quality | error-prone) + (predict | prediction | prone | probability | assess | assession | detect | detection | estimate | estimation | classification | classify)" matches the paper with title "pERK-dependent defective TCR-mediated activation of CD4+ T cells in end-stage renal disease patients" and many others not fitting the pattern.

To Reproduce
Start a bulk paper search with https://api.semanticscholar.org/graph/v1/paper/search/bulk?query=%28software%20%7C%20application%20%7C%20systems%20%29%20%2B%20%28fault%20%7C%20defect%20%7C%20quality%20%7C%20error-prone%29%20%2B%20%28predict%20%7C%20prediction%20%7C%20prone%20%7C%20probability%20%7C%20assess%20%7C%20assession%20%7C%20detect%20%7C%20detection%20%7C%20estimate%20%7C%20estimation%20%7C%20classification%20%7C%20classify%29&fields=paperId%2CexternalIds%2Ctitle%2Cabstract%2CpublicationTypes%2CpublicationDate
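For anyone who wants to rebuild the request from the readable expression, here is a minimal Python sketch. The endpoint, query, and field list are taken from the URL above; the helper name `build_bulk_search_url` is made up for illustration, and the script only constructs the URL without sending a request.

```python
from urllib.parse import quote, urlencode

# Endpoint and parameters copied from the reproduction URL in this issue.
BASE_URL = "https://api.semanticscholar.org/graph/v1/paper/search/bulk"

QUERY = (
    "(software | application | systems ) + "
    "(fault | defect | quality | error-prone) + "
    "(predict | prediction | prone | probability | assess | assession | "
    "detect | detection | estimate | estimation | classification | classify)"
)
FIELDS = "paperId,externalIds,title,abstract,publicationTypes,publicationDate"


def build_bulk_search_url(query: str, fields: str) -> str:
    """Hypothetical helper: percent-encode the parameters and append them."""
    params = {"query": query, "fields": fields}
    # quote_via=quote encodes spaces as %20 (not "+"), so the literal "+"
    # operator in the query is unambiguous once it becomes %2B.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    print(build_bulk_search_url(QUERY, FIELDS))
```

Encoding spaces as `%20` rather than `+` matters here, because `+` is also the AND operator in the query expression.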

Expected Behavior
The logical expression should match only if keywords from all three groups (in parentheses) are present.
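The expected semantics can be sketched as a local check in Python. This is only an illustration of "at least one keyword from every group", using exact whole-word matching on a single string; it is not how the API evaluates queries, and the names `GROUPS`, `tokenize`, and `matches_all_groups` are made up for this example.

```python
import re

# The three OR-groups from the query expression in this issue.
GROUPS = [
    {"software", "application", "systems"},
    {"fault", "defect", "quality", "error-prone"},
    {"predict", "prediction", "prone", "probability", "assess", "assession",
     "detect", "detection", "estimate", "estimation", "classification",
     "classify"},
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into words, keeping hyphenated words whole."""
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))


def matches_all_groups(text: str, groups=GROUPS) -> bool:
    """Return True only if every group contributes at least one exact word."""
    tokens = tokenize(text)
    return all(tokens & group for group in groups)


if __name__ == "__main__":
    reported = ("pERK-dependent defective TCR-mediated activation of CD4+ "
                "T cells in end-stage renal disease patients")
    print(matches_all_groups(reported))
    print(matches_all_groups("Software defect prediction with deep learning"))
```

Under this exact-match reading, the reported title does not satisfy any of the three groups, which is why the result looked wrong. The check covers only the text passed in, so it says nothing about matches in an abstract.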

Actual Behavior
The API returns papers where any of the keywords is present, disregarding the grouping and the +/AND logic.

Environment Details
Platform (e.g., Windows, MacOS, Linux): Linux
Browser (if relevant):
Any other relevant software or libraries you're using: okhttp client v3 5.0.0-alpha.11

cfiorelli commented 1 year ago

@rwst We have found that the example you shared is being returned because of stemming: for example, "predictive" or "predictor" will match the keyword "predict". A second issue, which I don't think is contributing here, is that a search can match a paper on text in its abstract, but if the abstract belongs to Springer we won't be able to show it.

In closing, there is an opportunity to 1) add a callout about Springer abstract matching and 2) revisit whether we can expand or reword the description of stemming in our documentation.
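The stemming effect described in this comment can be illustrated with a toy Python sketch. The suffix list below is an assumption chosen purely for illustration; the stemmer the API actually uses is not stated in this thread and will behave differently in detail. The function name `naive_stem` is made up.

```python
# Toy suffix stripper; NOT the API's stemmer, only an illustration of the idea.
SUFFIXES = ("ive", "ion", "or", "ing", "ed", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least four characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for word in ("predict", "predictive", "predictor", "defect", "defective"):
        print(word, "->", naive_stem(word))
```

With any stemmer of this kind, "predictive" and "predictor" reduce to the same stem as the keyword "predict", and "defective" in the reported title reduces to the same stem as "defect", so a match can occur even though the literal keyword never appears in the text.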