allenai / s2-folks

Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.

Bug: hyphenated phrases cause weird search behaviors #203

Status: Closed (aryehgigi closed 6 days ago)

aryehgigi commented 2 weeks ago

Describe the Bug

According to the docs of the paper/search public API, "Hyphenated query terms yield no matches (replace it with space to find matches)" (see here). But I don't see any difference in the matching results when I do replace the hyphen with a space.
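The documented workaround amounts to a simple query rewrite. The following is a minimal sketch of that rewrite, assuming "hyphen" means a `-` joining two word characters; `hyphen_to_space` is a hypothetical helper, not part of any Semantic Scholar client.

```python
import re

# Matches a hyphen only when it sits between two word characters,
# e.g. the "-" in "Evidence-based", but not a free-standing " - ".
_INNER_HYPHEN = re.compile(r"(?<=\w)-(?=\w)")


def hyphen_to_space(query: str) -> str:
    """Hypothetical helper: replace intra-word hyphens with spaces."""
    return _INNER_HYPHEN.sub(" ", query)


print(hyphen_to_space("Evidence-based Syntactic Transformations for IE"))
# Evidence based Syntactic Transformations for IE
```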

To Reproduce

>>> import requests
>>> print(requests.get("https://api.semanticscholar.org/graph/v1/paper/search/", params = {"query":'Evidence-based Syntactic Transformations for IE', "fields": "title,corpusId","offset": "0", "fieldsOfStudy": "Computer Science"}, headers={...}).json()["total"])
92
>>> print(requests.get("https://api.semanticscholar.org/graph/v1/paper/search/", params = {"query":'Evidence based Syntactic Transformations for IE', "fields": "title,corpusId","offset": "0", "fieldsOfStudy": "Computer Science"}, headers={...}).json()["total"])
92
>>> print(requests.get("https://api.semanticscholar.org/graph/v1/paper/search/", params = {"query":'Evidence Syntactic Transformations for IE', "fields": "title,corpusId","offset": "0", "fieldsOfStudy": "Computer Science"}, headers={...}).json()["total"])
1
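The same comparison can be packaged as a small script so the three queries are sent with identical parameters. This is a sketch using only the standard library; it assumes the API key goes in an `x-api-key` header (the original elides the headers as `{...}`), and `build_url`, `fetch_total` and `compare_totals` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterable, Optional

SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"

# The three query variants from the reproduction above.
QUERIES = [
    "Evidence-based Syntactic Transformations for IE",
    "Evidence based Syntactic Transformations for IE",
    "Evidence Syntactic Transformations for IE",
]


def build_url(query: str) -> str:
    """Build the search URL with the same parameters as the reproduction."""
    params = {
        "query": query,
        "fields": "title,corpusId",
        "offset": "0",
        "fieldsOfStudy": "Computer Science",
    }
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_total(query: str, api_key: Optional[str] = None) -> int:
    """Return the "total" field of the search response for one query."""
    # Assumption: the key is passed as an x-api-key header; omit it to
    # send an unauthenticated request.
    headers = {"x-api-key": api_key} if api_key else {}
    request = urllib.request.Request(build_url(query), headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["total"]


def compare_totals(queries: Iterable[str], api_key: Optional[str] = None) -> Dict[str, int]:
    """Map each query to its reported total so the variants can be compared."""
    return {query: fetch_total(query, api_key) for query in queries}
```

Calling `compare_totals(QUERIES, os.environ.get("S2_API_KEY"))` (the environment variable name is hypothetical) shows at a glance whether the hyphenated and space-separated variants still report the same total; the numbers depend on the live index and may differ from those above.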

Expected Behavior

When I replace the hyphen with a space, I expect more accurate, and therefore fewer, results.

Actual Behavior

When I replace the hyphen with a space, I get the same inflated number of results (92) as with the hyphen.

Environment Details

Platform: Linux

cfiorelli commented 6 days ago

In this case, query operators are not supported on this endpoint. The documentation will be updated by the end of today.

Thank you!

aryehgigi commented 21 hours ago

@cfiorelli Sorry, I missed that this issue was closed.

If I understand correctly, you only updated the docs but didn't change the behavior, so I'm not sure you understood my issue: this is still a bug (unless you intend to mark it as a known issue?). When I am looking for a paper that has a hyphen in its title (e.g. Evidence-based Syntactic Transformations for IE), I expect either the query "Evidence-based Syntactic Transformations for IE" or the query "Evidence based Syntactic Transformations for IE" to find it. Instead, both return 92 results! What do you think? Thanks!
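One way to make "the query finds the paper" concrete is to check where the exact title lands in the returned page. Below is a minimal sketch of such a check, assuming the response's `data` list holds paper objects with a `title` field (as requested via `fields=title,corpusId`); `normalize_title` and `find_title` are hypothetical helpers, and they only inspect the page that was actually returned.

```python
from typing import Iterable, Optional


def normalize_title(title: str) -> str:
    """Lowercase, treat hyphens as spaces, and collapse runs of whitespace."""
    return " ".join(title.lower().replace("-", " ").split())


def find_title(results: Iterable[dict], target: str) -> Optional[int]:
    """Return the 0-based rank of the first result whose normalized title
    equals the normalized target, or None if it is not on this page."""
    wanted = normalize_title(target)
    for rank, paper in enumerate(results):
        # Some records may lack a title; treat them as empty strings.
        if normalize_title(paper.get("title") or "") == wanted:
            return rank
    return None
```

For example, `find_title(response.json()["data"], "Evidence-based Syntactic Transformations for IE")` reports whether, and at which rank, the target paper appears for a given query variant. That signal is more direct than comparing the `total` counts alone.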