allenai / s2-folks

Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.

Bug: hyphenated phrases cause weird search behaviors #203

Open aryehgigi opened 5 months ago

aryehgigi commented 5 months ago

Describe the Bug According to the docs for the public paper/search API: "Hyphenated query terms yield no matches (replace it with space to find matches)" (see here). But I don't see any difference in the matching results when I replace the hyphen with a space.

To Reproduce

>>> import requests
>>> print(requests.get("https://api.semanticscholar.org/graph/v1/paper/search/", params = {"query":'Evidence-based Syntactic Transformations for IE', "fields": "title,corpusId","offset": "0", "fieldsOfStudy": "Computer Science"}, headers={...}).json()["total"])
92
>>> print(requests.get("https://api.semanticscholar.org/graph/v1/paper/search/", params = {"query":'Evidence based Syntactic Transformations for IE', "fields": "title,corpusId","offset": "0", "fieldsOfStudy": "Computer Science"}, headers={...}).json()["total"])
92
>>> print(requests.get("https://api.semanticscholar.org/graph/v1/paper/search/", params = {"query":'Evidence Syntactic Transformations for IE', "fields": "title,corpusId","offset": "0", "fieldsOfStudy": "Computer Science"}, headers={...}).json()["total"])
1

Expected Behavior When I replace the hyphen with a space, I expect to get more accurate and thus fewer results.

Actual Behavior When I replace the hyphen with a space, I get the same excessive number of results as when the hyphen is there.

Environment Details Platform: Linux
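The three requests above can be wrapped in a small helper so the comparison is easier to rerun. This is only a minimal sketch: `hyphen_variants` and `compare_totals` are hypothetical names, not part of any Semantic Scholar client, and `fetch_total` is an assumed callable standing in for the `requests.get(...).json()["total"]` call shown in the report.

```python
import re
from typing import Callable, Dict


def hyphen_variants(query: str) -> Dict[str, str]:
    """Return the three query forms compared in the report (hypothetical helper)."""
    return {
        "hyphenated": query,
        # Replace each intra-word hyphen with a space: "Evidence-based" -> "Evidence based"
        "spaced": re.sub(r"(?<=\w)-(?=\w)", " ", query),
        # Drop the hyphen and the word part after it: "Evidence-based" -> "Evidence"
        "dropped": re.sub(r"(?<=\w)-\w+", "", query),
    }


def compare_totals(query: str, fetch_total: Callable[[str], int]) -> Dict[str, int]:
    """Call fetch_total once per variant and collect the reported totals."""
    return {name: fetch_total(q) for name, q in hyphen_variants(query).items()}
```

In a live run, `fetch_total` would wrap the request from the report and return its `"total"` field. If the hyphenated and spaced variants report the same total, that matches the behavior described here.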

cfiorelli commented 5 months ago

In this case the feature for query operators on this endpoint does not exist. The documentation is going to be updated by end of today.

Thank you!

aryehgigi commented 4 months ago

@cfiorelli sorry, I missed that this issue was closed.

iiuc you only updated the docs but didn't change the behavior, so I'm not sure you understood my issue: this is still a bug (unless you intend to mark it as a known issue?). When I am looking for a paper that has a hyphen in its title (e.g. Evidence-based Syntactic Transformations for IE), I expect either the query Evidence-based Syntactic Transformations for IE or the query Evidence based Syntactic Transformations for IE to find it. Instead, both return 92 results! wdyt? Thanks!

cfiorelli commented 4 months ago

Investigating over DM. Broken out into 2 distinct issues:

  1. Documentation update for "hyphenated query terms": The docs indicated a functionality which does not exist for this endpoint: using a hyphen to exclude a keyword. After reviewing your report we found that the behavior is working as intended but the documentation was misleading or inaccurate. As of today it seems the docs have reverted and are again showing the misleading instruction about using hyphenated query terms. I'll take a look at what's going on here later today.

  2. Searching for a paper title fails to return the paper, but returns 92 other papers: Holding for follow-up with @aryehgigi to make sure I've got it clear before moving on this one.
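Issue 2 can be checked client-side by testing whether any returned paper actually carries the searched title. The sketch below is an illustration, not the API's own matching logic: `normalize_title` and `find_by_title` are hypothetical helpers, and it assumes the results are available as a list of dicts with a `title` key (the field requested through `fields` in the report).

```python
import re
from typing import Iterable, Mapping, Optional


def normalize_title(title: str) -> str:
    """Lowercase, turn hyphens into spaces, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", title.replace("-", " ")).strip().lower()


def find_by_title(papers: Iterable[Mapping], title: str) -> Optional[Mapping]:
    """Return the first paper whose normalized title equals the target, else None."""
    target = normalize_title(title)
    for paper in papers:
        # Treat a missing or null title as an empty string.
        if normalize_title(paper.get("title") or "") == target:
            return paper
    return None
```

Because hyphens are normalized on both sides, the same check works whether the query was sent with the hyphen or with a space. A `None` result for both variants would indicate that the paper is not among the returned results.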

aryehgigi commented 4 months ago

Yes, point 2 is the main one I was actually aiming at.

Imagine a user is looking for a paper titled "AI2: the Seattle-based company..". As a user, they might try searching for seattle-based, which for some unknown reason would not lead to finding the paper. See a real example in my initial comment on this issue.

Thanks a lot for reopening this!