anvc / scalar

Born-digital, open source, media-rich scholarly publishing that’s as easy as blogging.

Content search queries with apostrophe/single quote return no results #232

Open alexdryden opened 7 months ago

alexdryden commented 7 months ago

Issue:

Testing a query for "Thirty Years' War" in an upcoming publication, we discovered that queries with ' (single quote/apostrophe) return no results when the term appears in content.

To reproduce:

Create a page with a string that contains a word with an apostrophe and try to search for that word, or search for "Brooks Institute’s Commercial Photography" from this USC publication https://scalar.usc.edu/works/wayne-thom. It should return this page, https://scalar.usc.edu/works/wayne-thom/introduction?path=index, but instead returns no results.

Here is a test page and test media where I've put an apostrophe in the page title and metadata (Metadata: dcterms:source: "testing terms with apostrophe Whistler's Mother"). Searching for "Scalar's Page Layouts" or "Whistler's Mother" returns the targeted results with either the "title only" or the "title and content" selection, and searching for "Whistler's Mother" returns the correct result when dcterms:source is selected. So the issue seems to be limited to content-based searches.
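The search terms quoted above do not all use the same character: "Whistler's Mother" has a straight apostrophe (U+0027), while "Brooks Institute’s" has a typographic one (U+2019). When reproducing, it may be worth trying both forms of each term to rule that difference out. A minimal sketch in Python, where `apostrophe_variants` is a hypothetical helper and not part of Scalar:

```python
STRAIGHT = "'"       # U+0027 APOSTROPHE
CURLY = "\u2019"     # U+2019 RIGHT SINGLE QUOTATION MARK


def apostrophe_variants(term):
    """Return the term with straight apostrophes and, if it has any, with curly ones."""
    base = term.replace(CURLY, STRAIGHT)
    variants = [base]
    if STRAIGHT in base:
        variants.append(base.replace(STRAIGHT, CURLY))
    return variants


if __name__ == "__main__":
    terms = ["Whistler's Mother", "Brooks Institute\u2019s Commercial Photography"]
    for term in terms:
        for variant in apostrophe_variants(term):
            # Show which apostrophe code points each variant actually contains
            codes = [hex(ord(c)) for c in variant if c in (STRAIGHT, CURLY)]
            print(repr(variant), codes)
```

Pasting each printed variant into the search box shows whether the failure depends on which apostrophe character is used or affects both.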

If you can point me in the right direction, I have some free cycles this week that I can use to troubleshoot.

craigdietrich commented 7 months ago

Thanks for pointing this out! Clearly we're not escaping the string correctly before going to the search method. I'll take a look.

craigdietrich commented 7 months ago

I think this is a versioning problem, actually. The USC /works install is behind the most recent updates to Scalar on GitHub, including a reworking of the search box. Searching now uses a different process which, after testing on my end, doesn't seem impacted by the quote problem. (Although it does have problems with case insensitivity that I'm trying to work through currently.)

Let me see how long it will be before we can update the /works install.

alexdryden commented 7 months ago

Ah, interesting! We are on version 2.6.7, but I think that doesn't include some of the new features you pushed recently for the search box. I'll also double-check against the most recent commits.

eloyer commented 6 months ago

The /works install has now been updated to 2.6.8.

alexdryden commented 6 months ago

Thanks for letting me know! I'm still getting the same results, e.g. searching for "Brooks Institute’s Commercial Photography" produces no results on https://scalar.usc.edu/works/wayne-thom/introduction, but "Brooks Institute" gets the expected pages. I'm wondering if the discrepancy might have something to do with a database version or implementation?

craigdietrich commented 6 months ago

Hi Alex,

Hmm. No, I don't think it's the database version or anything like that. The new search uses a SPARQL query (instead of a regular MySQL query), so it's kind of particular. I'll see if I can duplicate the error on my end and, if so, work on that SPARQL query.
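For context on why a SPARQL query can be particular about quotes: a search term embedded in a SPARQL string literal needs its backslashes and quote characters backslash-escaped, or the literal ends early. The following is a minimal sketch in Python of that escaping, not Scalar's actual code (Scalar is written in PHP); the function names are hypothetical.

```python
# Backslash escapes allowed inside a SPARQL string literal (a subset of the
# ECHAR rule in the SPARQL grammar). The backslash entry is looked up per
# character, so it cannot double-escape the output of other entries.
_SPARQL_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "'": "\\'",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def escape_sparql_string(term):
    """Escape a raw search term for use inside a quoted SPARQL literal."""
    return "".join(_SPARQL_ESCAPES.get(ch, ch) for ch in term)


def sparql_string_literal(term):
    """Wrap the escaped term in double quotes to form a complete literal."""
    return '"' + escape_sparql_string(term) + '"'


if __name__ == "__main__":
    print(sparql_string_literal("Thirty Years' War"))
    print(sparql_string_literal('He said "hello"'))
```

Escaping the single quote is not strictly required inside a double-quoted literal, but it is permitted and keeps the output safe if the surrounding query uses single-quoted literals instead.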