dasch-swiss / dsp-api

DaSCH Service Platform API
http://admin.dasch.swiss
Apache License 2.0

Fulltext search with wildcard (*) does not return all results (Salsah 1.3) #902

Open musicEnfanthen opened 6 years ago

musicEnfanthen commented 6 years ago

Apparently, fulltext search in Salsah (v1.3) does not consider all property entries when it is used with a wildcard (*).

It may be connected to #518, but opening the property field and saving it again, as in #518, has no effect at all.

Do you have any idea where this issue is coming from? Fulltext search should of course be reliable; it is fundamental to our daily work.

I have added some screenshots to illustrate the bug:

Fulltext search for "Hertz*" (4 results with "Hertzka") [screenshot]

Fulltext search for "Hertzka" (170 results) [screenshot]

Fulltext search for "Skizze" (12 results) [screenshot]

Fulltext search for "Skizze*" (12 results, partly different ones!) [screenshot]

Fulltext search for "Skizzen" (280 results) [screenshot]

"Skizze*" should return at least 292 results (those of "Skizze" plus those of "Skizzen").
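The expectation above can be turned into a small consistency check: every resource found by an exact-term search should also appear in the matching wildcard search. The following is only a minimal sketch. The function name and the resource IDs are hypothetical and are not taken from Salsah or Knora. Note that the lower bound is really the size of the union of the exact-term results, which equals 292 only if no resource matches both "Skizze" and "Skizzen".

```python
def missing_from_wildcard(wildcard_hits, *exact_hits):
    """Return IDs found by exact-term searches but absent from the wildcard search."""
    # Everything any exact-term search found should also match the wildcard.
    expected = set().union(*exact_hits)
    return expected - set(wildcard_hits)


# Hypothetical resource IDs, for illustration only.
skizze = {"r1", "r2"}
skizzen = {"r3", "r4", "r5"}
skizze_wildcard = {"r1", "r3", "r9"}

missing = missing_from_wildcard(skizze_wildcard, skizze, skizzen)
print(sorted(missing))  # ['r2', 'r4', 'r5']
```

Listing the missing IDs, rather than only comparing counts, would show whether the lost hits share a property type or a creation date.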

Could you give us a hint about where to look for the cause of this bug?

tobiasschweizer commented 6 years ago

Could you look at the SPARQL query that is actually produced?

https://github.com/dhlab-basel/Knora/blob/develop/webapi/src/main/twirl/queries/sparql/v1/searchFulltextGraphDB.scala.txt

Uncomment this line to get the SPARQL: https://github.com/dhlab-basel/Knora/blob/1c385dc5ed0839f5f646abe7e64ab627e2c539d8/webapi/src/main/scala/org/knora/webapi/responders/v1/SearchResponderV1.scala#L153

It may be related to the Lucene syntax.

tobiasschweizer commented 6 years ago

For v2 I tried to put the Lucene handling in a central place: https://github.com/dhlab-basel/Knora/blob/develop/webapi/src/main/scala/org/knora/webapi/util/search/ApacheLuceneSupport.scala

You will also find some links to the Lucene docs there.
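To illustrate what central Lucene handling might involve, here is a minimal sketch that escapes the characters reserved by the Lucene classic query parser while keeping a single trailing `*` as a prefix wildcard. It is an assumption-laden illustration, not the code in ApacheLuceneSupport.scala. The function names are hypothetical, and multi-word input is not handled.

```python
import re

# Characters that have special meaning in the Lucene classic query parser syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_lucene(text):
    """Prefix every reserved character with a backslash."""
    return _LUCENE_SPECIAL.sub(r'\\\1', text)


def to_prefix_query(user_input):
    """Escape user input, keeping one trailing '*' as a prefix wildcard (hypothetical helper)."""
    term = user_input.strip()
    if term.endswith("*") and len(term) > 1:
        # Escape the stem only, then re-append the unescaped wildcard.
        return escape_lucene(term[:-1]) + "*"
    return escape_lucene(term)


print(to_prefix_query("Hertz*"))  # Hertz*
print(to_prefix_query("a:b*"))    # a\:b*
```

When comparing "Skizze" with "Skizze*", it may also be worth checking whether the index analyzer treats wildcard terms differently from plain terms, since that depends on how the Lucene index is configured.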

musicEnfanthen commented 6 years ago

Thanks for your quick reply.

I am not sure that Knora is involved at all, because our issue is on the running live system (Salsah 1.3). I checked the query again, and maybe it is somehow connected to jquery.simpsearch.js which is calling SALSAH.ApiGet in 01_salsah-api.js?

Another idea: in #518, the problem was that really old richtext values were not found by fulltext search. In the very beginning, a project's richtext value held only an ID-based pointer to a richtext object; the actual text was not incorporated. With @lrosenth we were able to fix this for simple search queries without wildcards (we didn't check wildcards back then).

Is it possible that the fulltext search with wildcard does not get the right response from the richtext objects again?

tobiasschweizer commented 6 years ago

If this is about the live system (Salsah prototype), this is surely the wrong place for this issue :-)

musicEnfanthen commented 6 years ago

Yes, I know, but on GitLab it wouldn't have been noticed, I guess. So I didn't know where else to put it.

tobiasschweizer commented 6 years ago

There is an issue tracking system in our internal GitLab, but your issues would probably just be ignored. I actually noticed that there is already an issue: number 6

@lrosenth Where should we put issues relating to the Salsah prototype? And who would resolve them?

musicEnfanthen commented 6 years ago

Yes, being ignored was what I wanted to prevent when opening this issue here ;)

By the way, I can't see any issues at all in the Knora/salsah repository on GitLab; the side menu only shows Overview, Repository, Wiki and Members. (In other repos, e.g. Knora / Sipi, the Repository tab is followed by Issues and Merge Requests.)

musicEnfanthen commented 6 years ago

Thanks for the link. Now I see: the Knora/salsah project on GitLab is only a container for a lot of subprojects, including the old salsah-app. The issue you mention (which is, by the way, the old one resolved in #518) is in salsah-suite/salsah-app, an archived, read-only subproject (for 2 years now) inside Knora/salsah. I had to turn on Show archived projects to finally find it. (And I have closed it now.)

Do you want me to open a new issue there? As you can see, no one but me ever used this repo to file an issue, and that one was intended more as a reminder; even I forgot about it. So it doesn't seem very useful to me...

Be that as it may, maybe you have an idea where this wildcard search issue is coming from, or where I could look for it, @lrosenth?