eclipse-sw360 / sw360

SW360 project
https://www.eclipse.org/sw360/

The added `~` for fuzzy lucene search generates invalid queries #427

Open maxhbr opened 5 years ago

maxhbr commented 5 years ago

The line https://github.com/eclipse/sw360/blob/e52f0e1635580e24032d8f7963e3ffddfcb9bb4e/libraries/lib-datahandler/src/main/java/org/eclipse/sw360/datahandler/couchdb/lucene/LuceneAwareDatabaseConnector.java#L214 in prepareFuzzyQuery causes some queries to be invalid.

For example, if someone initially uploads the user list, the request

/_fti/local/sw360users/_design/lucene/users?include_docs=false&limit=150&q=%7E

is issued (%7E is a URL-encoded ~), and as a result the following error is logged:

[..]
sw360_1                |  2019-01-07 08:02:40,378 INFO  UserUtils:79 - Creating new user.
sw360_1                |  2019-01-07 08:02:40,657 INFO  UserUtils:79 - Creating new user.
sw360_1                |  2019-01-07 08:02:40,939 INFO  UserUtils:79 - Creating new user.
sw360_1                |  2019-01-07 08:02:41,198 INFO  UserUtils:79 - Creating new user.
sw360_1                |  _design/lucene users
sw360_1                | 2019-01-07 08:02:41 ERROR LuceneAwareDatabaseConnector:124 - Error querying database.
sw360_1                |  org.ektorp.DbAccessException: 400:Bad Request
sw360_1                | URI: /_fti/local/sw360users/_design/lucene/users?include_docs=false&limit=150&q=%7E
sw360_1                | Response Body: 
sw360_1                | {
sw360_1                |   "reason" : "Bad query syntax: Cannot parse '~': Encountered \"  \"~ \"\" at line 1, column 0.\nWas expecting one of:\n     ...\n    \"+\" ...\n    \"-\" ...\n     ...\n    \"(\" ...\n    \"*\" ...\n     ...\n     ...\n     ...\n     ...\n     ...\n    \"[\" ...\n    \"{\" ...\n     ...\n     ...\n    ",
sw360_1                |   "code" : 400
sw360_1                | }
sw360_1                |    at org.ektorp.http.StdResponseHandler.createDbAccessException(StdResponseHandler.java:50)
sw360_1                |    at org.ektorp.http.StdResponseHandler.error(StdResponseHandler.java:91)
sw360_1                |    at org.ektorp.http.RestTemplate.handleResponse(RestTemplate.java:126)
sw360couchdb-lucene_1  | 2019-01-07 08:02:56,289 INFO [sw360users] View[name=_design/lucene/all, digest=12rxdq8trjqjgz2zq26c20n81] now at update_seq 20
sw360couchdb-lucene_1  | 2019-01-07 08:02:56,339 INFO [sw360users] View[name=_design/lucene/users, digest=dtohk52041g77gl5vfn65grtg] now at update_seq 20
[...]

This might cause unexpected behaviour in the user upload.
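
To illustrate how the `q=%7E` parameter in the request above can arise, here is a minimal sketch. It assumes that the search text is empty and that the fuzzy operator is simply appended to it; `TildeQueryDemo`, `appendFuzzy` and `encode` are hypothetical names and are not taken from the SW360 code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class TildeQueryDemo {
    // Hypothetical stand-in for the query preparation:
    // treats null as empty text and appends the Lucene fuzzy operator.
    public static String appendFuzzy(String text) {
        return (text == null ? "" : text) + "~";
    }

    // URL-encodes the query as it would appear in the q= parameter.
    public static String encode(String query) {
        try {
            return URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String query = appendFuzzy(null); // no search text was given
        System.out.println("q=" + encode(query));
    }
}
```

With no search text, the whole query is the bare operator, which matches the `Cannot parse '~'` message in the response body.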


maxhbr commented 5 years ago

ping @maierthomas: you last touched this code. Can you explain what the ~ does and why prepareFuzzyQuery adds it to a query that seemingly tries to find all users?

maierthomas commented 5 years ago

A fuzzy query finds results that are similar to the search text; for example, fuzzy~ will also return wuzzy or luzzy. You can find the complete explanation of fuzzy searches here

Yes, I refactored searchByNameAndEmail and added the method prepareFuzzyQuery, but the behavior of the implementation is unchanged.

I see two problems:

  1. I also do not understand why fuzzy search is used for searchByNameAndEmail. Some mail addresses contain special characters like '.' and '-', and I think these addresses are split into separate terms.
  2. searchByNameAndEmail is executed without any search text, so the system creates a query that consists only of the fuzzy character (an empty string followed by ~). Of course this cannot be parsed, and an error occurs.
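
One possible way to guard against both problems is sketched below. This is not the existing SW360 implementation: the class `FuzzyQuerySketch`, the `Optional` return type and the hand-written `escape` helper are assumptions made for illustration. The idea is to skip the fuzzy query entirely when there is no search text, and to backslash-escape Lucene's special characters before appending ~ to each whitespace-separated term.

```java
import java.util.Optional;

class FuzzyQuerySketch {
    // Characters with special meaning in the Lucene classic query syntax.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    // Backslash-escapes Lucene special characters in a single term.
    public static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Returns a fuzzy query, or Optional.empty() when there is no search text,
    // so that a bare "~" is never sent to the index.
    public static Optional<String> prepareFuzzyQuery(String text) {
        if (text == null || text.trim().isEmpty()) {
            return Optional.empty();
        }
        StringBuilder query = new StringBuilder();
        for (String term : text.trim().split("\\s+")) {
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append(escape(term)).append('~');
        }
        return Optional.of(query.toString());
    }

    public static void main(String[] args) {
        System.out.println(prepareFuzzyQuery(null));
        System.out.println(prepareFuzzyQuery("john.doe-smith"));
    }
}
```

A caller would then have to decide what an empty result means, for example listing all users without a search query. Escaping only keeps the query syntactically valid; it does not change how the index analyzer tokenizes mail addresses, so the first problem may still need a different field or analyzer. If Lucene is on the classpath, its `QueryParser.escape` could replace the hand-written helper.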

UserPortlet.java (doView)

backEndUsers = CommonUtils.nullToEmptyList(client.searchUsers(null));