clojars / clojars-web

A community repository for open-source Clojure libraries
https://clojars.org
Eclipse Public License 1.0
470 stars 114 forks source link

Simple search with / fails #856

Closed tobias closed 1 year ago

tobias commented 1 year ago

Searching for a string with a / in it (like tools/cli) results in

{"error":"Invalid search syntax for query `tools/cli`","error-id":"e5900047-78ca-4b89-841f-8dfe0f541646"}

This is because we have a catch-all exception handler around search, and return this generic error on failure. The root cause is:

Caused by org.apache.lucene.queryparser.flexible.standard.parser.TokenMgrError
   Lexical error at line 1, column 20. Encountered: <EOF> after : "/cli"

StandardSyntaxParserTokenManager.java:  914  org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParserTokenManager/getNextToken
 StandardSyntaxParser.java:  920  org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser/jj_scan_token
 StandardSyntaxParser.java:  791  org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser/jj_3R_4
 StandardSyntaxParser.java:  816  org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser/jj_3_2
 StandardSyntaxParser.java:  705  org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser/jj_2_2
 StandardSyntaxParser.java:  294  org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser/Clause
 StandardSyntaxParser.java:  279  org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser/ModClause
 StandardSyntaxParser.java:  210  org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser/ConjQuery
 StandardSyntaxParser.java:  180  org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser/DisjQuery
 StandardSyntaxParser.java:  133  org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser/Query
 StandardSyntaxParser.java:  114  org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser/TopLevelQuery
 StandardSyntaxParser.java:   62  org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser/parse
    QueryParserHelper.java:  245  org.apache.lucene.queryparser.flexible.core.QueryParserHelper/parse
  StandardQueryParser.java:  158  org.apache.lucene.queryparser.flexible.standard.StandardQueryParser/parse
                search.clj:  299  clojars.search/-search*
                search.clj:  294  clojars.search/-search*
                search.clj:  316  clojars.search/-search
                search.clj:  307  clojars.search/-search
                search.clj:  350  clojars.search.LuceneSearch/search
           search_test.clj:   69  clojars.unit.search-test/fn/fn
           search_test.clj:   69  clojars.unit.search-test/fn
           search_test.clj:   63  clojars.unit.search-test/fn
                  test.clj:  242  cider.nrepl.middleware.test/test-var/fn
                  test.clj:  242  cider.nrepl.middleware.test/test-var
                  test.clj:  234  cider.nrepl.middleware.test/test-var
                  test.clj:  257  cider.nrepl.middleware.test/test-vars/fn/fn
                  test.clj:  687  clojure.test/default-fixture
                  test.clj:  683  clojure.test/default-fixture
                  test.clj:  257  cider.nrepl.middleware.test/test-vars/fn
           test_helper.clj:   64  clojars.test-helper/default-fixture
           test_helper.clj:   58  clojars.test-helper/default-fixture
                  test.clj:  694  clojure.test/compose-fixtures/fn/fn
                  test.clj:  687  clojure.test/default-fixture
                  test.clj:  683  clojure.test/default-fixture
                  test.clj:  694  clojure.test/compose-fixtures/fn
                  test.clj:  254  cider.nrepl.middleware.test/test-vars
                  test.clj:  248  cider.nrepl.middleware.test/test-vars
                  test.clj:  270  cider.nrepl.middleware.test/test-ns
                  test.clj:  261  cider.nrepl.middleware.test/test-ns
                  test.clj:  281  cider.nrepl.middleware.test/test-var-query
                  test.clj:  274  cider.nrepl.middleware.test/test-var-query
                  test.clj:  319  cider.nrepl.middleware.test/handle-test-var-query-op/fn/fn
                  AFn.java:  152  clojure.lang.AFn/applyToHelper
                  AFn.java:  144  clojure.lang.AFn/applyTo
                  core.clj:  667  clojure.core/apply
                  core.clj: 1990  clojure.core/with-bindings*
                  core.clj: 1990  clojure.core/with-bindings*
               RestFn.java:  425  clojure.lang.RestFn/invoke
                  test.clj:  311  cider.nrepl.middleware.test/handle-test-var-query-op/fn
                  AFn.java:   22  clojure.lang.AFn/run
               session.clj:  218  nrepl.middleware.session/session-exec/main-loop/fn
               session.clj:  217  nrepl.middleware.session/session-exec/main-loop
                  AFn.java:   22  clojure.lang.AFn/run
               Thread.java:  833  java.lang.Thread/run

Wrapping the query in qoutes works.

We could do a couple of things:

borkdude commented 1 year ago

Note that neil currently fixes this by wrapping the query in double quotes:

https://github.com/babashka/neil/blob/4d8a2884ab147ab5fa7d142e419f673c5ca7bdbc/src/babashka/neil.clj#L479

tobias commented 1 year ago

Related slack discussion

tobias commented 1 year ago

I "fixed" this by converting foo/bar tokens to a search for group-id:foo AND artifact-id:bar, but based on discussion in https://github.com/babashka/neil/pull/158, @borkdude and I realized this may not have been the best approach, since you generally don't need to search if you already know the exact group & artifact. So I think we should take a different approach to fixing this.

A simple change would be to escape / in tokens. That would at least satisfy the lucene query parser, and foo\\/bar will then match foo/bar in the search doc content.

borkdude commented 1 year ago

@tobias I think the change was good but maybe the result set should be joined with what was returned before.

I was also surprised that partial matches like babashka/cli (rather than org.babashka/cli) didn't return anything for the group / artifact query.

tobias commented 1 year ago

I think the change was good but maybe the result set should be joined with what was returned before.

I think joining makes sense. We can do ((group-id:foo AND artifact-id:bar) OR "foo/bar"), which would then include the prior results as well.

I was also surprised that partial matches like babashka/cli (rather than org.babashka/cli) didn't return anything for the group / artifact query.

This is because group-id and artifact-id are exact matches, not fuzzy. I have some ideas on how to improve the results in cases like this, but I don't know if they will work. I'll play with that a bit and report back.

tobias commented 1 year ago

I've made the change to join the results, and made some changes to what we index, so you should get better results now. For example, https://clojars.org/search?q=babashka%2Fcli finds org.babashka/cli now, and https://clojars.org/search?q=org.clojure%2Ftools.cli finds org.clojars.rmremizov/clj-argparse.

Let me know how it works for you @borkdude!

borkdude commented 1 year ago

I can confirm it works a lot better now! Thanks!

tobias commented 1 year ago

My pleasure!

On Wed, Feb 1, 2023, at 08:49, Michiel Borkent wrote:

I can confirm it works a lot better now! Thanks!

— Reply to this email directly, view it on GitHub https://github.com/clojars/clojars-web/issues/856#issuecomment-1412089632, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAURYYXLTKXHHFWU3Y553WVJSYBANCNFSM6AAAAAAULCOCCY. You are receiving this because you modified the open/close state.Message ID: @.***>