Closed xeisberg closed 3 years ago
Hi @xeisberg Thank you for the kind words; glad to hear that the tools are interesting to you.
For the error above -- do you see any errors in Solr's logs?
Does the index exist, and does it contain the vector field?
Can you please paste a screenshot of the streamlit UI? It has a few params, so that will make it easier to reproduce on my side.
Dear @DmitryKey . Thank you for taking the time to consider my issue. I should also mention that I am using solr 8.5.2 with this plugin version https://github.com/markhng525/solr-vector-scoring
I did try solr 6.6.0 and 8.8.0 following your instructions and ended up with the same error while using streamlit although indexing worked fine. I have not saved the error logs but can quickly do so as I have kept the folders intact and easy to run.
solr logging looks like this:
I think the indexing is working fine as can be seen in this screenshot with the vector:
However, if I run "defType=vp" for the vector function I get the following error:
{ "responseHeader":{ "status":500, "QTime":0, "params":{ "q":"*:*", "defType":"vp", "wt":"json", "debugQuery":"on", "_":"1620351183292"}}, "error":{ "trace":"java.lang.NullPointerException\n\tat com.github.saaay71.solr.query.VectorQParserPlugin$1.parse(VectorQParserPlugin.java:34)\n\tat org.apache.solr.search.QParser.getQuery(QParser.java:174)\n\tat org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:161)\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:309)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388)\n\tat 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)\n\tat java.lang.Thread.run(Thread.java:748)\n", "code":500}}
Here are some screenshots of the streamlit UI. It does not work with any of the parameter settings that I have tried.
Edit: I also tried the 8.0.0 version. It seems to produce a different kind of error.
Solr:
`
Problem accessing /solr/vector_10/select. Reason:
Server Error
java.lang.NoClassDefFoundError: org/apache/lucene/queries/CustomScoreQuery at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:756) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) at java.net.URLClassLoader.access$100(URLClassLoader.java:74) at java.net.URLClassLoader$1.run(URLClassLoader.java:369) at java.net.URLClassLoader$1.run(URLClassLoader.java:363) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:362) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:817) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) at com.github.saaay71.solr.VectorQParserPlugin.createParser(VectorQParserPlugin.java:16) at org.apache.solr.search.QParser.getParser(QParser.java:367) at org.apache.solr.search.QParser.getParser(QParser.java:319) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:157) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:272) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2559) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:394) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:340) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.server.Server.handle(Server.java:502) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ClassNotFoundException: org.apache.lucene.queries.CustomScoreQuery at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:817) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ... 57 more
java.lang.ClassNotFoundException: org.apache.lucene.queries.CustomScoreQuery at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:817) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:756) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) at java.net.URLClassLoader.access$100(URLClassLoader.java:74) at java.net.URLClassLoader$1.run(URLClassLoader.java:369) at java.net.URLClassLoader$1.run(URLClassLoader.java:363) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:362) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:817) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) at com.github.saaay71.solr.VectorQParserPlugin.createParser(VectorQParserPlugin.java:16) at org.apache.solr.search.QParser.getParser(QParser.java:367) at org.apache.solr.search.QParser.getParser(QParser.java:319) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:157) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:272) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2559) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:394) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:340) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602) at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.server.Server.handle(Server.java:502) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) at 
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683) at java.lang.Thread.run(Thread.java:748)
`
Logging: Streamlit: `JSONDecodeError: Expecting value: line 1 column 1 (char 0) Traceback:
File "/home/n/documents/my_jina_app/baas/env/lib/python3.6/site-packages/streamlit/script_runner.py", line 337, in _run_script
exec(code, module.__dict__)
File "/home/n/documents/my_jina_app/baas/abert/src/search_demo_solr.py", line 122, in
hey @xeisberg it occurred to me: what version of the vector jar are you using? It might be you need to compile and configure the 8.x version: https://github.com/DmitryKey/solr-vector-scoring/releases/tag/8.0.0 Did you already try it?
Also https://github.com/DmitryKey/solr-vector-scoring has the instruction for sending the POST request to Solr to find similar vectors:
{!vp f=vector vector="0.1,4.75,0.3,1.2,0.7,4.0"}
So you can first try this approach using curl or Postman for instance -- and if that works, then turn to the streamlit app.
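For anyone who prefers Python to curl or Postman, the query above can be assembled as follows. This is only a minimal sketch: the helper names `build_vp_query` and `build_query_url` are hypothetical, and the host, the core name `vector_10`, the `/query` handler, and the `fl` list are assumptions borrowed from the URL quoted further down in this thread.

```python
from urllib.parse import urlencode


def build_vp_query(field, vector):
    """Format a vector as a {!vp ...} local-params query string."""
    values = ",".join(str(v) for v in vector)
    return '{!vp f=%s vector="%s"}' % (field, values)


def build_query_url(base_url, core, field, vector, fl="name,score,vector"):
    """Build a URL-encoded GET URL for the core (hypothetical helper)."""
    params = urlencode({"fl": fl, "q": build_vp_query(field, vector)})
    return "%s/%s/query?%s" % (base_url.rstrip("/"), core, params)


sample = [0.1, 4.75, 0.3, 1.2, 0.7, 4.0]
query = build_vp_query("vector", sample)
url = build_query_url("http://localhost:8983/solr", "vector_10", "vector", sample)
print(query)
print(url)
```

Pasting the printed URL into a browser, or fetching it with any HTTP client, is a quick way to test the plugin without involving streamlit.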
Also, are you using the master branch?
Hello @DmitryKey . I later tried the jar file from the master branch on the 8.5.2 version, and your 8.0.0, creating it with mvn package.
Those queries seem to work fine; I tried with different values and it works.
Do you mean the master branch of the solr vector scoring?
The streamlit error remains:
` JSONDecodeError: Expecting value: line 1 column 1 (char 0) Traceback:
File "/home/n/documents/my_jina_app/baas/env/lib/python3.6/site-packages/streamlit/script_runner.py", line 337, in _run_script
exec(code, module.__dict__)
File "/home/n/documents/my_jina_app/baas/cbert/src/search_demo_solr.py", line 122, in
By the way, the vectors look a bit different in the solr admin page:
I beg your pardon for all the troubles and thank you for your assistance.
Edit: I also tried to query [language] with vectors created from BaS and it worked well
http://localhost:8983/solr/vector_10/query?fl=name,score,vector&q={!vp f=vector vector="-0.88476306,1.1085405,0.94160986,-0.48293075,-1.337958,0.5753042,-0.21456252,1.0564083,0.6726484,-0.35901925,1.0844374,-2.2985933,-0.40325913,-0.39699218,1.4153827,2.4239159,-1.9437176,-0.6792026,0.89455646,0.35632563,-0.53838384,-0.45722723,0.1247566,0.85032016,0.33900392,-1.5361294,-0.47051978,0.9527769,-0.75922817,-1.2553006,0.10141397,1.4181198,0.24027266,0.17540579,-2.5858777,-1.2593254,-0.5828236,0.86464196,0.38927016,0.03427886,0.6351769,-1.7710632,-0.10303191,-0.44449517,1.8008639,-1.1259017,-0.5516233,-1.206171,0.25794813,-0.20387788,0.07476187,0.22883685,-1.0614568,0.26944685,-0.5347677,-0.8742363,-0.535749,0.38284862,1.0968019,0.02051202,0.62514746,0.50039554,0.49552372,-1.0150254,0.9231687,-1.3622354,-0.38200936,0.40629116,0.61071736,0.88147354,0.07021571,-1.5498863,-0.9278682,0.00889979,1.0004257,-0.26416442,-1.0365705,0.6484249,1.0995578,-0.7833352,-0.5397713,1.3674759,-0.86755306,0.10816425,-0.8117695,-0.9115286,0.5814933,-1.5243529,-0.04133647,-0.36087885,-0.71196574,0.43688157,0.77339274,1.8671757,-1.6530803,2.452878,-0.3257662,-0.28955212,-1.2798327,-0.5864698,0.04978422,0.6750865,0.72302884,1.4278954,-1.9902716,-3.9406059,-0.34338212,1.2116548,1.6713749,-0.5804129,0.23183243,0.72133154,-0.62925905,0.4265586,0.90001315,1.2792016,0.09643922,1.2660303,0.96224874,-0.37483835,0.12344939,-2.8622265,-1.2999626,0.04293389,-0.36390737,1.0422441,0.67031693,1.9671665"}
@xeisberg no worries at all. I wonder, is there any particular reason, why you used https://github.com/markhng525/solr-vector-scoring instead of https://github.com/DmitryKey/solr-vector-scoring ?
Can you consider testing https://github.com/DmitryKey/solr-vector-scoring/releases/tag/8.0.0 version? I'm trying to repro the issue on my side
@DmitryKey Thank you. I beg your pardon for not being clear in the last message. I tried with the Solr 6.6.0, 8.0.0, and 8.5.2 with their respective plugins hoping to solve the problem.
Anyway, I tried again just now, creating a new plugin jar with mvn package from your link and using it with Solr 8.0.0, and the same error occurs as before. The aforementioned queries work well in the browser, so I wonder what the problem could be.
hi @xeisberg let's compare the python dependencies. Can you please run
pip freeze > requirements_freeze.txt
and then compare this file with the same file in git? https://github.com/DmitryKey/bert-solr-search/blob/feature/odfe-1M-indexing-optimization/requirements_freeze.txt
And the other thing -- can you compare your managed_schema and solrconfig.xml to these: https://github.com/DmitryKey/bert-solr-search/tree/feature/odfe-1M-indexing-optimization/solr_conf/8.0.0
Hello @DmitryKey . In the virtual environment I installed your requirements_freeze.txt, so the only package that differed in the comparison was:
53 | ipython | 7.17.0 | 7.16.1. Could this be a problem?
The configs match 100%.
Given that the query is working with http://localhost:8983/solr/vector_10/query?fl=name,score,vector&q={!vp f=vector [..], is it possible to query in some other way without going through streamlit? Of course this is not fully related to this issue.
thanks for checking that!
How many dimensions do your vectors have? In my case I use 768 dim vectors, so the only way to query them is to use HTTP POST, because the query is too long to fit in an HTTP GET URL.
Btw, would you be open for a quick zoom call to show your vector search setup? It might be a little faster to figure things out. Ok, if not: we can also continue discussing here!
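The HTTP POST approach mentioned above could look roughly like this in Python. It is a sketch under assumptions, not the code used by the streamlit app: the helper `build_post_request`, the `/select` URL, and the `fl`, `rows`, and `wt` values are illustrative, and the vector is a dummy one that only shows how large the payload gets at 768 dimensions.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(select_url, field, vector, rows=10):
    """Prepare (but do not send) a form-encoded POST carrying the vector query."""
    q = '{!vp f=%s vector="%s"}' % (field, ",".join(str(v) for v in vector))
    # fl, rows and wt are illustrative choices, not values from the thread.
    body = urlencode({"q": q, "fl": "name,score", "rows": rows, "wt": "json"})
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(select_url, data=body.encode("utf-8"), headers=headers, method="POST")


# A dummy 768-dimensional vector, used only to show the size of the payload.
vector = [0.123456] * 768
request = build_post_request(
    "http://localhost:8983/solr/vector_10/select", "vector", vector
)
print(request.get_method(), len(request.data), "bytes in the body")
# To actually send it against a running Solr: urllib.request.urlopen(request)
```

Because the parameters travel in the request body rather than in the URL, the length of the vector no longer affects the length of the URL.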
@DmitryKey I see. I am not sure how to check the dimensions of the vectors. I am using BERT uncased_L-4_H-128_A-2.
For sure. If you send me a link or another way of contacting you, I could join you. When would it suit you?
Edit: mistyped
@xeisberg I've sent you a zoom link for tomorrow, hope it works out for you.
Looking at your model name, I believe it is generating 128-dimensional vectors. You can double-check that by looking inside the unzipped directory with the model: look for bert_config.json; the parameter name is hidden_size.
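The check described above can also be scripted. A minimal sketch, assuming the model directory contains a bert_config.json with a hidden_size key as described; `read_hidden_size` is a hypothetical helper, and the demo writes a stand-in config to a temporary directory instead of reading a real model.

```python
import json
import os
import tempfile


def read_hidden_size(model_dir):
    """Return hidden_size from bert_config.json inside the model directory."""
    config_path = os.path.join(model_dir, "bert_config.json")
    with open(config_path, encoding="utf-8") as f:
        return json.load(f)["hidden_size"]


# Self-contained demo: a stand-in config, not the real model's file.
with tempfile.TemporaryDirectory() as model_dir:
    with open(os.path.join(model_dir, "bert_config.json"), "w", encoding="utf-8") as f:
        json.dump({"hidden_size": 128, "num_hidden_layers": 4}, f)
    dims = read_hidden_size(model_dir)

print(dims)
```

With a real model, pass the path of the unzipped model directory to `read_hidden_size` instead.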
@DmitryKey Thank you. Speak to you later today. Indeed, it has 128 dimensional vectors as specified in the config. Thank you for clarifying.
Thank you for the assistance @DmitryKey.
So in the end the problem was in the src/search_demo_solr.py file,
on line 122: docs, query_time, numfound = sc.query("vector", query)
where one needs to replace "vector" with the name of the core in your Solr index, which may be determined by the number of files that one indexes (in my case it became vector_10 for 10 files).
Edit: spelling
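One plausible reading of the JSONDecodeError (an assumption, not something confirmed in this thread) is that a request to a core name that does not exist gets a non-JSON error page back, and parsing that body as JSON fails at the very first character. The sketch below reproduces that failure with the standard json module only; `parse_solr_body` and both sample bodies are hypothetical.

```python
import json


def parse_solr_body(body, core):
    """Parse a response body as JSON, naming the core if the body is not JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as err:
        raise ValueError(
            "Response for core %r is not JSON (%s); check the core name" % (core, err)
        ) from err


# A made-up JSON body, standing in for a successful response.
ok = parse_solr_body('{"response": {"numFound": 3, "docs": []}}', "vector_10")
print(ok["response"]["numFound"])

# A made-up HTML error page, standing in for the reply to a wrong core name.
try:
    parse_solr_body("<html><body>Error 404 Not Found</body></html>", "vector")
except ValueError as exc:
    message = str(exc)
    print(message)
```

Wrapping the parse step this way makes the error name the core being queried, which would point at line 122 more directly than the bare traceback does.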
Dear @DmitryKey , it is a pleasure to follow your guides and go through the examples. Thank you for providing such interesting tools.
Concerning "Neural Search with BERT and Solr" everything seems to go alright including indexing and searching on the solr server, however, once I start streamlit the following errors occur when I search. Do you have any idea about what it may be?
Thank you for your consideration.