DmitryKey / bert-solr-search

Search with BERT vectors in Solr, Elasticsearch, OpenSearch and GSI APU
Apache License 2.0

Streamlit JSONDecodeError: Expecting value: line 1 column 1 (char 0) #14

Closed xeisberg closed 3 years ago

xeisberg commented 3 years ago

Dear @DmitryKey , it is a pleasure to follow your guides and go through the examples. Thank you for providing such interesting tools.

Concerning "Neural Search with BERT and Solr", everything seems to go all right, including indexing and searching on the Solr server. However, once I start streamlit, the following error occurs when I search. Do you have any idea what it may be?

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Traceback:

File "/home/n/documents/my_jina_app/baas/env/lib/python3.6/site-packages/streamlit/script_runner.py", line 337, in _run_script
    exec(code, module.__dict__)
File "/home/n/documents/my_jina_app/baas/bert-solr-search/src/search_demo_solr.py", line 122, in <module>
    docs, query_time, numfound = sc.query("vector", query)
File "src/client/solr_client.py", line 102, in query
    resp = resp.json()
File "/home/n/documents/my_jina_app/baas/env/lib/python3.6/site-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
File "/home/n/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
File "/home/n/anaconda3/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/n/anaconda3/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None

Thank you for your consideration.

DmitryKey commented 3 years ago

Hi @xeisberg, thank you for the kind words. Glad to hear that the tools are interesting to you.

For the error above -- do you see any errors in Solr's logs? Does the index exist, and does it contain the vector field?

Can you please paste a screenshot of the streamlit UI? It has a few params, so a screenshot will make it easier to reproduce on my side.

xeisberg commented 3 years ago

Dear @DmitryKey . Thank you for taking the time to consider my issue. I should also mention that I am using solr 8.5.2 with this plugin version https://github.com/markhng525/solr-vector-scoring

I did try solr 6.6.0 and 8.8.0 following your instructions and ended up with the same error while using streamlit although indexing worked fine. I have not saved the error logs but can quickly do so as I have kept the folders intact and easy to run.

solr logging looks like this: image

I think the indexing is working fine as can be seen in this screenshot with the vector:

image However, if I run "defType=vp" for the vector function I get the following error: image { "responseHeader":{ "status":500, "QTime":0, "params":{ "q":"*:*", "defType":"vp", "wt":"json", "debugQuery":"on", "_":"1620351183292"}}, "error":{ "trace":"java.lang.NullPointerException\n\tat com.github.saaay71.solr.query.VectorQParserPlugin$1.parse(VectorQParserPlugin.java:34)\n\tat org.apache.solr.search.QParser.getQuery(QParser.java:174)\n\tat org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:161)\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:309)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)\n\tat java.lang.Thread.run(Thread.java:748)\n", "code":500}}

Here are some screenshots of the streamlit UI. It does not work with any of the parameter settings that I have tried.

image image image

Edit: I also tried the 8.0.0 version. It seems to be a different kind of error.

Solr: image

image


Error 500 Server Error

HTTP ERROR 500

Problem accessing /solr/vector_10/select. Reason:

    Server Error

Caused by:

java.lang.NoClassDefFoundError: org/apache/lucene/queries/CustomScoreQuery
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:817)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at com.github.saaay71.solr.VectorQParserPlugin.createParser(VectorQParserPlugin.java:16)
    at org.apache.solr.search.QParser.getParser(QParser.java:367)
    at org.apache.solr.search.QParser.getParser(QParser.java:319)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:157)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:272)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2559)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:394)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:340)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:502)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.queries.CustomScoreQuery
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:817)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 57 more

Caused by:

java.lang.ClassNotFoundException: org.apache.lucene.queries.CustomScoreQuery
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:817)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:817)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at com.github.saaay71.solr.VectorQParserPlugin.createParser(VectorQParserPlugin.java:16)
    at org.apache.solr.search.QParser.getParser(QParser.java:367)
    at org.apache.solr.search.QParser.getParser(QParser.java:319)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:157)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:272)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2559)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:394)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:340)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:502)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
    at java.lang.Thread.run(Thread.java:748)


Logging: image

Streamlit: image

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Traceback:

File "/home/n/documents/my_jina_app/baas/env/lib/python3.6/site-packages/streamlit/script_runner.py", line 337, in _run_script
    exec(code, module.__dict__)
File "/home/n/documents/my_jina_app/baas/abert/src/search_demo_solr.py", line 122, in <module>
    docs, query_time, numfound = sc.query("vector", query)
File "src/client/solr_client.py", line 102, in query
    resp = resp.json()
File "/home/n/documents/my_jina_app/baas/env/lib/python3.6/site-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
File "/home/n/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
File "/home/n/anaconda3/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/n/anaconda3/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None

DmitryKey commented 3 years ago

hey @xeisberg it occurred to me: what version of the vector jar are you using? It might be you need to compile and configure the 8.x version: https://github.com/DmitryKey/solr-vector-scoring/releases/tag/8.0.0 Did you already try it?

Also https://github.com/DmitryKey/solr-vector-scoring has the instruction for sending the POST request to Solr to find similar vectors:

{!vp f=vector vector="0.1,4.75,0.3,1.2,0.7,4.0"}

So you can first try this approach using curl or Postman for instance -- and if that works, then turn to the streamlit app.
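As a rough illustration of that first step, here is a minimal Python sketch that builds the `{!vp ...}` query string and sends it to Solr as an HTTP POST using only the standard library. The function names, the `fl` default, and the base URL and core name in the usage note are assumptions made for this example; they are not taken from the plugin or from this repository.

```python
import json
import urllib.parse
import urllib.request


def build_vp_query(field, vector):
    """Format a vector as the {!vp ...} local-params query shown above."""
    values = ",".join(str(v) for v in vector)
    return '{!vp f=%s vector="%s"}' % (field, values)


def post_vp_query(base_url, core, field, vector, fl="name,score"):
    """POST the query as form data and return the decoded JSON response.

    base_url and core are placeholders supplied by the caller, e.g.
    "http://localhost:8983/solr" and the name of your own core.
    """
    body = urllib.parse.urlencode(
        {"q": build_vp_query(field, vector), "fl": fl, "wt": "json"}
    ).encode("utf-8")
    url = "%s/%s/query" % (base_url.rstrip("/"), core)
    # Passing data= makes urllib issue a POST with a form-encoded body.
    request = urllib.request.Request(url, data=body)
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the query string; no Solr server is contacted here.
    print(build_vp_query("vector", [0.1, 4.75, 0.3, 1.2, 0.7, 4.0]))
```

Calling `post_vp_query("http://localhost:8983/solr", "<your core>", "vector", [...])` against a running Solr should then show whether the plugin answers correctly before the streamlit app is involved.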

Also, are you using the master branch?

xeisberg commented 3 years ago

Hello @DmitryKey. I later tried the jar file from the master branch on the 8.5.2 version, and your 8.0.0 one, building it with mvn package.

Those queries seem to work fine; I tried with different values and it works.

image

image

Do you mean the master branch of the solr vector scoring?

The streamlit error remains:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Traceback:

File "/home/n/documents/my_jina_app/baas/env/lib/python3.6/site-packages/streamlit/script_runner.py", line 337, in _run_script
    exec(code, module.__dict__)
File "/home/n/documents/my_jina_app/baas/cbert/src/search_demo_solr.py", line 122, in <module>
    docs, query_time, numfound = sc.query("vector", query)
File "src/client/solr_client.py", line 102, in query
    resp = resp.json()
File "/home/n/documents/my_jina_app/baas/env/lib/python3.6/site-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
File "/home/n/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
File "/home/n/anaconda3/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/n/anaconda3/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None

By the way, the vectors look a bit different in the solr admin page:

image

I beg your pardon for all the troubles and thank you for your assistance.

Edit: I also tried to query [language] with vectors created from BaS and it worked well

http://localhost:8983/solr/vector_10/query?fl=name,score,vector&q={!vp f=vector vector="-0.88476306,1.1085405,0.94160986,-0.48293075,-1.337958,0.5753042,-0.21456252,1.0564083,0.6726484,-0.35901925,1.0844374,-2.2985933,-0.40325913,-0.39699218,1.4153827,2.4239159,-1.9437176,-0.6792026,0.89455646,0.35632563,-0.53838384,-0.45722723,0.1247566,0.85032016,0.33900392,-1.5361294,-0.47051978,0.9527769,-0.75922817,-1.2553006,0.10141397,1.4181198,0.24027266,0.17540579,-2.5858777,-1.2593254,-0.5828236,0.86464196,0.38927016,0.03427886,0.6351769,-1.7710632,-0.10303191,-0.44449517,1.8008639,-1.1259017,-0.5516233,-1.206171,0.25794813,-0.20387788,0.07476187,0.22883685,-1.0614568,0.26944685,-0.5347677,-0.8742363,-0.535749,0.38284862,1.0968019,0.02051202,0.62514746,0.50039554,0.49552372,-1.0150254,0.9231687,-1.3622354,-0.38200936,0.40629116,0.61071736,0.88147354,0.07021571,-1.5498863,-0.9278682,0.00889979,1.0004257,-0.26416442,-1.0365705,0.6484249,1.0995578,-0.7833352,-0.5397713,1.3674759,-0.86755306,0.10816425,-0.8117695,-0.9115286,0.5814933,-1.5243529,-0.04133647,-0.36087885,-0.71196574,0.43688157,0.77339274,1.8671757,-1.6530803,2.452878,-0.3257662,-0.28955212,-1.2798327,-0.5864698,0.04978422,0.6750865,0.72302884,1.4278954,-1.9902716,-3.9406059,-0.34338212,1.2116548,1.6713749,-0.5804129,0.23183243,0.72133154,-0.62925905,0.4265586,0.90001315,1.2792016,0.09643922,1.2660303,0.96224874,-0.37483835,0.12344939,-2.8622265,-1.2999626,0.04293389,-0.36390737,1.0422441,0.67031693,1.9671665"} image

DmitryKey commented 3 years ago

@xeisberg no worries at all. I wonder, is there any particular reason why you used https://github.com/markhng525/solr-vector-scoring instead of https://github.com/DmitryKey/solr-vector-scoring ?

DmitryKey commented 3 years ago

Can you consider testing https://github.com/DmitryKey/solr-vector-scoring/releases/tag/8.0.0 version? I'm trying to repro the issue on my side

xeisberg commented 3 years ago

@DmitryKey Thank you. I beg your pardon for not being clear in the last message. I tried Solr 6.6.0, 8.0.0, and 8.5.2 with their respective plugins, hoping to solve the problem.

Anyway, I tried again just now, building a new plugin jar with mvn package from your link and using it with Solr 8.0.0, and the same error occurs as before. The aforementioned queries work well in the browser, so I wonder what the problem could be.

DmitryKey commented 3 years ago

hi @xeisberg let's compare the python dependencies. Can you please run pip freeze > requirements_freeze.txt and then compare this file with the same file in git? https://github.com/DmitryKey/bert-solr-search/blob/feature/odfe-1M-indexing-optimization/requirements_freeze.txt

And the other thing -- can you compare your managed_schema and solrconfig.xml to these: https://github.com/DmitryKey/bert-solr-search/tree/feature/odfe-1M-indexing-optimization/solr_conf/8.0.0
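The dependency comparison can also be scripted rather than done by eye. Below is a minimal sketch that parses two `pip freeze` outputs and reports packages whose pinned versions differ. The helper names are hypothetical, and the sample pins in the demo are illustrative stand-ins, not the contents of the repository's requirements_freeze.txt.

```python
def parse_requirements(text):
    """Map lower-cased package name -> pinned version for 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything not pinned with '=='.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_requirements(local_text, reference_text):
    """Return {package: (local_version, reference_version)} where the two differ."""
    local = parse_requirements(local_text)
    reference = parse_requirements(reference_text)
    return {
        name: (local.get(name), reference.get(name))
        for name in sorted(set(local) | set(reference))
        if local.get(name) != reference.get(name)
    }


if __name__ == "__main__":
    # Illustrative sample data only.
    mine = "ipython==7.17.0\nexample-pkg==1.0.0\n"
    theirs = "ipython==7.16.1\nexample-pkg==1.0.0\n"
    for name, (have, want) in diff_requirements(mine, theirs).items():
        print(name, have, want)
```

A package present in only one file shows up with `None` on the other side, which makes missing dependencies visible as well as version mismatches.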

xeisberg commented 3 years ago

Hello @DmitryKey. I installed your requirements_freeze.txt in the virtual environment, so the only package that differed in the comparison was:

53 | ipython | 7.17.0 | 7.16.1, could this be a problem?

The configs match 100%.

Given that the query works with http://localhost:8983/solr/vector_10/query?fl=name,score,vector&q={!vp f=vector [..], is it possible to query in some other way without going through streamlit? Of course, this is not fully related to this issue.

DmitryKey commented 3 years ago

thanks for checking that!

How many dimensions do your vectors have? In my case I use 768-dimensional vectors, so the only way to query them is with HTTP POST, because the query is too long to fit in an HTTP GET URL.
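To get a feel for the sizes involved, the sketch below estimates the length of the GET URL for 128- and 768-dimensional vectors. The function name, the eight-decimal formatting, and the randomly generated vectors are assumptions made for illustration. The actual limit depends on how the server is configured; around 8 KB is a common default for request headers, but treat that figure as an assumption about your setup.

```python
import random
import urllib.parse


def get_url_length(base_url, core, field, vector):
    """Length in characters of a GET URL carrying the {!vp ...} query."""
    values = ",".join("%.8f" % v for v in vector)
    q = '{!vp f=%s vector="%s"}' % (field, values)
    # urlencode percent-escapes commas, quotes and braces, which inflates the URL.
    query_string = urllib.parse.urlencode({"q": q, "fl": "name,score"})
    return len("%s/%s/query?%s" % (base_url, core, query_string))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for dims in (128, 768):
        vec = [rng.uniform(-3, 3) for _ in range(dims)]
        length = get_url_length("http://localhost:8983/solr", "vector_10", "vector", vec)
        print(dims, "dimensions ->", length, "characters")
```

With this formatting each component costs at least 13 characters once the comma is escaped, so a 768-dimensional query lands well above 8,000 characters while a 128-dimensional one stays far below it.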

Btw, would you be open for a quick zoom call to show your vector search setup? It might be a little faster to figure things out. Ok, if not: we can also continue discussing here!

xeisberg commented 3 years ago

@DmitryKey I see. I am not sure how to check the dimensions of the vectors. I am using BERT uncased_L-4_H-128_A-2.

For sure. If you send me a link or another way of contacting you, I could join you. When would it suit you?

Edit: mistyped

DmitryKey commented 3 years ago

@xeisberg I've sent you a zoom link for tomorrow, hope it works out for you.

DmitryKey commented 3 years ago

Looking at your model name, I believe it generates 128-dimensional vectors. You can double-check that by looking inside the unzipped model directory: open bert_config.json and look for the hidden_size parameter.
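That check takes only a few lines of Python. In this minimal sketch, `read_hidden_size` is a hypothetical helper, and the demo writes a stand-in bert_config.json into a temporary directory so that it runs without a downloaded model; point it at your real model directory instead.

```python
import json
import os
import tempfile


def read_hidden_size(model_dir):
    """Return hidden_size from bert_config.json inside the model directory."""
    config_path = os.path.join(model_dir, "bert_config.json")
    with open(config_path, encoding="utf-8") as fh:
        return json.load(fh)["hidden_size"]


if __name__ == "__main__":
    # Stand-in config for demonstration; a real model directory ships its own file.
    with tempfile.TemporaryDirectory() as model_dir:
        with open(os.path.join(model_dir, "bert_config.json"), "w", encoding="utf-8") as fh:
            json.dump({"hidden_size": 128, "num_hidden_layers": 4}, fh)
        print(read_hidden_size(model_dir))
```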

xeisberg commented 3 years ago

@DmitryKey Thank you. Speak to you later today. Indeed, it has 128 dimensional vectors as specified in the config. Thank you for clarifying.

xeisberg commented 3 years ago

Thank you for the assistance @DmitryKey.

So in the end the problem was line 122 of src/search_demo_solr.py: docs, query_time, numfound = sc.query("vector", query). There, one needs to replace "vector" with the name of the core in your Solr index, which may be determined by the number of files that one indexes (in my case it became vector_10 for 10 files).
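For anyone hitting the same traceback: a JSONDecodeError at "line 1 column 1 (char 0)" means the response body was not JSON at all, which is what you would likely see when the core name in the URL does not exist and Solr answers with an error page. The sketch below shows one defensive way to surface that. `SolrResponseError` and `parse_solr_response` are hypothetical names for this example and are not part of the repository's solr_client.py.

```python
import json


class SolrResponseError(RuntimeError):
    """Raised when Solr answers with something other than a successful JSON body."""


def parse_solr_response(status_code, content_type, text, url):
    """Decode a Solr response, failing with a readable message instead of JSONDecodeError."""
    is_json = "json" in (content_type or "").lower()
    if status_code != 200 or not is_json:
        snippet = text[:200].replace("\n", " ")
        raise SolrResponseError(
            "Solr returned HTTP %s (%s) for %s: %s"
            % (status_code, content_type, url, snippet)
        )
    return json.loads(text)


if __name__ == "__main__":
    # Simulated reply for a core name that does not exist.
    try:
        parse_solr_response(
            404, "text/html", "<html>Error 404 Not Found</html>",
            "http://localhost:8983/solr/vector/query",
        )
    except SolrResponseError as exc:
        print(exc)
```

With a requests response object this could be called as `parse_solr_response(resp.status_code, resp.headers.get("Content-Type"), resp.text, resp.url)`, so that the URL, and with it the core name, appears directly in the error message.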

Edit: spelling