lintool closed this 4 years ago.
Originally planned to be part of the v0.7.0 release; punting.
I've added the test case for Solr on Robust04, and it works as expected. However, there are some problems with Solr on MS MARCO passage and with ES on Core18.
sh target/appassembler/bin/SearchSolr -topicreader TsvString -solr.index msmarco-passage -solr.zkUrl localhost:9983 \
-topics src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.txt \
-output run.solr.msmarco-passage.txt
I got an exception:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.36.40.249:8983/solr/msmarco-passage: undefined field define
It is thrown by the Solr query at io.anserini.search.SearchSolr.search(SearchSolr.java:211):
QueryResponse response = client.query(args.solrIndex, solrq);
Could you confirm whether the above command looks okay? If it does, I'll dig into the error.
For ES on Core18, with k1=0.9, b=0.4, it gives me [FAILED] 0.2401 MAP, expected 0.2495 MAP. Do you have any suggestions on this?
@r-clancy can you help @x389liu out here?
@x389liu
For the first error: Solr queries are of the form <field>:<query_terms>, and several of the passage queries contain define: with the colon being the culprit. Solr thinks define is a field, which isn't in our schema.
$ grep "define:" src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.txt
129491 define: dog's life
129517 define: entity's
129641 define: morbid obesity
129684 define: portrait
129792 define: systemic
129837 define: wrongful prosecution
999921 define: shore up
1000083 define: precipitous delivery
1001397 define: barrage
1001903 define: (cancelling)
129565 define: homologate
I think some additional query cleaning here would do the trick.
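One possible form of that query cleaning, sketched under assumptions: backslash-escape the query-parser syntax characters before the topic is sent to Solr, so that `define:` is read as a term instead of a field name. `QueryEscaper` is a hypothetical helper, not Anserini code; its character set is intended to mirror Lucene's classic `QueryParser.escape`. SolrJ's `ClientUtils.escapeQueryChars` is a real alternative, but it also escapes whitespace, which would glue a multi-word topic into a single term.

```java
// Hypothetical helper (not part of Anserini): backslash-escapes characters the
// Lucene/Solr standard query parser treats as syntax. Whitespace is left alone
// so a multi-word topic is still parsed as a bag of terms.
public class QueryEscaper {
  // Syntax characters of the standard query parser; "&&" and "||" are
  // neutralized by escaping '&' and '|' individually.
  private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

  public static String escape(String query) {
    StringBuilder sb = new StringBuilder(query.length());
    for (int i = 0; i < query.length(); i++) {
      char c = query.charAt(i);
      if (SPECIAL.indexOf(c) >= 0) {
        sb.append('\\'); // prefix each syntax character with a backslash
      }
      sb.append(c);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(escape("define: dog's life"));   // define\: dog's life
    System.out.println(escape("define: (cancelling)")); // define\: \(cancelling\)
  }
}
```

Replacing these characters with spaces instead of escaping them is an equally plausible cleaning strategy; which one matches the Lucene baseline depends on how SearchCollection sanitizes the same topics.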
For the second, nothing jumps out at me. I'd print out the queries sent to Elastic (in SearchElastic) and the queries sent to Lucene (in SearchCollection) and make sure they are exactly the same; it could be a query sanitization issue. I'd also print out the document scores returned by each and compare them. Relevant code is here and here.
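A minimal sketch of the score comparison, assuming both SearchElastic and SearchCollection write runs in the standard TREC format (`qid Q0 docid rank score tag`). `RunDiff` and its methods are hypothetical debugging helpers, not part of Anserini; the tolerance of 1e-4 is an arbitrary choice.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Anserini): loads two TREC-format runs and
// reports every (qid, docid) pair whose score differs by more than a tolerance
// or that appears in only one of the runs.
public class RunDiff {
  // Maps "qid docid" to the score in the fifth column of each run line.
  public static Map<String, Double> load(List<String> lines) {
    Map<String, Double> scores = new HashMap<>();
    for (String line : lines) {
      String[] f = line.trim().split("\\s+");
      if (f.length < 6) continue; // skip blank or malformed lines
      scores.put(f[0] + " " + f[2], Double.parseDouble(f[4]));
    }
    return scores;
  }

  // Prints mismatches as "qid docid <tab> scoreA <tab> scoreB" and counts them.
  public static int diff(Map<String, Double> a, Map<String, Double> b, double tol) {
    int mismatches = 0;
    for (Map.Entry<String, Double> e : a.entrySet()) {
      Double other = b.get(e.getKey());
      if (other == null || Math.abs(other - e.getValue()) > tol) {
        System.out.printf("%s\t%s\t%s%n", e.getKey(), e.getValue(), other);
        mismatches++;
      }
    }
    for (String key : b.keySet()) {
      if (!a.containsKey(key)) { // present only in the second run
        System.out.printf("%s\t%s\t%s%n", key, null, b.get(key));
        mismatches++;
      }
    }
    return mismatches;
  }

  public static void main(String[] args) throws IOException {
    Map<String, Double> lucene = load(Files.readAllLines(Paths.get(args[0])));
    Map<String, Double> elastic = load(Files.readAllLines(Paths.get(args[1])));
    System.out.println(diff(lucene, elastic, 1e-4) + " mismatched (qid, docid) pairs");
  }
}
```

Pairs missing from one run near the rank cutoff can simply reflect small score drift; consistent score gaps on the same (qid, docid) pairs are the more telling signal that the two systems are scoring differently.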
Closed by #1030
Ref: #811
Currently, the Solr integration test script works only on Core18, while the ES integration test script works on Robust04 and MS MARCO passage. Let's bring these into alignment: make both Solr and ES work on {Core18, Robust04, and MS MARCO passage}.