DiceTechJobs / SolrPlugins

Dice Solr Plugins from Simon Hughes Dice.com
http://www.dice.com
Apache License 2.0

Synonym expansion : Query problem #2

Open BLOIZO opened 5 years ago

BLOIZO commented 5 years ago

Hi @simonhughes22, first of all, thank you for your work!

I'm new to Solr and would be glad to get some help with this problem:

I've already stored and indexed a collection of texts, which I can query without trouble. Now I want to use synonym expansion via the QueryBoostingQParserPlugin. I can see that my synonym field is loaded (see "Analysis" in the UI), but my queries fail to match synonym terms.

I don't know whether the problem is caused by a wrong use of query parameters (this is the first time I've tried "queryboost") or whether something is wrong with my solrconfig or schema. Could you help me?

I can see that my synonyms are correctly mapped to each term and have a payload, so I must have missed something.

Thanks a lot,

p-j-2

field&solrconfig.txt

simonhughes22 commented 5 years ago

It's hard to say what's wrong from the screenshot and the config alone. The screenshot refers to a field type of text_en_syn, which is not present in field&solrconfig.txt AFAICT. I do see text_en_payload in there, which looks like it does the same thing. I would try a few things.

First, I would use debugQuery=true to look at how the request handler processes the query. Solr queries can be processed differently at query time, depending on the query parser settings, before they reach the analyzer. The request handler will split the query on whitespace before it hits the analyzer, unless you have a newer version of Solr and have disabled that option. Changing that can have other side effects, though, so be careful. Also, make sure you are actually invoking the QueryBoostingQParserPlugin by specifying which query parser to use when executing the query. Solr defaults to the lucene query parser unless you specify edismax or some other parser.
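A minimal sketch of those two checks in Python: it builds a select request that names the parser explicitly through defType and turns on debugQuery. The host, port, and core name (`localhost:8983`, `mycore`) are placeholders, and `queryboost` is assumed to be the name under which the plugin is registered in solrconfig.xml.

```python
from urllib.parse import urlencode

# Placeholder base URL: host, port, and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def build_debug_url(query, parser="queryboost", base=SOLR_SELECT):
    """Return a select URL that names the query parser and enables debug output."""
    params = {
        "q": query,
        "defType": parser,     # without this, Solr falls back to the lucene parser
        "debugQuery": "true",  # asks Solr to add a "debug" section to the response
        "wt": "json",
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_debug_url("achieve"))
```

In the debug section of the response, the "QParser" entry should show which parser handled the request, and "parsedquery" shows the query after parsing and analysis.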

Also, for query expansion use cases, you are probably better off doing that outside of solr in the API that's calling solr if possible, to avoid using custom plugins. Plugins can be useful in solving a lot of complex issues, but are also best avoided where possible as maintaining them and keeping them in sync with your solr version as it's upgraded can be a lot of work.
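As a rough illustration of expanding the query in the calling API instead of in a plugin, the Python sketch below rewrites each user term into an OR group of the term and its weighted synonyms, using the standard `term^boost` query syntax. The synonym table, the weights, and the function names are all hypothetical; a real service would load its own synonyms and would also need to escape special query characters, which this sketch does not do.

```python
# Hypothetical synonym table: term -> list of (synonym, weight).
# In practice this would come from your own synonym source.
SYNONYMS = {
    "achieve": [("accomplish", 0.8), ("attain", 0.6), ("carry out", 0.4)],
}


def quote(term):
    """Wrap multi-word terms in quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(user_query, synonyms=SYNONYMS):
    """Expand each query term into an OR group of the term and its boosted synonyms."""
    groups = []
    for term in user_query.lower().split():
        clauses = [quote(term)]
        for synonym, weight in synonyms.get(term, []):
            clauses.append(f"{quote(synonym)}^{weight}")
        groups.append("(" + " OR ".join(clauses) + ")")
    return " ".join(groups)


if __name__ == "__main__":
    print(expand_query("achieve"))
```

The resulting string can be sent as the q parameter to a stock parser such as lucene or edismax, so nothing custom has to be kept in sync with the Solr version.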

BLOIZO commented 5 years ago

Thanks for your answer @simonhughes22. I got one step further thanks to debugQuery: I can now see that the query is parsed with the correct parser and that synonyms are retrieved!

Next problem, though: the parsedquery seems abnormally long and leads to 0 results. I'd be glad if you could keep helping me!

This is the screenshot of the debugQuery output ("synonymes" is my synonym field; its field type is "text_en_payload"): screenshot

Thanks a lot for your time!

PS: I'm using Solr 8

simonhughes22 commented 5 years ago

I am not a Solr 7 or 8 expert; I don't work with it as much these days. As for getting no results, I am guessing you are running the query as an AND query, where all terms are required. I'd try switching to an OR query and using the mm parameter to enforce the number of terms you want to match. You can set it to 1; it should still rank docs with more matches higher. Also, more terms will impact performance, so if performance becomes an issue you may want to either tinker with the mm settings or reduce the number of top terms queried.
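To make the AND versus OR point concrete, here is a toy Python model (not Solr's actual matching or scoring): documents are plain sets of terms, and a document matches when it contains at least `min_should_match` of the query terms. The documents and expanded terms are made up for illustration.

```python
# Toy documents, each reduced to a set of terms (made-up data).
DOCS = {
    "doc1": {"achieve", "goal"},
    "doc2": {"accomplish", "attain", "task"},
    "doc3": {"unrelated"},
}

# A query after synonym expansion (made-up terms).
EXPANDED = ["achieve", "accomplish", "attain"]


def search(docs, terms, min_should_match):
    """Return (doc_id, hits) for docs with at least min_should_match terms, best first."""
    results = []
    for doc_id, doc_terms in docs.items():
        hits = sum(1 for t in terms if t in doc_terms)
        if hits >= min_should_match:
            results.append((doc_id, hits))
    # Sort by number of matching terms (descending), then by id for stable output.
    return sorted(results, key=lambda r: (-r[1], r[0]))


if __name__ == "__main__":
    # AND-like behaviour: every expanded term is required, so nothing matches.
    print(search(DOCS, EXPANDED, len(EXPANDED)))
    # OR with a minimum of one match: docs with more matching terms rank first.
    print(search(DOCS, EXPANDED, 1))
```

Requiring every expanded synonym is what makes a long parsed query return nothing: few documents contain all the synonyms of a word at once. With a minimum of one match, documents still surface, and those matching more terms sort higher in this toy ranking.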

BLOIZO commented 5 years ago

Hi @simonhughes22 ,

I still have issues using "queryboost" properly, so I have some specific questions:

1) With my text field called "text" and my synonym field called "synonymes_payload", I'm unsuccessfully using the following query. Does it seem correct to you?

/select?df=text&q=achieve&qf=synonymes_payload&fl=id&wt=xml&hl=true&hl.snippets=300&hl.fl=text&hl.usePhraseHighlighter=true&defType=queryboost&q.op=OR&debugQuery=true&mm=1

I have the impression that a) I may not be querying the fields properly and/or b) as you said, the parser may be searching for combinations of synonyms despite the "q.op=OR" parameter.

2) Could you show me the correct format of the files being used (in your example, "jobs_titles.txt", "synonym_types.txt" and "top10_title_synonyms.txt") so I can be sure I didn't make a mistake at that level?

3) Would you suggest I downgrade my Solr version to the one you were using, and if so, which one is it? (I'm using this for a pedagogical project, so I don't mind giving this option a try if that's your advice.)

Thanks a lot for your time and help!

simonhughes22 commented 5 years ago

@BLOIZO I would definitely start by downgrading to the Solr version I built it for. It was originally developed on 5.4, IIRC. The master branch is on 6.3 (check the pom.xml and look at the Lucene library versions). I don't currently work with Solr, so I am not up to date on all the changes in 7 and 8, and some of them may break this functionality. At some point I'd like to add this to Solr core and issue a PR, but I've been too busy to take the time to do so. The built-in MLT is not great.
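One way to do the pom.xml check is a small Python script that lists the Lucene and Solr dependencies and their declared versions. The sample POM below is a made-up fragment for illustration only (its artifacts and versions are not copied from the repository), and the function name is hypothetical; point the function at the text of the real pom.xml instead.

```python
import xml.etree.ElementTree as ET

# Maven POM files declare this default namespace on <project>.
MAVEN_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Made-up fragment for illustration; not the repository's actual pom.xml.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-core</artifactId>
      <version>6.3.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>6.3.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
    </dependency>
  </dependencies>
</project>"""


def lucene_solr_versions(pom_text):
    """Return (groupId, artifactId, version) for each Lucene or Solr dependency."""
    root = ET.fromstring(pom_text)
    found = []
    for dep in root.iterfind(".//m:dependency", MAVEN_NS):
        group = dep.findtext("m:groupId", default="", namespaces=MAVEN_NS)
        if group in ("org.apache.lucene", "org.apache.solr"):
            artifact = dep.findtext("m:artifactId", default="", namespaces=MAVEN_NS)
            version = dep.findtext("m:version", default="", namespaces=MAVEN_NS)
            found.append((group, artifact, version))
    return found


if __name__ == "__main__":
    for group, artifact, version in lucene_solr_versions(SAMPLE_POM):
        print(f"{group}:{artifact} {version}")
```

If a version shows up as a placeholder such as `${...}`, the actual number is defined in the POM's `<properties>` section, which this sketch does not resolve.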