clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0

OR queries only work within parentheses #129

Closed twagoo closed 6 years ago

twagoo commented 6 years ago

Compare the result counts for German OR Dutch and (German OR Dutch).

(Note: there is also the query Dutch or German, which treats the lowercase 'or' as part of the search text rather than as an operator; it currently returns 394 records.)

The OR operator should always be supported, not only when the clause is wrapped in parentheses. Investigate why, and fix it in the Solr configuration if possible!

twagoo commented 6 years ago

Solr "debugQuery" output:

German OR Dutch

    "parsedquery":"+(DisjunctionMaxQuery(((continent:German)^0.5 | (country:German)^2.0 | modality:German | (keywords:German)^2.0 | (subject:German)^2.0 | (description:german)^4.0 | (organisation:German)^2.0 | collection:German | (_languageName:german)^2.0 | (name:german)^8.0 | genre:German | text:german | (id:German)^0.1)) DisjunctionMaxQuery(((continent:Dutch)^0.5 | (country:Dutch)^2.0 | modality:Dutch | (keywords:Dutch)^2.0 | (subject:Dutch)^2.0 | (description:dutch)^4.0 | (organisation:Dutch)^2.0 | collection:Dutch | (_languageName:dutch)^2.0 | (name:dutch)^8.0 | genre:Dutch | text:dutch | (id:Dutch)^0.1)))~2 (+DisjunctionMaxQuery(((name:\"german dutch\")^2.0 | description:\"german dutch\"))) (+SolrRangeQuery(name:{* TO *})^2.0) (+SolrRangeQuery(description:{* TO *})) (SolrRangeQuery(_hasPart:{* TO *}) SolrRangeQuery(_resourceRef:{* TO *})) (+(+availability:PUB -availability:ACA -availability:RES -availability:UNSPECIFIED)^0.5) (+(+availability:ACA -availability:PUB -availability:RES -availability:UNSPECIFIED)^0.2) FunctionQuery(double(_hasPartCountWeight))^0.2 FunctionQuery(rord(_daysSinceLastSeen))^0.05 FunctionQuery(rord(_hierarchyWeight))",
    "parsedquery_toString":"+((((continent:German)^0.5 | (country:German)^2.0 | modality:German | (keywords:German)^2.0 | (subject:German)^2.0 | (description:german)^4.0 | (organisation:German)^2.0 | collection:German | (_languageName:german)^2.0 | (name:german)^8.0 | genre:German | text:german | (id:German)^0.1) ((continent:Dutch)^0.5 | (country:Dutch)^2.0 | modality:Dutch | (keywords:Dutch)^2.0 | (subject:Dutch)^2.0 | (description:dutch)^4.0 | (organisation:Dutch)^2.0 | collection:Dutch | (_languageName:dutch)^2.0 | (name:dutch)^8.0 | genre:Dutch | text:dutch | (id:Dutch)^0.1))~2) (+((name:\"german dutch\")^2.0 | description:\"german dutch\")) (+(name:{* TO *})^2.0) (+description:{* TO *}) (_hasPart:{* TO *} _resourceRef:{* TO *}) (+(+availability:PUB -availability:ACA -availability:RES -availability:UNSPECIFIED)^0.5) (+(+availability:ACA -availability:PUB -availability:RES -availability:UNSPECIFIED)^0.2) (double(_hasPartCountWeight))^0.2 (rord(_daysSinceLastSeen))^0.05 rord(_hierarchyWeight)",

(German OR Dutch)

    "parsedquery":"+(+(DisjunctionMaxQuery(((continent:German)^0.5 | (country:German)^2.0 | modality:German | (keywords:German)^2.0 | (subject:German)^2.0 | (description:german)^4.0 | (organisation:German)^2.0 | collection:German | (_languageName:german)^2.0 | (name:german)^8.0 | genre:German | text:german | (id:German)^0.1)) DisjunctionMaxQuery(((continent:Dutch)^0.5 | (country:Dutch)^2.0 | modality:Dutch | (keywords:Dutch)^2.0 | (subject:Dutch)^2.0 | (description:dutch)^4.0 | (organisation:Dutch)^2.0 | collection:Dutch | (_languageName:dutch)^2.0 | (name:dutch)^8.0 | genre:Dutch | text:dutch | (id:Dutch)^0.1)))) (+DisjunctionMaxQuery(((name:\"german dutch\")^2.0 | description:\"german dutch\"))) (+SolrRangeQuery(name:{* TO *})^2.0) (+SolrRangeQuery(description:{* TO *})) (SolrRangeQuery(_hasPart:{* TO *}) SolrRangeQuery(_resourceRef:{* TO *})) (+(+availability:PUB -availability:ACA -availability:RES -availability:UNSPECIFIED)^0.5) (+(+availability:ACA -availability:PUB -availability:RES -availability:UNSPECIFIED)^0.2) FunctionQuery(double(_hasPartCountWeight))^0.2 FunctionQuery(rord(_daysSinceLastSeen))^0.05 FunctionQuery(rord(_hierarchyWeight))",
    "parsedquery_toString":"+(+(((continent:German)^0.5 | (country:German)^2.0 | modality:German | (keywords:German)^2.0 | (subject:German)^2.0 | (description:german)^4.0 | (organisation:German)^2.0 | collection:German | (_languageName:german)^2.0 | (name:german)^8.0 | genre:German | text:german | (id:German)^0.1) ((continent:Dutch)^0.5 | (country:Dutch)^2.0 | modality:Dutch | (keywords:Dutch)^2.0 | (subject:Dutch)^2.0 | (description:dutch)^4.0 | (organisation:Dutch)^2.0 | collection:Dutch | (_languageName:dutch)^2.0 | (name:dutch)^8.0 | genre:Dutch | text:dutch | (id:Dutch)^0.1))) (+((name:\"german dutch\")^2.0 | description:\"german dutch\")) (+(name:{* TO *})^2.0) (+description:{* TO *}) (_hasPart:{* TO *} _resourceRef:{* TO *}) (+(+availability:PUB -availability:ACA -availability:RES -availability:UNSPECIFIED)^0.5) (+(+availability:ACA -availability:PUB -availability:RES -availability:UNSPECIFIED)^0.2) (double(_hasPartCountWeight))^0.2 (rord(_daysSinceLastSeen))^0.05 rord(_hierarchyWeight)",
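To reproduce this comparison, the debug section can be requested by adding debugQuery=true to a search request. The sketch below only builds the two request URLs. The endpoint http://localhost:8983/solr/vlo-index/select is a hypothetical placeholder, and defType=edismax is an assumption based on the DisjunctionMaxQuery clauses and the mm/q.op parameters discussed in this thread.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; replace with the actual core's /select handler.
SOLR_SELECT = "http://localhost:8983/solr/vlo-index/select"


def debug_query_url(q: str, **extra: str) -> str:
    """Build a /select URL that asks Solr to include debug info (parsedquery etc.)."""
    params = {
        "q": q,
        "defType": "edismax",  # assumption: the handler uses the eDisMax parser
        "debugQuery": "true",  # adds the "debug" section with "parsedquery"
        "rows": "0",           # only the result count and debug info are needed
        "wt": "json",
    }
    params.update(extra)  # per-request overrides, e.g. mm="100%"
    return f"{SOLR_SELECT}?{urlencode(params)}"


for query in ("German OR Dutch", "(German OR Dutch)"):
    print(debug_query_url(query))
```

If mm is configured under the handler's "defaults" (rather than "invariants"), passing it per request, e.g. debug_query_url("German OR Dutch", mm="100%"), should make it possible to compare both settings without editing solrconfig.xml.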
twagoo commented 6 years ago

The solution appears to be to remove the mm parameter in solrconfig.xml:

<str name="mm">100%</str>

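Setting just <str name="q.op">AND</str> produces the desired behaviour, whereas setting mm this way apparently forces a default "all must match" behaviour that can only be overridden by wrapping the full clause in parentheses. This is visible in the debug output above: the unparenthesized German OR Dutch disjunction ends in ~2 (both of its two clauses are required), while the parenthesized version has no such suffix.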
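The effect of mm=100% on a two-term OR can be illustrated with a simplified model. This is not Solr's code: it only handles plain non-negative percentage specs (Solr also accepts integers, negative values and conditional specs), rounds the percentage down to whole clauses, and treats each top-level optional clause as satisfied when a hypothetical document contains that term.

```python
from __future__ import annotations


def min_should_match(spec: str, optional_clauses: int) -> int:
    """Simplified mm handling: only plain non-negative percentages like '100%'."""
    if not spec.endswith("%"):
        raise ValueError("this sketch only models percentage specs")
    pct = int(spec[:-1])
    if pct < 0:
        raise ValueError("negative percentages are not modelled here")
    # Percentages are rounded down to a whole number of clauses.
    return optional_clauses * pct // 100


def matches(doc_terms: set[str], or_terms: list[str], mm: str | None) -> bool:
    """Does a document satisfy 'a OR b OR ...' at the top level under mm?"""
    hits = sum(term in doc_terms for term in or_terms)
    # A query made only of optional clauses still needs at least one match.
    required = 1 if mm is None else max(1, min_should_match(mm, len(or_terms)))
    return hits >= required


doc = {"dutch", "corpus"}  # hypothetical record that mentions only Dutch
print(matches(doc, ["german", "dutch"], "100%"))  # False: both clauses required (~2)
print(matches(doc, ["german", "dutch"], None))    # True: one clause suffices
```

Wrapping the disjunction in parentheses turns it into a single top-level clause, so 100% of one clause is satisfied by either term, which is consistent with the +(+(...)) structure without ~2 in the second debug output.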

This article seems to have some helpful background information, mostly based on this SOLR issue thread.

twagoo commented 6 years ago

After patching the alpha instance as described above, the following two queries yield the same result count:

Dutch OR German:

    "parsedquery":"+(DisjunctionMaxQuery(((continent:Dutch)^0.5 | (country:Dutch)^2.0 | modality:Dutch | (keywords:Dutch)^2.0 | (subject:Dutch)^2.0 | (description:dutch)^4.0 | (organisation:Dutch)^2.0 | collection:Dutch | (_languageName:dutch)^2.0 | (name:dutch)^8.0 | genre:Dutch | text:dutch | (id:Dutch)^0.1)) DisjunctionMaxQuery(((continent:German)^0.5 | (country:German)^2.0 | modality:German | (keywords:German)^2.0 | (subject:German)^2.0 | (description:german)^4.0 | (organisation:German)^2.0 | collection:German | (_languageName:german)^2.0 | (name:german)^8.0 | genre:German | text:german | (id:German)^0.1))) (+DisjunctionMaxQuery(((name:\"dutch german\")^2.0 | description:\"dutch german\"))) (+SolrRangeQuery(name:{* TO *})^2.0) (+SolrRangeQuery(description:{* TO *})) (SolrRangeQuery(_hasPart:{* TO *}) SolrRangeQuery(_resourceRef:{* TO *})) (+(+availability:PUB -availability:ACA -availability:RES -availability:UNSPECIFIED)^0.5) (+(+availability:ACA -availability:PUB -availability:RES -availability:UNSPECIFIED)^0.2) FunctionQuery(double(_hasPartCountWeight))^0.2 FunctionQuery(rord(_daysSinceLastSeen))^0.05 FunctionQuery(rord(_hierarchyWeight))",
    "parsedquery_toString":"+(((continent:Dutch)^0.5 | (country:Dutch)^2.0 | modality:Dutch | (keywords:Dutch)^2.0 | (subject:Dutch)^2.0 | (description:dutch)^4.0 | (organisation:Dutch)^2.0 | collection:Dutch | (_languageName:dutch)^2.0 | (name:dutch)^8.0 | genre:Dutch | text:dutch | (id:Dutch)^0.1) ((continent:German)^0.5 | (country:German)^2.0 | modality:German | (keywords:German)^2.0 | (subject:German)^2.0 | (description:german)^4.0 | (organisation:German)^2.0 | collection:German | (_languageName:german)^2.0 | (name:german)^8.0 | genre:German | text:german | (id:German)^0.1)) (+((name:\"dutch german\")^2.0 | description:\"dutch german\")) (+(name:{* TO *})^2.0) (+description:{* TO *}) (_hasPart:{* TO *} _resourceRef:{* TO *}) (+(+availability:PUB -availability:ACA -availability:RES -availability:UNSPECIFIED)^0.5) (+(+availability:ACA -availability:PUB -availability:RES -availability:UNSPECIFIED)^0.2) (double(_hasPartCountWeight))^0.2 (rord(_daysSinceLastSeen))^0.05 rord(_hierarchyWeight)",

(Dutch OR German):

    "parsedquery":"+(+(DisjunctionMaxQuery(((continent:Dutch)^0.5 | (country:Dutch)^2.0 | modality:Dutch | (keywords:Dutch)^2.0 | (subject:Dutch)^2.0 | (description:dutch)^4.0 | (organisation:Dutch)^2.0 | collection:Dutch | (_languageName:dutch)^2.0 | (name:dutch)^8.0 | genre:Dutch | text:dutch | (id:Dutch)^0.1)) DisjunctionMaxQuery(((continent:German)^0.5 | (country:German)^2.0 | modality:German | (keywords:German)^2.0 | (subject:German)^2.0 | (description:german)^4.0 | (organisation:German)^2.0 | collection:German | (_languageName:german)^2.0 | (name:german)^8.0 | genre:German | text:german | (id:German)^0.1)))) (+DisjunctionMaxQuery(((name:\"dutch german\")^2.0 | description:\"dutch german\"))) (+SolrRangeQuery(name:{* TO *})^2.0) (+SolrRangeQuery(description:{* TO *})) (SolrRangeQuery(_hasPart:{* TO *}) SolrRangeQuery(_resourceRef:{* TO *})) (+(+availability:PUB -availability:ACA -availability:RES -availability:UNSPECIFIED)^0.5) (+(+availability:ACA -availability:PUB -availability:RES -availability:UNSPECIFIED)^0.2) FunctionQuery(double(_hasPartCountWeight))^0.2 FunctionQuery(rord(_daysSinceLastSeen))^0.05 FunctionQuery(rord(_hierarchyWeight))",
    "parsedquery_toString":"+(+(((continent:Dutch)^0.5 | (country:Dutch)^2.0 | modality:Dutch | (keywords:Dutch)^2.0 | (subject:Dutch)^2.0 | (description:dutch)^4.0 | (organisation:Dutch)^2.0 | collection:Dutch | (_languageName:dutch)^2.0 | (name:dutch)^8.0 | genre:Dutch | text:dutch | (id:Dutch)^0.1) ((continent:German)^0.5 | (country:German)^2.0 | modality:German | (keywords:German)^2.0 | (subject:German)^2.0 | (description:german)^4.0 | (organisation:German)^2.0 | collection:German | (_languageName:german)^2.0 | (name:german)^8.0 | genre:German | text:german | (id:German)^0.1))) (+((name:\"dutch german\")^2.0 | description:\"dutch german\")) (+(name:{* TO *})^2.0) (+description:{* TO *}) (_hasPart:{* TO *} _resourceRef:{* TO *}) (+(+availability:PUB -availability:ACA -availability:RES -availability:UNSPECIFIED)^0.5) (+(+availability:ACA -availability:PUB -availability:RES -availability:UNSPECIFIED)^0.2) (double(_hasPartCountWeight))^0.2 (rord(_daysSinceLastSeen))^0.05 rord(_hierarchyWeight)",
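A quick way to confirm the patch from debug output is to look for a minimum-should-match suffix in parsedquery_toString. The sketch below is a heuristic regex check, not a Solr API: it relies on Lucene printing a BooleanQuery's minimum-should-match as ")~N", which phrase slop ("..."~N) and fuzzy terms (term~N) do not produce. The input strings are shortened, hypothetical excerpts modelled on the outputs above.

```python
from __future__ import annotations

import re

# A closing parenthesis followed by ~N is how a BooleanQuery with
# minimum-should-match N is printed; slop and fuzzy suffixes follow a quote or a term.
_MM_SUFFIX = re.compile(r"\)~(\d+)")


def min_should_match_values(parsed: str) -> list[int]:
    """Return every minimum-should-match value printed in a parsed query string."""
    return [int(n) for n in _MM_SUFFIX.findall(parsed)]


before = "+(((text:german) (text:dutch))~2) (+(name:{* TO *})^2.0)"
after = "+(((text:dutch) (text:german))) (+(name:{* TO *})^2.0)"
print(min_should_match_values(before))  # [2]
print(min_should_match_values(after))   # []
```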
twagoo commented 6 years ago

Note: this is a good use case for #125

twagoo commented 6 years ago

Tests on alpha and production confirm that the issue has been fixed. The fix will be included in VLO 4.3.3 (RC1 has already been tagged and deployed to alpha).