NCEAS / metacat

Data repository software that helps researchers preserve, share, and discover data
https://knb.ecoinformatics.org/software/metacat
GNU General Public License v2.0

Restructure collectionQuery for `id` filters #1398

Closed laurenwalker closed 4 years ago

laurenwalker commented 4 years ago

In yesterday's meeting while we reviewed the portal UI, we realized that we should not AND the -obsoletedBy:* filter with the id filters, since we need to allow people to collect obsoleted objects in their collections.

For example, someone may want to create a collection of the exact versions of datasets they used for their paper. Those versions may be obsoleted now or in the future. We don't want those datasets to be removed/excluded from the collection just because there is a newer version.

Here is a revised description of the collectionQuery, based on the original ticket: https://github.com/NCEAS/metacat/issues/1378#issuecomment-525894013

The query is essentially broken up into 4 groups:

Within Group1 and Group4, the filters are ORed together. Within Group2 and Group3, the filters are ANDed together.

Group1 and Group2 are ORed together, and that combined group is ANDed with Group3. Finally, Group4 is ORed with the rest of the query.

Example: (((group1) OR (group2)) AND (group3)) OR (group4)

(((isPartOf:x OR seriesId:z) OR (anyOtherField:zz)) AND (formatType:METADATA AND -obsoletedBy:*)) OR (id:y OR id:zz)
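To make the nesting concrete, here is a minimal Python sketch that assembles the four groups into the query string shown above. The function names `group` and `build_collection_query` are hypothetical and are not part of Metacat; the sketch also assumes every group contains at least one filter term.

```python
def group(terms, operator):
    """Join filter terms with a boolean operator and wrap them in parentheses."""
    return "(" + f" {operator} ".join(terms) + ")"


def build_collection_query(group1, group2, group3, group4):
    """Compose (((group1) OR (group2)) AND (group3)) OR (group4).

    Hypothetical helper: assumes each argument is a non-empty list of
    already-formatted Solr terms such as "id:y".
    """
    g1 = group(group1, "OR")   # filters inside Group1 are ORed
    g2 = group(group2, "AND")  # filters inside Group2 are ANDed
    g3 = group(group3, "AND")  # filters inside Group3 are ANDed
    g4 = group(group4, "OR")   # id filters inside Group4 are ORed

    selected = f"({g1} OR {g2})"          # Group1 OR Group2
    restricted = f"({selected} AND {g3})"  # ... AND Group3
    return f"{restricted} OR {g4}"         # ... OR Group4 (ids bypass Group3)


query = build_collection_query(
    ["isPartOf:x", "seriesId:z"],
    ["anyOtherField:zz"],
    ["formatType:METADATA", "-obsoletedBy:*"],
    ["id:y", "id:zz"],
)
print(query)
```

Because Group4 sits outside the AND with Group3, an object matched by an `id` filter is returned even when it has an `obsoletedBy` value.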

The reason behind this query structure is that the collection should be made up of the following:

gothub commented 4 years ago

The restructuring of the id field in collectionQuery was completed in commit 177c6b5effe1a83f0c0914b3daa976eab07c5627

An additional commit added a default operator, "AND", which is used when the operator element is not included in the filter definition. Without a default operator, a filter that produces multiple terms results in a Solr syntax error.

For example, this filter does not have an <operator> defined:

        <filter>
            <field>keywords</field>
            <field>attribute</field>
            <value>soil</value>
        </filter>

but with the default "AND", this query term will be produced:

(keywords:soil AND attribute:soil) 
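
The fallback can be sketched in Python as follows. This is only an illustration of the behavior described in this comment, not the code from the commit: `filter_to_term` and `DEFAULT_OPERATOR` are hypothetical names, and pairing every field with every value is an assumption, since the example above has only one value.

```python
import xml.etree.ElementTree as ET

# Assumed constant name; the comment above only states that the default is "AND".
DEFAULT_OPERATOR = "AND"


def filter_to_term(filter_xml):
    """Turn a <filter> element into a parenthesized Solr query term."""
    root = ET.fromstring(filter_xml)
    fields = [f.text.strip() for f in root.findall("field")]
    values = [v.text.strip() for v in root.findall("value")]

    # findtext returns None when <operator> is absent, so fall back to the default.
    operator = (root.findtext("operator") or DEFAULT_OPERATOR).strip()

    # Assumption: each field is paired with each value.
    terms = [f"{field}:{value}" for field in fields for value in values]
    return "(" + f" {operator} ".join(terms) + ")"


example = """
<filter>
    <field>keywords</field>
    <field>attribute</field>
    <value>soil</value>
</filter>
"""
print(filter_to_term(example))
```

Joining the terms with an empty operator would instead yield `(keywords:soil  attribute:soil)`, which is the kind of malformed term the default is meant to prevent.
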
gothub commented 4 years ago

Work on processing the collectionQuery field via PortalDocumentProcessor was completed in commit 9fc865c269282f0a07963ce0dbc088ac7dcdc0f6