MI-DPLA / combine

Combine /kämˌbīn/ - Metadata Aggregator Platform
MIT License

Apply Record filters to publish #285

Open ghukill opened 5 years ago

ghukill commented 5 years ago

When publishing a Job, allow filters to be applied. Because publishing is done entirely in Mongo, these filters could, at the very minimum, be based on properties of the Records:

This situation is an example: a third Merge/Duplicate Job is initiated just to weed out 20-30 Records that do not pass a validation scenario, and 49k Records are needlessly duplicated:

[Screenshot: selection_199]
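As a rough illustration of filtering "entirely in Mongo" at publish time, the sketch below builds a MongoDB aggregation pipeline that selects only the Records to publish, so the 20-30 invalid Records are skipped without copying the other 49k into a new Job. It is a minimal sketch under stated assumptions, not Combine's implementation: the Record field names job_id, valid and record_id, the validity options "all"/"valid"/"invalid", and the function build_publish_pipeline are all hypothetical. The stages used ($match, $sort, $group with $first, $replaceRoot, $limit) are standard MongoDB aggregation stages.

```python
from typing import Optional


def build_publish_pipeline(
    job_id: int,
    validity: str = "all",
    limit: Optional[int] = None,
    dedupe_record_ids: bool = False,
) -> list:
    """Hypothetical sketch: a MongoDB aggregation pipeline selecting a Job's
    Records to publish. Field names job_id, valid, record_id are assumptions."""
    match = {"job_id": job_id}
    if validity == "valid":
        match["valid"] = True   # only Records that passed validation
    elif validity == "invalid":
        match["valid"] = False  # only Records that failed validation
    elif validity != "all":
        raise ValueError(f"unknown validity filter: {validity!r}")

    pipeline = [{"$match": match}]

    if dedupe_record_ids:
        # Keep the lowest-_id Record for each record_id and drop the others,
        # then restore a stable order so any $limit below is deterministic.
        pipeline += [
            {"$sort": {"_id": 1}},
            {"$group": {"_id": "$record_id", "doc": {"$first": "$$ROOT"}}},
            {"$replaceRoot": {"newRoot": "$doc"}},
            {"$sort": {"_id": 1}},
        ]

    if limit is not None:
        pipeline.append({"$limit": limit})  # cap the number of published Records

    return pipeline
```

With PyMongo, such a pipeline would be passed to Collection.aggregate, e.g. db.record.aggregate(build_publish_pipeline(42, validity="valid")), where db.record is a hypothetical handle on the Records collection. The filtering happens in the query, so no intermediate Merge/Duplicate Job is needed.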

ghukill commented 5 years ago

These input filters could be saved under the published key of job_details:

Currently, it contains:

  "published": {
    "publish_set_id": "foo",
    "status": true
  },

Proposed (note that input_validity_valve is set to "valid"):

  "published": {
    "publish_set_id": "smokestack",
    "status": true,
    "input_filters": {
      "input_validity_valve": "valid",
      "input_es_query_valve": null,
      "input_numerical_valve": 10,
      "filter_dupe_record_ids": true
    }
  },
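One way to keep this proposal backward compatible is to read input_filters with defaults, so a job_details in the current form (no input_filters key) still publishes every Record. The sketch below assumes that behavior; the default values, the allowed validity options, and the helper get_input_filters are hypothetical and not part of Combine. input_es_query_valve is only passed through, since an Elasticsearch query would have to be resolved outside Mongo.

```python
import json
from typing import Any

# Assumed defaults: with no "input_filters" key (the current structure),
# every Record in the Job is published, as today.
DEFAULT_INPUT_FILTERS: dict[str, Any] = {
    "input_validity_valve": "all",
    "input_es_query_valve": None,
    "input_numerical_valve": None,
    "filter_dupe_record_ids": False,
}

VALIDITY_OPTIONS = {"all", "valid", "invalid"}  # assumed allowed values


def get_input_filters(job_details: dict) -> dict:
    """Hypothetical helper: return published.input_filters merged over defaults."""
    published = job_details.get("published") or {}
    filters = {**DEFAULT_INPUT_FILTERS, **(published.get("input_filters") or {})}

    unknown = set(filters) - set(DEFAULT_INPUT_FILTERS)
    if unknown:
        raise ValueError(f"unknown input filter(s): {sorted(unknown)}")
    if filters["input_validity_valve"] not in VALIDITY_OPTIONS:
        raise ValueError(f"bad input_validity_valve: {filters['input_validity_valve']!r}")

    limit = filters["input_numerical_valve"]
    # bool is a subclass of int, so reject it explicitly.
    if limit is not None and (not isinstance(limit, int) or isinstance(limit, bool) or limit < 1):
        raise ValueError(f"input_numerical_valve must be a positive int or null, got {limit!r}")
    return filters


if __name__ == "__main__":
    proposed = json.loads("""{"published": {"publish_set_id": "smokestack", "status": true,
        "input_filters": {"input_validity_valve": "valid", "input_es_query_valve": null,
                          "input_numerical_valve": 10, "filter_dupe_record_ids": true}}}""")
    print(get_input_filters(proposed))
```

Merging over defaults means older Jobs need no migration: an absent or null input_filters simply yields the "publish everything" behavior.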