MI-DPLA / combine

Combine /kämˌbīn/ - Metadata Aggregator Platform
MIT License

Add Record validity to DB model? #141

Closed ghukill closed 6 years ago

ghukill commented 6 years ago

It's a bit unfortunate that signs point to this now, but perhaps better late than never...

Should Records have a validity column in the DB?

I was just looking at some options for filtering all / valid / invalid records on a Job's details page, on the heels of some work to select Records by validity from an input Job. It's becoming clear that if a Record's validity were saved in the DB, anything related to validity would be much easier to handle.

Currently, it is not. There is a separate DB table, core_recordvalidation, that stores validation failures, with a FK pointing to the Record. This is worth keeping, as it preserves details about each Validation Scenario failure.

But often, we want to alter views, input jobs, etc., based on the boolean of whether or not a Record has failed any Validation Scenarios.
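To make the current situation concrete, here is a minimal sketch of deriving that boolean from the failures table alone. It uses plain sqlite3 rather than Combine's actual models; the core_record table and every column name here (record_id, scenario) are assumptions for illustration, and only the core_recordvalidation table name comes from this thread.

```python
import sqlite3

# In-memory stand-in for the real DB; the schema is assumed, not Combine's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE core_record (id INTEGER PRIMARY KEY);
CREATE TABLE core_recordvalidation (
    id INTEGER PRIMARY KEY,
    record_id INTEGER REFERENCES core_record(id),  -- FK back to the Record
    scenario TEXT                                  -- which scenario failed
);
INSERT INTO core_record (id) VALUES (1), (2), (3);
-- Record 2 failed two scenarios; records 1 and 3 failed none.
INSERT INTO core_recordvalidation (record_id, scenario)
    VALUES (2, 'schematron'), (2, 'python');
""")


def record_validity(conn):
    """Return (record id, is_valid) pairs, computed from the failures table."""
    rows = conn.execute("""
        SELECT r.id,
               NOT EXISTS (
                   SELECT 1 FROM core_recordvalidation v
                   WHERE v.record_id = r.id
               ) AS valid
        FROM core_record r
        ORDER BY r.id
    """).fetchall()
    return [(rid, bool(valid)) for rid, valid in rows]


print(record_validity(conn))
```

Every view or input filter that cares about validity has to repeat a subquery like this, which is the friction described above.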

One possible approach might be to write this boolean column at the tail end of a Spark job, using some of the code from filtering input jobs for all / valid / invalid (but eventually obviating that). This would be extraordinarily expensive, but it would make everything on the other side considerably easier.
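A rough sketch of what that tail-end write could look like, again in plain sqlite3 rather than Spark. The function name mark_job_validity, the core_record table, and the job_id and valid columns are all hypothetical; in Combine the equivalent step would presumably run from the Spark job itself.

```python
import sqlite3

# Assumed schema: records carry a job_id and a nullable valid flag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE core_record (
    id INTEGER PRIMARY KEY,
    job_id INTEGER NOT NULL,
    valid INTEGER              -- NULL until the job's final step runs
);
CREATE TABLE core_recordvalidation (
    id INTEGER PRIMARY KEY,
    record_id INTEGER REFERENCES core_record(id)
);
INSERT INTO core_record (id, job_id) VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO core_recordvalidation (record_id) VALUES (2), (2), (3);
""")


def mark_job_validity(conn, job_id):
    """Set valid for every record in a job; return the number marked invalid."""
    with conn:  # one transaction: commit on success, roll back on error
        # Start by assuming every record in the job is valid...
        conn.execute(
            "UPDATE core_record SET valid = 1 WHERE job_id = ?", (job_id,)
        )
        # ...then flip the ones that have at least one failure row.
        cur = conn.execute(
            """
            UPDATE core_record SET valid = 0
            WHERE job_id = ?
              AND id IN (SELECT record_id FROM core_recordvalidation)
            """,
            (job_id,),
        )
    return cur.rowcount


invalid_count = mark_job_validity(conn, 1)
```

The cost is a pass over all of the job's records after validation finishes; the payoff is that nothing downstream needs the join again.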

ghukill commented 6 years ago

Related #139

ghukill commented 6 years ago

Added a new column, valid, to the Record table. In many instances it is still preferable to use the JobValidation records; even the counts can be quicker that way. But where you need a column to sort on, e.g. the Records table in some views, this column is indispensable.

This also speeds up, and simplifies, the record validity valve for Spark jobs.
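For illustration, with the column in place the all / valid / invalid selection and the sort both reduce to a single-table query. This is a hedged sketch in plain sqlite3; select_records, core_record, and the job_id column are hypothetical names, not Combine's actual code.

```python
import sqlite3

# Assumed schema with the new valid column already populated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE core_record (
    id INTEGER PRIMARY KEY,
    job_id INTEGER NOT NULL,
    valid INTEGER NOT NULL DEFAULT 1
);
INSERT INTO core_record (id, job_id, valid) VALUES
    (1, 1, 1), (2, 1, 0), (3, 1, 1), (4, 2, 0);
""")


def select_records(conn, job_id, validity="all"):
    """Return record ids for a job, optionally filtered by validity.

    Results are sorted with invalid records first, then by id.
    """
    sql = "SELECT id FROM core_record WHERE job_id = ?"
    params = [job_id]
    if validity in ("valid", "invalid"):
        sql += " AND valid = ?"
        params.append(1 if validity == "valid" else 0)
    elif validity != "all":
        raise ValueError("validity must be 'all', 'valid', or 'invalid'")
    sql += " ORDER BY valid, id"
    return [row[0] for row in conn.execute(sql, params)]


print(select_records(conn, 1))
print(select_records(conn, 1, "invalid"))
```

No join against the failures table is needed for either the filter or the ordering; the failures table remains the place to look for per-scenario detail.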

Done.