MI-DPLA / combine

Combine /kämˌbīn/ - Metadata Aggregator Platform
MIT License

Job lineage can be prohibitively slow #308

Closed ghukill closed 6 years ago

ghukill commented 6 years ago

Methods in question:

This is very noticeable when viewing the Jobs table with all Jobs loaded, or when attempting to retrieve the lineage for a Job while the system is under heavy load.

ghukill commented 6 years ago

Traced the slowdown under high Mongo I/O to the check for validation results on each Job, which queries Mongo's Record collection. From Job.get_lineage():

# get validation results for self  #
validation_results = self.validation_results()

This call exists solely to determine whether the Job is valid, in order to colorize the edges of the node. Is there another way to determine this? How does the Jobs DT table infer validity for a Job?
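To make the cost concrete, here is a minimal sketch of the pattern described above. It does not use Combine's actual classes: `FakeRecordCollection`, `count_failures`, `build_lineage_nodes`, and the `verdict` key are hypothetical stand-ins, and the fake collection only counts how often it is queried in place of Mongo's Record collection.

```python
class FakeRecordCollection:
    """Hypothetical stand-in for Mongo's Record collection; counts queries."""

    def __init__(self, failures_by_job):
        self.failures_by_job = failures_by_job
        self.query_count = 0

    def count_failures(self, job_id):
        # Every call models one round trip to the database.
        self.query_count += 1
        return self.failures_by_job.get(job_id, 0)


class Job:
    """Simplified Job; only the parts relevant to lineage are modeled."""

    def __init__(self, job_id, records):
        self.id = job_id
        self.records = records

    def validation_results(self):
        # Recomputed from the record store on every call.
        failures = self.records.count_failures(self.id)
        return {"verdict": failures == 0, "failure_count": failures}


def build_lineage_nodes(jobs):
    # Each node needs only a boolean for edge color, yet pays for a query.
    return [
        {"id": job.id, "is_valid": job.validation_results()["verdict"]}
        for job in jobs
    ]


records = FakeRecordCollection({2: 5})  # Job 2 has five failing records
jobs = [Job(job_id, records) for job_id in (1, 2, 3)]
nodes = build_lineage_nodes(jobs)
```

In this sketch, building the lineage for three Jobs issues three queries, so the cost grows with the number of Jobs and with whatever else is competing for the database at the time.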

ghukill commented 6 years ago

Even the Jobs DT table calls validation_results() when looping through Jobs.

If this is not the only cause of the lineage slowdown, it certainly contributes. It also appears to run twice each time a Record Group page is shown: once for the lineage and once for the Jobs DT table.

It would probably make sense to save this calculation to Job.job_details at the tail end of a Job run, and then update it whenever validations are run or removed.

ghukill commented 6 years ago

Fixed. Validation results are now stored under validation_results, and the detailed record count under detailed_record_count, in Job.job_details.
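A rough sketch of what this kind of caching could look like follows. It is not the code from the actual fix: only the `job_details` keys `validation_results` and `detailed_record_count` come from this thread. The `refresh_cached_counts` method, the injected `count_failures` and `count_records` callables, the `verdict` key, and the integer record count are all assumptions made for illustration.

```python
class Job:
    """Simplified Job that caches expensive counts in job_details."""

    def __init__(self, job_id, count_failures, count_records):
        self.id = job_id
        # Hypothetical callables standing in for expensive Mongo queries.
        self._count_failures = count_failures
        self._count_records = count_records
        self.job_details = {}

    def refresh_cached_counts(self):
        # Call at the tail end of a Job run, and again whenever
        # validations are run or removed, so the cache stays current.
        failures = self._count_failures(self.id)
        self.job_details["validation_results"] = {
            "verdict": failures == 0,
            "failure_count": failures,
        }
        self.job_details["detailed_record_count"] = self._count_records(self.id)

    def validation_results(self):
        # Serve the cached value; compute it only if it was never stored.
        if "validation_results" not in self.job_details:
            self.refresh_cached_counts()
        return self.job_details["validation_results"]


calls = {"failures": 0}


def count_failures(job_id):
    calls["failures"] += 1  # track how often the expensive query runs
    return 0


def count_records(job_id):
    return 100


job = Job(1, count_failures, count_records)
job.refresh_cached_counts()            # end of the Job run
lineage_view = job.validation_results()  # read for the lineage graph
table_view = job.validation_results()    # read again for the Jobs DT table
```

With this arrangement the two reads made for a Record Group page hit the stored value, and the expensive query runs once per Job run or validation change rather than once per page view. The trade-off is that every code path that adds or removes validations has to remember to refresh the cache.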