eclipse-pass / pass-deposit-services

Deposit Services are responsible for the transfer of custodial content and metadata from end users to repositories.
Apache License 2.0

Update the Submission.submissionStatus of candidate Submissions from a Quartz job #203

Closed emetsger closed 5 years ago

emetsger commented 5 years ago

Moves the logic out of the DepositProcessor to a dedicated Quartz job.

Resolves https://github.com/OA-PASS/general-issues/issues/117

karenhanson commented 5 years ago

Question: In PASS right now there are ~920 submissions with a status of submitted and ~280 with a status of needs-attention. Would this attempt to update the status of every one of those every time it runs? If so, that may be too much; we may need to wait until we can put some date filters on ES queries...

emetsger commented 5 years ago

@karenhanson yes, all of those Submissions would be updated every time the job runs.

The quartz job runs every 10 minutes and the jobs cannot run concurrently. So it will be OK if a job takes longer than 10 minutes to run.
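To illustrate the "jobs cannot run concurrently" guarantee, here is a minimal sketch in plain Java. It is not the Quartz API and not the code in this PR: `NonOverlappingJob` and `tryRun` are hypothetical names, and the sketch skips an overlapping trigger rather than applying Quartz's own scheduling rules. In Quartz itself this behaviour is normally declared by annotating the job class with `@DisallowConcurrentExecution`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the status-update job; not the PR's actual class.
class NonOverlappingJob {
    // true while a run is in progress
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final Runnable work;

    NonOverlappingJob(Runnable work) {
        this.work = work;
    }

    /** Runs the work unless a previous run is still in progress; returns true if it ran. */
    boolean tryRun() {
        // Atomically claim the "running" flag; fail fast if another run holds it.
        if (!running.compareAndSet(false, true)) {
            return false;
        }
        try {
            work.run();
            return true;
        } finally {
            // Always release the flag, even if the work throws.
            running.set(false);
        }
    }
}
```

With a guard like this, a run that takes longer than the 10 minute interval simply causes the next trigger to be skipped instead of starting a second, overlapping pass.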

If the concern is spamming Fedora with requests, I can build in a pause between each query to the submission status service.
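A pause between queries could look roughly like the sketch below. All names here (`ThrottledUpdater`, `updateAll`, the `updateStatus` callback) are hypothetical and stand in for the real submission status service call; the pause is passed in as a function so it can be swapped out in tests.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongConsumer;

// Hypothetical helper; the real updater in this PR may be structured differently.
class ThrottledUpdater {

    /**
     * Calls updateStatus for each URI, pausing between consecutive calls.
     * Returns the number of submissions processed.
     */
    static int updateAll(List<String> submissionUris,
                         Consumer<String> updateStatus,
                         long pauseMillis,
                         LongConsumer pause) {
        int processed = 0;
        for (int i = 0; i < submissionUris.size(); i++) {
            updateStatus.accept(submissionUris.get(i));
            processed++;
            // No pause is needed after the last submission.
            if (i < submissionUris.size() - 1) {
                pause.accept(pauseMillis);
            }
        }
        return processed;
    }

    /** A pause implementation based on Thread.sleep that preserves the interrupt flag. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In production the last argument would be `ThrottledUpdater::sleepQuietly`. As a rough estimate, a 100 ms pause across the ~1,200 candidate Submissions mentioned above adds about two minutes per run, which still fits inside the 10 minute interval.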

We could wait, but unfortunately I think this needs to be addressed shortly.

karenhanson commented 5 years ago

My concern is the effect on performance. I'm not sure whether running several thousand queries in quick succession every 10 minutes would have a noticeable impact. If it would, it might be worth moving forward with the change to add Fedora dates to Elasticsearch records (which I think needs to be done sooner or later) so that we could update based on recent changes to RepositoryCopy or Deposit records instead.

emetsger commented 5 years ago

@karenhanson well, there's enough logging in the updater now that we can measure the performance impact in terms of the time it takes to run the job. So I'd say we can merge this (when the build succeeds) and evaluate in demo.

karenhanson commented 5 years ago

Sounds good! Perhaps we can switch things around when those dates are available.

emetsger commented 5 years ago

@karenhanson yeah, a better approach may be to use the ES query DSL directly. Either way, dates will help reduce the number of entities to scan.
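As a sketch of what a date-filtered query might look like in the Elasticsearch query DSL, the helper below builds a `bool` query with a `terms` filter on the two candidate statuses and a `range` filter on a date field. The status values come from this thread; everything else is an assumption: the class name `CandidateQuery`, the date field name (the dates were not yet indexed when this was discussed), and that `submissionStatus` is indexed as an exact-match (keyword) field.

```java
// Hypothetical query builder; field names are assumptions, not the PASS index mapping.
class CandidateQuery {

    /**
     * Builds a query body selecting Submissions whose status is "submitted" or
     * "needs-attention" and whose dateField value is on or after 'since'.
     * Arguments are inserted verbatim, so they must not contain JSON special characters.
     */
    static String build(String dateField, String since) {
        return "{\"query\":{\"bool\":{\"filter\":["
                // only the statuses the job treats as candidates
                + "{\"terms\":{\"submissionStatus\":[\"submitted\",\"needs-attention\"]}},"
                // only records touched since the given date or date-math expression
                + "{\"range\":{\"" + dateField + "\":{\"gte\":\"" + since + "\"}}}"
                + "]}}}";
    }
}
```

For example, `CandidateQuery.build("lastModified", "now-1d")` would restrict the scan to records changed in the last day, where `lastModified` is a placeholder field name and `now-1d` is Elasticsearch date math.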

emetsger commented 5 years ago

This PR uncovered an issue where the indexer was not properly communicating with Fedora, so this PR also includes a series of commits (labeled IT fixes) that fix it.