Two possible resolutions come to mind:
Refactor `handle_subscription` to submit jobs granule-by-granule instead of subscription-by-subscription, something like:

```python
def handle_subscription(subscription):
    for granule in get_unprocessed_granules(subscription):
        jobs = get_jobs_for_granule(subscription, granule)
        dynamo.jobs.put_jobs(subscription['user_id'], jobs, fail_when_over_quota=False)
        # TODO: exit the loop if the user has hit their quota?
```
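One way to address that TODO, sketched in the same style as the snippet above: this assumes (not confirmed against the current `dynamo.jobs` API) that `put_jobs` with `fail_when_over_quota=False` returns only the jobs it actually accepted, so a short return signals that the user's quota is exhausted:

```python
def handle_subscription(subscription):
    for granule in get_unprocessed_granules(subscription):
        jobs = get_jobs_for_granule(subscription, granule)
        # Assumption: put_jobs returns the subset of jobs it could submit within
        # the user's remaining quota when fail_when_over_quota=False.
        submitted = dynamo.jobs.put_jobs(subscription['user_id'], jobs, fail_when_over_quota=False)
        if len(submitted) < len(jobs):
            # Quota exhausted; later runs will pick up the remaining granules.
            break
```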
Refactor `get_jobs_for_subscription` to fetch at most, say, 20 granules per subscription per run to cap the amount of time spent processing any one subscription:

```python
def get_jobs_for_subscription(subscription, limit=20):
    granules = get_unprocessed_granules(subscription)
    jobs = []
    for granule in granules[:limit]:
        jobs.extend(get_jobs_for_granule(subscription, granule))
    return jobs
```
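For completeness, a rough sketch of what the call site might look like under this option; it assumes `handle_subscription` remains the function that submits the prepared jobs (the actual code in process_new_granules.py may differ):

```python
def handle_subscription(subscription):
    # Only the first 20 unprocessed granules become jobs on this run; subsequent
    # runs of the Lambda work through the rest of the backlog incrementally.
    jobs = get_jobs_for_subscription(subscription, limit=20)
    if jobs:
        dynamo.jobs.put_jobs(subscription['user_id'], jobs, fail_when_over_quota=False)
```

Each run then does a bounded amount of pair-finding work per subscription, so a large backlog is worked off over several invocations instead of stalling a single one.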
When evaluating subscriptions, the ProcessNewGranules lambda prepares all of the unprocessed jobs for each subscription before submitting any of those jobs:
https://github.com/ASFHyP3/hyp3/blob/develop/apps/process-new-granules/src/process_new_granules.py#L66-L69
This becomes problematic for new subscriptions, particularly InSAR subscriptions with search dates reaching far into the past, which may have many unprocessed granules. Preparing the list of unprocessed jobs involves finding InSAR pairs for each unprocessed granule, and that work can take longer than the Lambda function's 15-minute timeout. In those cases the execution is terminated before any new jobs are submitted, and subsequent runs of the function end up in the same state, typically requiring manual intervention before any new jobs will be submitted for any subscription.
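To make the failure mode concrete, here is a back-of-envelope calculation; the per-granule time and backlog size below are assumptions for illustration, not measurements:

```python
# Illustrative arithmetic only; these numbers are assumptions, not measurements.
SECONDS_PER_GRANULE = 5           # assumed time to find InSAR pairs for one granule
BACKLOG_GRANULES = 400            # assumed backlog for a subscription reaching far into the past
LAMBDA_TIMEOUT_SECONDS = 15 * 60  # 900 seconds

preparation_seconds = SECONDS_PER_GRANULE * BACKLOG_GRANULES  # 2000 seconds
print(preparation_seconds > LAMBDA_TIMEOUT_SECONDS)  # True: the run times out before put_jobs is reached
```

Because every retry starts over with the same full backlog, no run makes progress, which is why both proposed resolutions bound the amount of preparation done before the first submission.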