amazon-connect / amazon-connect-salesforce-lambda

Apache License 2.0

Postcall Redacted Recording Import failing (timeout) - Doing full scan on S3 'Analysis/Voice/Redacted' #59

Open jeve7 opened 6 months ago


I am working on a system that requires the post-call redacted recording to be uploaded to Salesforce. To do that, the attributes "contactLensImportEnabled" and "postcallRedactedRecordingImportEnabled" are set to true. Following the instructions here, we configured a trigger for the Lambda named "sfProcessContactLens". The function kept timing out even after we increased the timeout to 10 minutes. The root cause is that it does a full scan of the folder 'Analysis/Voice/Redacted' in the S3 bucket where all the calls are stored (millions of objects). The Lambda function invokes the method "getRedactedRecordingLocation" in the file "sfContactLensUtil.py", which does this (see here):

pages = paginator.paginate(Bucket=connectBucket, Prefix='Analysis/Voice/Redacted')
for page in pages:
  for obj in page['Contents']:
    if redactedRecordingKey in obj['Key'] and obj['Key'].endswith('.wav'):
      redactedRecordingLocation = connectBucket + '/' + obj['Key']
      return redactedRecordingLocation

Notice the hard-coded full scan of the prefix "Analysis/Voice/Redacted". With millions of objects under that prefix, the search times out before it reaches the right file. As a temporary workaround I changed the prefix to 'Analysis/Voice/Redacted/2024', which limits the scan to recent calls and is much better. A proper fix should get the object name from the trigger, or use some other mechanism, instead of doing a full scan.
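One possible direction, sketched below, is to narrow the prefix to individual days instead of a whole year. This is only an illustration, not the project's code. It assumes the redacted recordings are stored under 'Analysis/Voice/Redacted/YYYY/MM/DD/' (which the '.../2024' workaround suggests, but which should be verified against the bucket) and that the Lambda can obtain a timezone-aware timestamp for the contact. The names `day_prefixes`, `find_redacted_recording`, `when`, and `days_around` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def day_prefixes(when, base="Analysis/Voice/Redacted", days_around=1):
    """Return day-level prefixes, starting with the day of `when` (UTC).

    Assumes keys look like <base>/YYYY/MM/DD/<file>; `when` must be
    timezone-aware. Neighbouring days are included in case the recording
    was written on a different UTC day than the timestamp we have.
    """
    day = when.astimezone(timezone.utc).date()
    # Order offsets as 0, -1, +1, ... so the most likely day is tried first.
    offsets = sorted(range(-days_around, days_around + 1), key=abs)
    return [f"{base}/{day + timedelta(days=o):%Y/%m/%d}/" for o in offsets]


def find_redacted_recording(s3_client, bucket, recording_key, when, days_around=1):
    """Look for the redacted .wav only under a few day-level prefixes.

    `s3_client` is expected to behave like boto3.client("s3").
    Returns "<bucket>/<key>" or None if nothing matches.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for prefix in day_prefixes(when, days_around=days_around):
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            # 'Contents' is absent when the prefix matches no objects.
            for obj in page.get("Contents", []):
                key = obj["Key"]
                if recording_key in key and key.endswith(".wav"):
                    return f"{bucket}/{key}"
    return None
```

With boto3 this could be called as `find_redacted_recording(boto3.client("s3"), connectBucket, redactedRecordingKey, contact_time)`, where `contact_time` is whatever timestamp the trigger or contact record provides. Each call then lists at most a few days of objects rather than the whole 'Redacted' tree.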