aws-samples / aws-serverless-connect-wallboard

Sample code for building a serverless wallboard for Amazon Connect.
MIT No Attribution

Historical wallboard import #11

Open CBVTorrijos opened 1 year ago

CBVTorrijos commented 1 year ago

Hello,

We have noticed inconsistencies in the historical metrics, specifically CONTACTS_QUEUED and CONTACTS_ABANDONED, which are not showing the same values as the wallboard in the Connect instance.

Best,

Brettles commented 1 year ago

Are the numbers completely wrong? Or are they not up to date? The historical numbers gathered in the wallboard are only collected every minute so will be a little behind the numbers shown in the Connect admin interface.

CBVTorrijos commented 1 year ago

They're completely wrong, and the weird thing is that we see incorrect numbers for only some of the queues. For example, we have 11 queues and 3 of them are showing incorrect numbers. One queue shows a difference of 18 and the others a difference of 2 to 3. The biggest gaps we have are in "CONTACTS_IN_QUEUE" and "CONTACTS_ABANDONED".

CBVTorrijos commented 1 year ago

We just found that the Connect wallboard shows nothing queued on one of the queues, but our wallboard pulling "CONTACTS_QUEUED" is showing a number, as if the two are not synced.

Correction to the above: where I wrote CONTACTS_IN_QUEUE I meant CONTACTS_QUEUED, as we are only experiencing issues with the historical metrics.

Brettles commented 1 year ago

I did some testing in my environment here and I can't see where the difference might be coming from. There's always some level of data lag with historical metrics but in general the dashboard in the Connect instance should be the same within seconds (a minute at worst) as the wallboard.

What you can do is change line 205 in the GetHistoricalMetrics Lambda function to be Logger.setLevel(logging.DEBUG) - that will give you far more than we need; but the code will tell you what metrics it is storing in the table at what time - it might hint as to what is going on with the data.
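A minimal sketch of what that change does, assuming the function uses Python's standard logging module with a module-level logger named Logger, as the quoted line suggests. The store_metric helper and its log message are hypothetical and are not taken from the repository:

```python
import logging

# Assumption: a module-level logger called Logger, matching the name used in
# the line quoted above.
Logger = logging.getLogger()
Logger.setLevel(logging.DEBUG)  # DEBUG lets every debug-level message through


def store_metric(record_type, value):
    """Hypothetical stand-in for the code that writes a metric to the table."""
    # With the level set to DEBUG this message reaches the function's logs;
    # at INFO or higher it would be dropped.
    Logger.debug("Storing %s = %s", record_type, value)
    return {"RecordType": record_type, "Value": value}
```

Debug output from a Lambda function normally ends up in its CloudWatch log group, so that is where to look after making the change.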

Brettles commented 1 year ago

Also: If you're seeing completely wrong numbers that are kind of random - there was an issue in the wallboard render function that was fixed a while ago. If you have the old code it could be affecting you. So make sure you're using the latest version of that.

CBVTorrijos commented 1 year ago

What we are finding is that it is not a latency issue but an incorrect-information issue. I updated the GetHistoricalMetrics Lambda function as advised and got an error: "gethistoricalmodule can't be found". I also verified that the same wallboard render code is on Lambda.

I notice that Abandoned on our wallboard and Abandoned on Connect are off by 2. However, our "Queued" figure (CONTACTS_QUEUED) is off only on some of the queues. For example, Amazon Connect would show 0 entered on a queue but our wallboard shows a number, which is a little confusing. Please let me know if I need to elaborate further.

Brettles commented 1 year ago

The "gethistoricalmodule can't be found" is probably due to the handler being incorrect (somehow). In the runtime settings for the function (which is on the same page as the code) make sure that the "Handler" setting matches the name of the code - the default is lambda_function.lambda_handler but when deploying via CloudFormation (which this should have been) it will be get-historical-metrics.lambda_handler.
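The same check can be scripted rather than done in the console. This is only a sketch: check_handler is a hypothetical helper, and it expects to be given a boto3 Lambda client, whose get_function_configuration call returns the configured "Handler" value.

```python
def check_handler(lambda_client, function_name, expected_handler):
    """Return (matches, actual_handler) for a deployed Lambda function.

    lambda_client is assumed to be a boto3 Lambda client, for example
    boto3.client("lambda"); function_name is whatever your deployment
    named the historical metrics function.
    """
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    actual = config.get("Handler")
    return actual == expected_handler, actual
```

For a CloudFormation deployment the expected value would be get-historical-metrics.lambda_handler, as described above. If the function returns False, the second element shows what the handler is currently set to.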

Background for the ask below:

With historical data, the Lambda function polls the API and puts the data (and only the requested data - not all metrics) into the DynamoDB table. The render function reads it from there. What I'm trying to figure out is where the problem is happening: Is the historical Lambda putting the wrong data; or is the render function doing something incorrectly. Hence the ask above on the debug information - because that tells us what the historical function is writing to the database.

If you have bandwidth and you're looking at the wallboard with wrong figures, you might try looking in the DynamoDB table for the data and seeing what it is at the same time. What you're looking for is an item where the Identifier column is Data and the RecordType column is (for example) "ContactsQueued" - or whatever you've called that particular piece of data. The Value column will then hold what should be displayed on the wallboard.
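One way to do that lookup from a script, as a sketch only. find_data_items is a hypothetical helper; it assumes it is handed a boto3 DynamoDB Table resource and that the attributes are named Identifier, RecordType and Value as described above. It scans the whole table and filters in Python, which is fine for a small wallboard table but not something to run against a large one.

```python
def find_data_items(table, record_type):
    """Scan the table and return items where Identifier is "Data" and
    RecordType equals record_type.

    table is assumed to be a boto3 Table resource, for example
    boto3.resource("dynamodb").Table("<your wallboard table>").
    """
    matches = []
    scan_kwargs = {}
    while True:
        page = table.scan(**scan_kwargs)
        for item in page.get("Items", []):
            if item.get("Identifier") == "Data" and item.get("RecordType") == record_type:
                matches.append(item)
        # A scan is paginated: keep going while DynamoDB returns a resume key.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return matches
        scan_kwargs["ExclusiveStartKey"] = last_key
```

Comparing the Value of the returned item with the Connect dashboard at the same moment shows whether the stored figure is already wrong before the render function ever reads it.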

CBVTorrijos commented 1 year ago

Hi,

I checked DynamoDB and the number in the table is the same as on the wallboard we created; however, for some queues the value in the table differs from what is showing on the wallboard in Amazon Connect.

For example: our wallboard is showing 19 and, when scanning DynamoDB for this specific queue, the table also shows 19, so it is displaying what is stored correctly; however, the same metric on the Amazon Connect wallboard shows 0. Does this mean that the historical Lambda is putting the wrong data? This is only an issue on some of the queues, not all.

Best,

CBVTorrijos commented 1 year ago

We are only using two historical metrics: CONTACTS_QUEUED and CONTACTS_ABANDONED.

CONTACTS_QUEUED isn't showing the correct information on some of our queues; however, CONTACTS_ABANDONED on the same queues is showing 100% correct, as it is on all the queues.

Brettles commented 1 year ago

This sounds spookily like the issue I described above which was affecting render-wallboard.py - there were edge cases where data in the function was being used by reference instead of by value in Python (you can blame my Python skills here). It was easily fixed but it took a day of debugging to figure out why it was happening. Check your version of render-wallboard.py; you're looking for Line 118 to be LocalSettings = DefaultSettings.copy() - it's the copy() that is the key part there.
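The class of bug described here is easy to reproduce in isolation. The sketch below is illustrative only and is not the code from render-wallboard.py; the function names and the "Colour" key are made up for the example.

```python
def apply_by_reference(defaults, overrides):
    """Buggy pattern: local is the SAME dict object as defaults."""
    local = defaults
    local.update(overrides)  # this also changes the caller's defaults
    return local


def apply_by_value(defaults, overrides):
    """Fixed pattern: copy() gives each call its own dict."""
    local = defaults.copy()
    local.update(overrides)  # defaults is left untouched
    return local
```

With the first version, settings applied for one cell leak into the shared defaults and show up in later cells, which looks like random wrong values. Note that dict.copy() is a shallow copy: nested dicts or lists inside the settings would still be shared, and copy.deepcopy would be needed to separate those.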

CBVTorrijos commented 1 year ago

Got it. We are using the same render-wallboard.py that you recently updated, and line 118 is LocalSettings = DefaultSettings.copy(). We have found that CONTACTS_QUEUED counts both "Queue" and "transferred" contacts; at least, that is how it seems to be calculated for us. However, we still have one queue where a value is being picked up and written to DynamoDB, but the Connect wallboard shows no number for it. It seems to affect just this one queue. Do you have any other ideas about what could be causing this, or any way we could troubleshoot it? We really appreciate your help on this; your work is definitely top tier!

Brettles commented 1 year ago

Apologies for the delay - had some time off. If the data is in the database (that's good!) and it's correct (that's excellent!) then we're looking for the problem on the render side. Again: You can turn on the debugging in the render Lambda and see what it is doing - it will print out errors if it is being told to display something that it can't find. But that will be obvious - as the cell will be blank or zero all the time rather than having a random value in it.

Two other random things which come to mind:

  1. There's a typo in the definition such that the cell is supposed to display something that doesn't have a definition. That's unlikely (it should be picked up during the import process) but it could happen.
  2. The cell is pointing to something else which is defined - so it is picking up data; just not what you want it to.

Both of those are easy to fix. But given that I suspect you've double-checked all of that I think that it's more likely that there's something else going on - hence the ask (again - sorry) for the debugging in the render function.