Describe the bug
In v6.1.0, if an import file is larger than 20000 bytes and therefore requires multiple reads from S3, the JSON objects at the beginning of every chunk after the first are truncated and fail to parse, causing an unknown number of QIDs to be dropped when importing a QnA export. In addition, the new code reads at most 15 chunks of import data from S3; at ~20 KB per chunk, that is only about 300 KB, which is far too small. These values should be configurable, or the threshold needs to be much higher.
Reverting this code to the v6.0.1 version fixes the issue, so this is a regression from previous behavior.
To Reproduce
1. Create or procure a large export file. This was reproduced with a 1 MB file containing 195 QIDs.
2. Import the file.
3. Compare the number of successfully imported QIDs to the actual number in the file. In this case, only 54 of the 195 QIDs were successfully imported.
Expected behavior
All QIDs should be imported.
Please complete the following information about the solution:
[x] Version: (SO0189) QnABot with admin and client websites - Version v6.1.0
[x] Region: us-west-2
[x] Was the solution modified from the version published on this repository? No
[x] Have you checked your service quotas for the services this solution uses? This issue is not caused by service quotas.
[x] Were there any errors in the CloudWatch Logs? Yes, there are several errors that look like the following:
2024-09-20T22:42:41.262Z 80513f94-33be-4d7a-ab5b-b879ddec27cf INFO Failed to Parse: Unexpected token u in JSON at position 0 undefined <partial json text from the import file>
Stack trace looks like
2024-09-20T22:42:41.262Z 80513f94-33be-4d7a-ab5b-b879ddec27cf INFO SyntaxError: Unexpected token u in JSON at position 0
at JSON.parse (<anonymous>)
at processQuestionObjects (/var/task/index.js:242:28)
at exports.step [as handler] (/var/task/index.js:105:60)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Additional context
There is a log line like
2024-09-20T22:42:41.438Z 80513f94-33be-4d7a-ab5b-b879ddec27cf INFO ContentRange: bytes 20001-40001/840237
indicating that this is a multi-read file. The first parse of each new chunk always fails, and at 840237 bytes, less than half of this file is ever processed.
A couple of other points:
It would be great to remove all of the "Embeddings disabled - EMBEDDINGS_ENABLE: false" log lines coming from embeddings.js, or perhaps emit them at debug level only.