aws-solutions / qnabot-on-aws

AWS QnABot is a multi-channel, multi-language conversational interface (chatbot) that responds to your customers' questions, answers, and feedback. The solution allows you to deploy a fully functional chatbot across multiple channels including chat, voice, SMS, and Amazon Alexa.
https://aws.amazon.com/solutions/implementations/aws-qnabot
Apache License 2.0

Import function fails to handle partial JSON objects and large files in v6.1.0 #766

Closed by t-jones 2 months ago

t-jones commented 2 months ago

Describe the bug
In v6.1.0, if an import file is larger than 20000 bytes and hence requires multiple reads from S3, JSON objects at the beginning of each chunk after the first are truncated and fail to parse, which causes an unknown number of qids to be dropped when importing a QnA export. Also, the new code reads at most 15 chunks of import data from S3; at ~20 KB per chunk, that is only ~300 KB, which is much too small. These values should be configurable, or the threshold needs to be much higher.

Replacing this code with the v6.0.1 version fixes the issue. This is a regression from the previous behavior.
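
For illustration, here is a minimal sketch of one way a chunked, ranged S3 read can avoid truncating objects at chunk boundaries: carry the trailing partial line into the next chunk, and keep reading until the total size reported by ContentRange is reached instead of stopping after a fixed chunk count. This assumes the export is newline-delimited JSON (one object per line), consistent with the per-chunk parse failures described above; names like `importFromS3` and `onRecord` are illustrative, not QnABot's actual code.

```javascript
// Sketch only: carry-over chunked read, not the solution's real implementation.
const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");

const CHUNK_SIZE = 20000; // should be configurable, or much larger
const s3 = new S3Client({});

async function importFromS3(bucket, key, onRecord) {
    let offset = 0;
    let carry = "";        // partial line left over from the previous chunk
    let total = Infinity;  // learned from ContentRange on the first read

    while (offset < total) {
        const end = offset + CHUNK_SIZE - 1;
        const resp = await s3.send(new GetObjectCommand({
            Bucket: bucket,
            Key: key,
            Range: `bytes=${offset}-${end}`,
        }));
        // ContentRange looks like "bytes 20000-39999/840237"
        total = Number(resp.ContentRange.split("/")[1]);
        const text = carry + (await resp.Body.transformToString());

        const lines = text.split("\n");
        // The last element may be a truncated object; hold it for the next chunk
        carry = lines.pop();
        for (const line of lines) {
            if (line.trim()) onRecord(JSON.parse(line));
        }
        offset = end + 1;
    }
    if (carry.trim()) onRecord(JSON.parse(carry)); // final record
}
```

Because the loop is bounded by the object's actual size rather than a hard-coded 15 iterations, this pattern would also address the ~300 KB ceiling.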

To Reproduce

  1. Create or procure a large export file. The issue was reproduced here with a 1 MB file containing 195 qids (a sketch for generating such a file follows this list).
  2. Import the file.
  3. Compare the number of successfully imported qids to the actual number in the file. In this case, only 54 of the 195 qids were successfully imported.
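
A quick sketch for generating a synthetic export large enough to force multiple ranged reads. It assumes a line-delimited format with one qna object per line; the field names here are illustrative, not necessarily QnABot's exact export schema.

```javascript
// Sketch only: writes ~1 MB of 195 synthetic qna records, one JSON object per line.
const fs = require("fs");

const lines = [];
for (let i = 1; i <= 195; i++) {
    lines.push(JSON.stringify({
        qid: `test.${i}`,
        type: "qna",
        q: [`Test question ${i}?`],
        // pad the answer so 195 records total roughly 1 MB
        a: `Test answer ${i}. ` + "x".repeat(5000),
    }));
}
fs.writeFileSync("large-export.json", lines.join("\n"));
```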

Expected behavior
All qids should be imported.


Additional context
There is a log line like:

2024-09-20T22:42:41.438Z    80513f94-33be-4d7a-ab5b-b879ddec27cf    INFO    ContentRange:  bytes 20001-40001/840237

indicating that this file requires multiple reads. Also, the first parse attempt in each new chunk always fails, since the chunk begins partway through a JSON object. And at 840,237 bytes, with only 15 reads of ~20 KB each (~300 KB), less than half of this file is actually processed.


fhoueto-amz commented 2 months ago

Thanks, we will look into this and get back to you.

tmekari commented 2 months ago

This has been addressed in our v6.1.1 release. Thanks for bringing this to our attention. Please feel free to reach out if you have any other issues!