aws-samples / bedrock-claude-chat

AWS-native chatbot using Bedrock + Claude (+Mistral)
MIT No Attribution

[BUG] invalid byte sequence for encoding "UTF8" #308

Open sbakir opened 1 month ago

sbakir commented 1 month ago

Description

After uploading more than 2 GB of data (over 400 PDF files), embedding starts to fail with the following error:

"[ERROR] Failed to embed. {'S': 'ERROR', 'V': 'ERROR', 'C': '22021', 'M': 'invalid byte sequence for encoding "UTF8": 0x00', 'W': 'unnamed portal parameter $3', 'F': 'mbutils.c', 'L': '1679', 'R': 'report_invalid_encoding'}"
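The message reports a NUL character (0x00) in the third bound parameter ($3) of the failing statement. To narrow down which extracted chunks trigger it, a minimal diagnostic sketch is shown below. It assumes the extracted text is available as a list of Python strings before it is written to PostgreSQL. The helper `find_nul_bytes` is hypothetical and is not part of bedrock-claude-chat.

```python
from typing import Iterable


def find_nul_bytes(chunks: Iterable[str]) -> list[tuple[int, int]]:
    """Return (chunk_index, char_offset) of the first NUL in each chunk that has one.

    Hypothetical helper: it only inspects strings and does not touch the database.
    """
    hits: list[tuple[int, int]] = []
    for index, chunk in enumerate(chunks):
        offset = chunk.find("\x00")  # -1 when the chunk contains no NUL
        if offset != -1:
            hits.append((index, offset))
    return hits


if __name__ == "__main__":
    sample = ["clean text", "bad\x00text", "also fine"]
    print(find_nul_bytes(sample))  # prints [(1, 3)]
```

Any index it reports points to a chunk that PostgreSQL would reject as a text value.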

To Reproduce

Create a chatbot with a knowledge base larger than 2 GB consisting of over 400 PDF, Word, and presentation files, then start embedding. After a while, embedding fails with the "Failed to embed" error.

statefb commented 1 month ago

Possible cause and solution (the error reports a NUL byte, 0x00, which PostgreSQL does not allow in text values):
https://www.cybertec-postgresql.com/en/fix-bad-encoding-postgresql/
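One common workaround for this class of error is to strip NUL characters from the text before binding it as a query parameter. The sketch below assumes the chunk text is a Python `str` at that point. The function name `sanitize_for_postgres` and its `replacement` parameter are hypothetical, and this is not the repository's actual fix.

```python
def sanitize_for_postgres(text: str, replacement: str = "") -> str:
    """Replace NUL characters, which PostgreSQL text values cannot contain.

    Hypothetical helper: call it on each chunk before it is bound as a parameter.
    """
    return text.replace("\x00", replacement)


if __name__ == "__main__":
    chunks = ["page one", "page\x00two"]
    cleaned = [sanitize_for_postgres(c) for c in chunks]
    print(cleaned)  # prints ['page one', 'pagetwo']
```

Passing `replacement=" "` instead of deleting the character keeps words on either side of the NUL from being joined together.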