aws-samples / bedrock-claude-chat

AWS-native chatbot using Bedrock + Claude (+Mistral)
MIT No Attribution
827 stars 303 forks

[BUG] Lambda based Youtube video transcript download seems to be blocked #491

Open typex1 opened 1 month ago

typex1 commented 1 month ago

Describe the bug

Creating a new bot with a knowledge base built from YouTube transcripts fails. Error in the frontend: "Failed to detect language: Could not retrieve a transcript for the video https://www.youtube.com/watch?v=Pv0cfsastFs"

This error message is misleading: what fails is not language detection specifically. The transcript API as a whole appears to be unusable.
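One way the two failure modes could be reported separately is sketched below. This is not the project's actual code: `TranscriptFetchError`, `LanguageDetectionError`, `load_transcript`, and the injected `fetch` / `detect_language` callables are all hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List, Tuple


class TranscriptFetchError(Exception):
    """Hypothetical error: the transcript itself could not be downloaded."""


class LanguageDetectionError(Exception):
    """Hypothetical error: the transcript was downloaded, but its language was not detected."""


def load_transcript(
    url: str,
    fetch: Callable[[str], List[Dict[str, str]]],
    detect_language: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (language, text), raising a distinct error for each stage."""
    # Stage 1: download. A failure here has nothing to do with language detection.
    try:
        segments = fetch(url)
    except Exception as exc:
        raise TranscriptFetchError(
            f"Could not retrieve a transcript for the video {url}"
        ) from exc

    text = " ".join(segment["text"] for segment in segments)

    # Stage 2: language detection, only reached once a transcript exists.
    try:
        language = detect_language(text)
    except Exception as exc:
        raise LanguageDetectionError(
            f"Failed to detect language for the video {url}"
        ) from exc

    return language, text
```

With a split like this, the frontend could show "Could not retrieve a transcript" on its own when the download is blocked, instead of prefixing it with "Failed to detect language".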

After quite some research, I have good evidence that AWS-owned IP addresses are (currently) blocked from downloading YouTube transcripts. This applies at least to Lambda functions and Cloud9. Tested in us-east-1, eu-central-1 and ap-northeast-1.
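A minimal probe along these lines can be run from a Lambda function, Cloud9, and a local machine to compare results. It is a sketch, not the script used for the tests above: `extract_video_id` and `probe_transcript` are hypothetical helpers, and the default fetcher assumes the `YouTubeTranscriptApi.get_transcript` class method from the 0.6.x releases of youtube-transcript-api (newer releases may expose a different interface).

```python
from typing import Callable, Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> str:
    """Pull the video ID out of a watch URL or a youtu.be short link."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    ids = parse_qs(parsed.query).get("v")
    if not ids:
        raise ValueError(f"No video ID found in {url}")
    return ids[0]


def probe_transcript(url: str, fetch: Optional[Callable] = None) -> str:
    """Try to download a transcript and report a short status string."""
    video_id = extract_video_id(url)
    if fetch is None:
        # Assumption: youtube-transcript-api 0.6.x is installed.
        # Imported lazily so the helper above works without the library.
        from youtube_transcript_api import YouTubeTranscriptApi

        fetch = YouTubeTranscriptApi.get_transcript
    try:
        segments = fetch(video_id)
    except Exception as exc:
        # The exception type is the useful signal when comparing environments.
        return f"FAILED ({type(exc).__name__})"
    return f"OK ({len(segments)} segments)"
```

Calling `probe_transcript("https://www.youtube.com/watch?v=Pv0cfsastFs")` in each environment and comparing the returned status is enough to see whether the download succeeds only outside AWS.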

To Reproduce

Steps to reproduce the behavior:

Screenshots

Additional context

[Image: Screenshot 2024-08-13 at 14 26 53]

typex1 commented 1 month ago

Error situation on the Bot overview list:

[Image: Screenshot 2024-08-13 at 14 47 41]

statefb commented 1 month ago

The problem is caused by the youtube-transcript-api library. Issues:

We may remove the YouTube feature in the future for KnowledgeBase integration. Thank you for your understanding.