aws-samples / aws-kendra-transcribe-media-search

MIT No Attribution

Does not work #29

Closed aejuice-github closed 9 months ago

aejuice-github commented 1 year ago

I was following video example precisely. Even with your default values it does not work.

  1. I replaced a playlist with my playlist, the link is correct and has the same format. There is an error while building the index. It cannot crawl youtube video. The playlist is public.
  2. I used your playlist and once it deployed the web app link does not work. It says "Your app will appear here once you complete your first deployment.". Even though all steps were successful and it says it is deployed.

Great concept, but poor implementation. Is there a way to fix it?

rstrahan commented 1 year ago

Hi, so sorry you're having problems. Let's see if we can get them resolved.

On (1) the problem seems somehow related to the playlist url, since you mentioned in (2) that the demo playlist got past the index creation step.

On (2), this message just means you need to wait a bit. Did you see this in the blog post?

If the application isn’t ready when you first open the page, don’t worry! The initial application build and deployment (using AWS Amplify) takes about 10 minutes, so it will work when you try again a little later. If for any reason the application still doesn’t open, refer to the README in the GitHub repo for troubleshooting steps.

Can you check again and see if it works? If not please post exact messages/ screenshots.

aejuice-github commented 1 year ago

@rstrahan 1. Incorrect. Your playlist works because results are cached. According to your documentation, you do not parse videos twice. It does not work with any other playlist; I've tried plenty. In fact, we've debugged why it does not work. Your app uses the pytube package, which is outdated. YouTube has changed the link structure, so pytube cannot retrieve a video. You can find more info and the error message at https://stackoverflow.com/questions/68945080/pytube-exceptions-regexmatcherror-get-throttling-function-name-could-not-find

Here is a link to the playlist https://www.youtube.com/playlist?list=PLr7J3R1sT1C5pcB_xuhH1cTV9W19z1iDk

I hope you'll be able to resolve it. We would be very interested in using the service.

  2. You're correct. The issue is resolved.

rstrahan commented 1 year ago

@aejuice-github OK, we'll look into (1) and post back.. Thanks for letting us know.

rstrahan commented 1 year ago

Confirming that I can easily repro the problem..

Referring to colleague who implemented this feature. Tx.

roshansthomas commented 1 year ago

Regarding issue (1): after investigation, the solution does not cache a playlist. When the playlist is changed, the solution indexes the videos in the new playlist (if they have not been indexed before). The issue currently causing the stack to fail is with pytube version 15.0.0 (https://github.com/pytube/pytube/issues/1707). We are working on an interim fix while a permanent fix is made to the pytube main branch.

Also, if you do not want to index the YT media, you can leave the playlist empty and specify only the S3 bucket source where your media is stored. The stack then deploys successfully and indexes the media from S3.
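
For readers following along, the "index only what has not been indexed before" behavior described above could be sketched roughly as follows. This is a minimal illustration and not the repository's actual code; `select_new_videos` and the sample video IDs are hypothetical.

```python
from typing import Iterable, List, Set


def select_new_videos(playlist_ids: Iterable[str], indexed_ids: Set[str]) -> List[str]:
    """Return playlist video IDs that are not indexed yet, keeping playlist order."""
    seen: Set[str] = set()
    new_ids: List[str] = []
    for video_id in playlist_ids:
        # Skip videos that were indexed earlier or that repeat within the playlist.
        if video_id in indexed_ids or video_id in seen:
            continue
        seen.add(video_id)
        new_ids.append(video_id)
    return new_ids


if __name__ == "__main__":
    already_indexed = {"vidA", "vidB"}            # hypothetical previously indexed IDs
    new_playlist = ["vidB", "vidC", "vidD", "vidC"]
    print(select_new_videos(new_playlist, already_indexed))  # ['vidC', 'vidD']
```

With an empty playlist the function returns an empty list, which matches the case where only the S3 source is used.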

roshansthomas commented 1 year ago

Tested v0.3.1, which contains the pytube 15.0.0 fix. I am now able to index YouTube videos from the YT playlist provided as the default value of the CFN template parameter. I was also able to change the playlist to the one quoted in the issue above, https://www.youtube.com/playlist?list=PLr7J3R1sT1C5pcB_xuhH1cTV9W19z1iDk, and the indexer picks up the new videos as well. This issue is fixed and can be closed.
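
When swapping in a different playlist, a quick sanity check of the URL format can rule out a malformed parameter before deploying. The sketch below uses only the Python standard library; `playlist_id_from_url` is a hypothetical helper and is not part of the solution.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def playlist_id_from_url(url: str) -> Optional[str]:
    """Return the value of the `list` query parameter, or None if it is absent."""
    query = urlparse(url).query
    values = parse_qs(query).get("list")
    return values[0] if values else None


if __name__ == "__main__":
    url = "https://www.youtube.com/playlist?list=PLr7J3R1sT1C5pcB_xuhH1cTV9W19z1iDk"
    print(playlist_id_from_url(url))  # PLr7J3R1sT1C5pcB_xuhH1cTV9W19z1iDk
```

A `None` result means the URL carries no playlist ID at all; a non-`None` result only confirms the format, not that the playlist is public or reachable.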

rstrahan commented 1 year ago

Release v0.3.1, based on your PR @roshansthomas, addresses this issue. Updated artifacts are published to the public S3 bucket. (The fix is temporary and applies only to the current version of pytube, 15.0.0. We expect that the next release of pytube will address the issue officially.) @aejuice-github, please deploy again and report back if you encounter any remaining issues. Thanks again for letting us know about the problem.

roshansthomas commented 9 months ago

Closing issue as the solution now uses the yt_dlp package.
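
For context, listing the videos in a playlist with yt_dlp might look roughly like the sketch below. It is an assumption about how such a lookup could be written, not the code the solution ships; `fetch_playlist_info` and `playlist_video_ids` are hypothetical names, and the fetch step needs the yt_dlp package and network access.

```python
from typing import Any, Dict, List


def playlist_video_ids(info: Dict[str, Any]) -> List[str]:
    """Collect video IDs from a playlist info dict, skipping empty or ID-less entries."""
    ids: List[str] = []
    for entry in info.get("entries") or []:
        if entry and entry.get("id"):
            ids.append(entry["id"])
    return ids


def fetch_playlist_info(playlist_url: str) -> Dict[str, Any]:
    """Fetch playlist metadata without downloading any media."""
    import yt_dlp  # imported lazily so the helper above works without the package

    # extract_flat lists playlist entries without resolving each video in full.
    options = {"extract_flat": True, "quiet": True}
    with yt_dlp.YoutubeDL(options) as ydl:
        return ydl.extract_info(playlist_url, download=False)
```

Calling `playlist_video_ids(fetch_playlist_info(url))` with a public playlist URL would then yield the IDs to index.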