Closed. converseKarl closed this issue 1 month ago.
This is a big gap in the functionality, and it is quite a simple addition. Anthropic already supports it, so at least Claude models should be able to support PPTX very easily. Thanks!
Indeed, but text splitting happens before any LLM query, so any model, including Titan, should be able to use it easily, just as with PDFs.
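The point that splitting is model-independent can be illustrated with a minimal sketch. It assumes a simple fixed-size character splitter with overlap. Bedrock's actual chunking strategy is not described in this thread, and `splitText` is a hypothetical helper. The output is just plain-text chunks, which any embedding model (Titan included) could consume.

```typescript
// Minimal fixed-size character splitter with overlap. This is a generic
// illustration of text splitting, not Bedrock's internal chunking strategy.
function splitText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be in [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each new chunk advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once this chunk already reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// The chunks are plain strings: nothing here depends on which model embeds them.
const chunks = splitText("Slide 1: overview. Slide 2: results. Slide 3: training.", 24, 6);
console.log(chunks);
```

Adjacent chunks share `overlap` characters so that a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.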
Thanks for the feedback! The AWS SDKs use the models exported from the service API, and the AWS service API doesn't support ppt and pptx yet. See https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html and https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_DocumentBlock.html
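To make the limitation concrete, here is a minimal sketch of a client-side guard. It assumes the format list quoted in the issue body (SDK 3.651.1) is exactly what the DocumentBlock accepts. `SUPPORTED_DOCUMENT_FORMATS` and `toDocumentFormat` are hypothetical names and are not part of the AWS SDK.

```typescript
// Document formats accepted by the Converse API DocumentBlock, as listed in
// this issue (SDK 3.651.1). PPT/PPTX are absent, which is the gap being raised.
const SUPPORTED_DOCUMENT_FORMATS = [
  "pdf", "csv", "doc", "docx", "xls", "xlsx", "html", "txt", "md",
] as const;

type DocumentFormat = (typeof SUPPORTED_DOCUMENT_FORMATS)[number];

// Hypothetical helper: map a file name to a DocumentBlock format, or return
// undefined when the service API has no matching format (e.g. .pptx).
function toDocumentFormat(fileName: string): DocumentFormat | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot === -1) return undefined;
  const ext = fileName.slice(dot + 1).toLowerCase();
  return (SUPPORTED_DOCUMENT_FORMATS as readonly string[]).includes(ext)
    ? (ext as DocumentFormat)
    : undefined;
}

for (const name of ["report.pdf", "notes.MD", "deck.pptx", "slides.ppt"]) {
  const format = toDocumentFormat(name);
  console.log(name, "->", format ?? "unsupported by DocumentBlock");
}
```

Rejecting unsupported extensions before building a request gives a clear local error instead of a service-side validation failure.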
I think this would be a good addition to the Bedrock service API. I will move this issue to a cross-SDK issue and open a feature request with the service team!
Thanks! Maggie
Added a product feature request titled "Add ppt, pptx text splitting support in Bedrock Rag knowledge base query".
Thanks again for the feature request. The Bedrock team is continuing to track this in their backlog for consideration. We're going to close this on our end, as the service team would need to take the next steps here. Please refer to the blog or CHANGELOG for updates, or feel free to reach out through support if you have a support plan.
Thanks! Maggie
This issue is now closed.
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
Describe the feature
Current format support is behind other systems at the moment:
format: "pdf" || "csv" || "doc" || "docx" || "xls" || "xlsx" || "html" || "txt" || "md", // required
but for RAG you also need ppt and pptx (PowerPoint splitting).
To be 100% complete, you would also need mp4, mp3, YouTube URLs, YouTube channels and JSON (someone implied JSON is already supported, but I haven't seen it).
Use Case
You support everything else except PowerPoint: Word, Excel, TXT, CSV and HTML are all there, but not PowerPoint.
A lot of information lives in PowerPoint files (company information, results, and numerous training presentations), so ingesting them for RAG and using that information covers a substantial set of use cases.
Proposed Solution
Implement text extraction for embeddings from PowerPoint files (as you do with PDFs). If you're using LangChain in the background, it's a five-minute job to add PPT/PPTX conversion as a loader type, but I don't know your underlying implementation.
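Until the service accepts PowerPoint, one stopgap is converting slides to plain text client-side. Here is a minimal sketch under some assumptions. The .pptx file (a ZIP archive in Office Open XML format) is assumed to be already unzipped with a ZIP library of your choice, and each `ppt/slides/slideN.xml` part is assumed to be available as a string. `extractSlideText` and `decodeXmlEntities` are hypothetical helpers. The regex approach reads only slide text runs (`<a:t>`), so speaker notes (stored in separate `notesSlide` parts), charts and images are not covered.

```typescript
// Decode the five predefined XML entities. Numeric character references
// (e.g. &#233;) are not handled in this sketch.
function decodeXmlEntities(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;" rather than "<"
}

// Pull the text runs (<a:t>...</a:t>) out of one slide's XML. Runs within the
// same DrawingML paragraph (<a:p>) are concatenated; paragraphs become lines.
function extractSlideText(slideXml: string): string {
  const lines: string[] = [];
  for (const para of slideXml.split("</a:p>")) {
    const runs = Array.from(
      para.matchAll(/<a:t(?:\s[^>]*)?>([^<]*)<\/a:t>/g),
      (m) => decodeXmlEntities(m[1]),
    );
    const line = runs.join("");
    if (line.trim() !== "") lines.push(line);
  }
  return lines.join("\n");
}

// Simplified slide XML; real slide parts carry namespaces and more markup.
const sampleSlide =
  "<p:sld><p:cSld><p:spTree><p:sp><p:txBody>" +
  "<a:p><a:r><a:t>Quarterly </a:t></a:r><a:r><a:t>results</a:t></a:r></a:p>" +
  "<a:p><a:r><a:t>R&amp;D update</a:t></a:r></a:p>" +
  "</p:txBody></p:sp></p:spTree></p:cSld></p:sld>";

console.log(extractSlideText(sampleSlide));
```

The extracted text could then be saved as .txt or .md, both of which are on the supported list, before ingestion.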
Other Information
No response
Acknowledgements
SDK version used
3.651.1
Environment details (OS name and version, etc.)
Linux Debian, Node.js on EC2, or directly on AWS Lambda