This PR adds support for uploading and parsing .docx files in the chatpdf application. Currently, the application only supports PDF files. With this PR, users will be able to upload .docx files and have their content parsed and displayed in the chat interface.
Summary of Changes
Added a new function parse_docx in service/utils/parse_docx.py to parse .docx files using the python-docx library.
Modified the file upload handler in service/app.py to accept .docx files. The /pdf route has been renamed to /file so that it handles both .pdf and .docx files. The handler checks the extension of the uploaded file and calls the matching parsing function (parse_pdf or parse_docx).
Modified the frontend code in src/App.js to allow uploading of .docx files. The accept attribute of the file input element now lists both .pdf and .docx, and the file upload request is sent for either file type.
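As a rough illustration of the backend changes summarized above, the extension check and the .docx parser might look like the sketch below. This is not the code in the PR: select_parser and the parsers mapping are hypothetical names introduced here for illustration, and the body of parse_docx is an assumption about how python-docx would typically be used (reading paragraph text only, ignoring tables, headers, and footers).

```python
import os
from typing import Callable, Dict


def parse_docx(path: str) -> str:
    """Hypothetical sketch: return the paragraph text of a .docx file."""
    # python-docx is imported lazily so the dispatch logic below can be
    # exercised without the library installed.
    from docx import Document

    document = Document(path)
    # Assumption: only body paragraphs are needed; tables are not read.
    return "\n".join(paragraph.text for paragraph in document.paragraphs)


def select_parser(
    filename: str, parsers: Dict[str, Callable[[str], str]]
) -> Callable[[str], str]:
    """Pick a parser by file extension (case-insensitive).

    `parsers` maps extensions such as ".pdf" or ".docx" to parsing
    functions. Raises ValueError for unsupported extensions.
    """
    extension = os.path.splitext(filename)[1].lower()
    if extension not in parsers:
        raise ValueError(f"Unsupported file type: {extension or '(none)'}")
    return parsers[extension]
```

In an upload handler, the mapping would presumably be something like {".pdf": parse_pdf, ".docx": parse_docx}, with a ValueError translated into a 4xx response. Keeping the mapping separate from the route makes it easy to add further formats later.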
Please review and merge this PR to enable support for .docx files in the chatpdf application.
Fixes #1.
To check out this PR branch, run the following command in your terminal:
git checkout sweep/support-docx