Closed by sagarrat7 1 year ago
Merging #231 (757e3f8) into main (703d7f1) will increase coverage by 0.65%. The diff coverage is 97.75%.
@@            Coverage Diff             @@
##             main     #231      +/-   ##
==========================================
+ Coverage   72.12%   72.78%   +0.65%
==========================================
  Files          50       50
  Lines        3376     3465      +89
==========================================
+ Hits         2435     2522      +87
- Misses        941      943       +2
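The percentages in this report can be reproduced from the hit and line counts. The sketch below is only a sanity check, not Codecov's actual implementation: it assumes percentages are truncated (not rounded) to two decimals, which is an inference from the figures shown, and it assumes the diff coverage corresponds to the 87 new hits out of 89 new lines, which happens to match the reported 97.75%.

```python
from fractions import Fraction
import math


def truncated_pct(ratio: Fraction) -> float:
    """Convert a ratio to a percentage truncated (not rounded) to two decimals.

    Truncation is an assumption inferred from the report's numbers.
    """
    return math.floor(ratio * 10000) / 100


# Hits / Lines, taken from the coverage diff above.
base = Fraction(2435, 3376)  # main
head = Fraction(2522, 3465)  # #231

base_pct = truncated_pct(base)
head_pct = truncated_pct(head)
# The delta is computed on the exact ratios, then truncated.
delta_pct = truncated_pct(head - base)
# Assumed: diff coverage = new hits / new lines.
diff_pct = truncated_pct(Fraction(87, 89))

print(base_pct, head_pct, delta_pct, diff_pct)
```

Note that subtracting the two already-truncated percentages (72.78 and 72.12) would give 0.66 rather than the reported 0.65, which is why the delta above is taken from the exact ratios.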
Impacted Files | Coverage Δ | |
---|---|---|
cdp_backend/tests/utils/test_file_utils.py | 98.88% <92.85%> (-1.12%) | :arrow_down: |
cdp_backend/tests/conftest.py | 100.00% <100.00%> (ø) | |
cdp_backend/utils/file_utils.py | 92.36% <100.00%> (+1.88%) | :arrow_up: |
Link to Relevant Issue
Related to #81
Description of Changes
Adds a utility function, parse_document(), that extracts text from docx, doc, pdf, and pptx matter files so that matters can be indexed in addition to transcripts. Note: text extracted from pptx files contains extra "Title" and "/docProps/thumbnail.jpeg" strings. These can be removed if needed.
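If the extra pptx strings do need to be removed, one option is a small post-processing step on the extracted text. The following is a minimal sketch, not part of this PR: strip_pptx_artifacts is a hypothetical helper, and it assumes the two strings appear on lines of their own in the extracted text, which should be verified against real parse_document() output.

```python
# Strings the PR description reports as extra output for pptx files.
PPTX_ARTIFACTS = ("/docProps/thumbnail.jpeg", "Title")


def strip_pptx_artifacts(text: str) -> str:
    """Drop lines that consist solely of a known pptx artifact string.

    Hypothetical helper: assumes each artifact occupies its own line.
    Lines that merely contain the word "Title" are left untouched.
    """
    kept = [
        line for line in text.splitlines()
        if line.strip() not in PPTX_ARTIFACTS
    ]
    return "\n".join(kept).strip()


# Made-up sample text, for illustration only.
raw = "Title\nBudget Overview\nProposed spending\n/docProps/thumbnail.jpeg"
cleaned = strip_pptx_artifacts(raw)
print(cleaned)
```

One trade-off of matching whole lines: a slide whose real title is literally "Title" would also be dropped, so this filter should only be applied to pptx output.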