Closed evamaxfield closed 2 years ago
Merging #165 (b5babdc) into main (ede007f) will decrease coverage by 0.07%. The diff coverage is 100.00%.
@@            Coverage Diff             @@
##             main     #165      +/-   ##
==========================================
- Coverage   94.56%   94.49%   -0.08%
==========================================
  Files          50       50
  Lines        2558     2560       +2
==========================================
  Hits         2419     2419
- Misses        139      141       +2
Impacted Files | Coverage Δ | |
---|---|---|
cdp_backend/pipeline/event_index_pipeline.py | 85.71% <ø> (ø) | |
cdp_backend/tests/utils/test_string_utils.py | 100.00% <100.00%> (ø) | |
cdp_backend/utils/string_utils.py | 81.39% <100.00%> (-3.98%) | :arrow_down: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Looks good to me! Honestly pretty surprised that Google speech-to-text generated an emoji haha
I think this transcript is from a converted closed caption, which makes a bit more sense 😂
Link to Relevant Issue
This pull request resolves #151
Description of Changes
Finally found time to fix this one. Can't believe the bug either.
Looks like a transcript from Seattle has a ♪ pictogram / emoticon in it... link -- start at 30:09 -- or search for ♪
This fixes the pipeline by adding a function to clean all common pictograms / emojis from the sentence before stemming and fuzzy matching for context spans.
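For readers who want a feel for the approach, here is a minimal sketch of that kind of cleaning step. It is not the code merged in this PR: the function name `strip_pictograms` and the specific Unicode ranges are assumptions chosen for illustration, and the real helper in `cdp_backend/utils/string_utils.py` may cover different ranges.

```python
import re

# Hypothetical set of Unicode ranges; an assumption for illustration,
# not necessarily the ranges used in cdp_backend.
_PICTOGRAM_PATTERN = re.compile(
    "["
    "\u2600-\u26FF"          # miscellaneous symbols (includes the ♪ note, U+266A)
    "\u2700-\u27BF"          # dingbats
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "]+"
)


def strip_pictograms(sentence: str) -> str:
    """Remove common pictograms / emojis and collapse leftover whitespace."""
    # Replace each run of pictograms with a space so neighbouring words stay apart.
    cleaned = _PICTOGRAM_PATTERN.sub(" ", sentence)
    # Collapse repeated whitespace and trim the ends.
    return " ".join(cleaned.split())


if __name__ == "__main__":
    print(strip_pictograms("♪ welcome to the council meeting ♪"))
```

Replacing matches with a space rather than an empty string keeps words that were separated only by a symbol from being fused together, which matters for the stemming and fuzzy matching that follow.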
Tested by running the pipeline and storing the index locally:
run_cdp_event_index -n 1 --store_local --parallel ../configs-and-special-events/seattle.json