CouncilDataProject / cdp-backend

Data storage utilities and processing pipelines used by CDP instances.
https://councildataproject.org/cdp-backend
Mozilla Public License 2.0

feature/filter-out-invalid-captions #214

Closed: dphoria closed this pull request 1 year ago

dphoria commented 1 year ago

Link to Relevant Issue

This pull request resolves #213

Description of Changes

In create_event_gather_flow(), in addition to calling resource_exists() on session.caption_uri, further validate the caption file by comparing its length (duration) against that of the video at session.video_uri. If the two lengths differ by more than 20%, reject the caption file and fall back to speech-to-text for the transcript.
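
As a rough illustration of the check described above, here is a minimal sketch. The function names `caption_matches_video` and `choose_transcript_source` are hypothetical and not taken from cdp-backend. The sketch also assumes that both lengths are already available as durations in seconds, and that the 20% difference is measured relative to the video length. The actual change may compute or compare these differently.

```python
def caption_matches_video(
    caption_seconds: float,
    video_seconds: float,
    tolerance: float = 0.2,
) -> bool:
    """Return True if the caption length is within `tolerance` of the video length.

    Hypothetical helper: the difference is measured relative to the video
    duration, which is an assumption and not confirmed by the PR text.
    """
    if video_seconds <= 0:
        # Cannot compare against an empty or unreadable video.
        return False
    relative_difference = abs(caption_seconds - video_seconds) / video_seconds
    return relative_difference <= tolerance


def choose_transcript_source(
    caption_exists: bool,
    caption_seconds: float,
    video_seconds: float,
) -> str:
    """Pick the transcript source for a session (hypothetical helper)."""
    # Use the caption file only if it exists and roughly matches the video.
    if caption_exists and caption_matches_video(caption_seconds, video_seconds):
        return "captions"
    # Otherwise reject the captions and run speech-to-text instead.
    return "speech-to-text"
```

Under these assumptions, a 50-minute caption file for a 60-minute video (about 17% shorter) would be accepted, while a 30-minute caption file for the same video (50% shorter) would be rejected in favor of speech-to-text.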