OpenMontana / montana-legislature-council-data-project

Mozilla Public License 2.0

ci(dependabot): bump cdp-backend[pipeline] from 3.2.11 to 4.0.2 in /python #50

Closed dependabot[bot] closed 1 year ago

dependabot[bot] commented 1 year ago

Bumps cdp-backend[pipeline] from 3.2.11 to 4.0.2.

Release notes

Sourced from cdp-backend[pipeline]'s releases.

Google Speech-to-Text Out, Whisper In

CouncilDataProject cdp-backend v4.0.0

:warning: :warning: This is a major breaking release. Instance maintainers should update the instance with just update-from-cookiecutter. :warning: :warning:

You should re-read the SETUP/README.md document, as there is some new minor configuration required. Specifically, the new PERSONAL_ACCESS_TOKEN and Quota Increase request should be the only things that need to be updated for existing instances.

You should also lower how often your CRON event gather runs prior to running just update-from-cookiecutter. All of the instances maintained by the CDP Core Team will be lowered to running only once per day.


Council Data Project is a backend, frontend, and cookiecutter deployment for creating a whole database, storage system, and website, for archiving, exploring, and tracking municipal council action.

This library, cdp-backend, maintains the pipelines, database models, infrastructure configuration, etc.

v4.0.0

There are two main changes for this release.

  1. We are swapping out Google Speech-to-Text for OpenAI's Whisper.

Specifically, we are using a forked version called faster-whisper. This new speech-to-text model performs much better (ranging from ~3.6% word-error-rate to ~9% word-error-rate on long audio files).
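For readers unfamiliar with the metric quoted above, word-error-rate is conventionally the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name `word_error_rate` is made up for illustration and is not part of cdp-backend, and the release notes do not say how their figures were computed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only; not a cdp-backend or faster-whisper API.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word missing)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a hypothesis with one substituted word and one dropped word against a four-word reference scores 0.5.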

To use this new model efficiently, we need access to a GPU. Since GitHub Actions do not have GPUs available, we are using a system which spins up a Google Cloud Compute Engine instance, connects to it, runs our job, and then tears it down, all in the course of a single GitHub Action workflow. From multiple tests, this should be a reduction in cost and processing time; however, with this release we will do more testing to get a better estimate.
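The spin-up, run, tear-down flow described above can be pictured as a resource lifecycle in which teardown is guaranteed even if the job fails. The sketch below only illustrates that pattern and is not the workflow cdp-backend actually ships: `ephemeral_instance`, `fake_create`, and `fake_delete` are hypothetical names, and the stand-in functions record events instead of calling any Google Cloud API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def ephemeral_instance(
    create: Callable[[], str],
    delete: Callable[[str], None],
) -> Iterator[str]:
    """Create an instance, hand it to the caller, and always delete it."""
    instance_id = create()
    try:
        yield instance_id
    finally:
        # Runs whether the job succeeded or raised, so no instance is
        # left running (and billing) after the workflow ends.
        delete(instance_id)


# Hypothetical stand-ins that log what a real provisioning layer would do.
events: List[str] = []


def fake_create() -> str:
    events.append("create")
    return "gpu-runner-1"


def fake_delete(instance_id: str) -> None:
    events.append(f"delete {instance_id}")


with ephemeral_instance(fake_create, fake_delete) as instance_id:
    events.append(f"run job on {instance_id}")
```

Putting the deletion in a `finally` clause is the design point here: a failed transcription job should not leave a GPU instance alive.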

  2. We have switched from MIT to MPLv2 License.

Unless you are trying to fork our code and take it private, this won't affect you.

Commits
  • d98dbc4 Add support for py311
  • eed93c4 Force spacy not to split hypenated words
  • 55d51fd Final v4 changes hopefully
  • e9517b4 Remove the capitalize function in favor of str manip
  • 8b49d32 Lint and format
  • 7f461e0 Seemingly working, capitalize sentences
  • 2063b88 Still working on debugging whisper sentencing
  • 0f849fa Switch to MPLv2
  • 43e3119 Forgot to compare spacy output with the sent text
  • 8978e77 Add retry to generate transcript due to flaky model loading
  • Additional commits viewable in compare view


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependabot[bot] commented 1 year ago

Superseded by #52.