CouncilDataProject / cdp-frontend

Component library and web app used by CDP instances.
https://councildataproject.org/cdp-frontend
Mozilla Public License 2.0

Add disclaimer about ML generated transcription to event search results page #141

Closed: evamaxfield closed this issue 2 years ago

evamaxfield commented 2 years ago

We have always known there will be some transcription errors, and until we invest in building our own model, we should at the very least display a disclaimer about them:

To provide event transcripts at low cost, Council Data Project uses Google Speech-to-Text. The transcriptions may include errors and absurdities. Please understand that our team regrets any miscommunication caused by these errors.

evamaxfield commented 2 years ago

Thinking about this further, maybe we should only show this on the event page if the transcript was generated by a known ML method. We store the generator in the transcript file: https://github.com/CouncilDataProject/cdp-backend/blob/main/cdp_backend/pipeline/transcript_model.py#L174

If the generator matches a regex like the one here: https://github.com/CouncilDataProject/cdp-backend/blob/main/cdp_backend/sr_models/google_cloud_sr_model.py#L248

then we display this disclaimer?
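A minimal sketch of the generator check in the frontend, assuming the transcript file has been fetched and parsed so that its `generator` field is available as a plain string. The `ML_GENERATOR_PATTERNS` regex, the `TranscriptFile` interface, and the `isMlGenerated` / `disclaimerFor` helpers are hypothetical illustrations; the real generator string is produced by the backend code linked above and may not match this pattern.

```typescript
// Minimal shape of a parsed transcript file; only the field used here is modeled.
// Assumption: the generator is stored as a plain string under `generator`.
interface TranscriptFile {
  generator: string;
}

// Hypothetical patterns for generators known to be ML-based. The actual string
// written by cdp-backend's Google Cloud SR model may differ, so this list should
// be kept in sync with the backend rather than trusted as-is.
const ML_GENERATOR_PATTERNS: RegExp[] = [/^Google Speech-to-Text/i];

// True when the transcript's generator matches any known ML pattern.
// The patterns have no `g` flag, so `test` keeps no state between calls.
function isMlGenerated(transcript: TranscriptFile): boolean {
  return ML_GENERATOR_PATTERNS.some((pattern) => pattern.test(transcript.generator));
}

const DISCLAIMER =
  "To provide event transcripts at low cost, Council Data Project uses Google Speech-to-Text. " +
  "The transcriptions may include errors and absurdities.";

// Returns the disclaimer text to render on the event page, or null if none is needed.
function disclaimerFor(transcript: TranscriptFile): string | null {
  return isMlGenerated(transcript) ? DISCLAIMER : null;
}

// Illustrative generator strings only; not taken from real transcript files.
console.log(disclaimerFor({ generator: "Google Speech-to-Text -- Lib Version: 2.0.0" }));
console.log(disclaimerFor({ generator: "Human transcriber" }));
```

Keeping the patterns in a list makes it easy to add other ML generators later without touching the rendering logic.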

smai-f commented 2 years ago

Hey @JacksonMaxfield, is anyone working on this, and if not, can I pick it up? Mark is in contact with our city about our CDP instance, and I think it'd be good to have this disclaimer since all of our transcripts are ML-generated.

evamaxfield commented 2 years ago

To my knowledge no one is working on this! Feel free to take it on.

Totally agree that it's a good thing to have haha. The question in my mind is where to put it: on the single event page, and maybe on the search page too.

smai-f commented 2 years ago

@JacksonMaxfield Cool, I'll try a few placements out and see what y'all think!

evamaxfield commented 2 years ago

Wooo! Thanks!

Shak2000 commented 2 years ago

I prototyped a few options and would like some early feedback:

  1. In the transcript: https://drive.google.com/file/d/1S9pE5ItALlGTng8BQiE0CFX2lmsy4vYl/view?usp=sharing
  2. In the search page: https://drive.google.com/file/d/1kQCa10g3FSBmdl5ogtniRU71uy-yopDf/view?usp=sharing

I can add a box around the text, color the text, or change the font size. I can also put the text below the meetings.

For now, the text is displayed for all results. Once we decide how to display it, I will figure out how to show it only for ML-generated transcripts. I think that filtering can only work for the first option, because the second option shows many events at once: some of them can be ML-generated, others human-generated.
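One possible way to handle a results page that mixes ML-generated and human-generated transcripts is to decide either once per page (show the disclaimer if any result is ML-generated) or per result. This is a sketch under the assumption that each search result already carries its transcript's generator string, which may require an extra fetch; `EventSearchResult`, `transcriptGenerator`, and the helper functions are hypothetical names, and the regex is an illustrative placeholder for whatever pattern the backend generator actually uses.

```typescript
// Hypothetical shape of an event search result.
// Assumption: the transcript generator string has already been loaded.
interface EventSearchResult {
  id: string;
  transcriptGenerator?: string; // undefined when the event has no transcript
}

// Illustrative ML generator pattern; keep in sync with cdp-backend.
const ML_GENERATOR = /^Google Speech-to-Text/i;

// True when this result's transcript was produced by a known ML generator.
function isMlResult(result: EventSearchResult): boolean {
  return result.transcriptGenerator !== undefined && ML_GENERATOR.test(result.transcriptGenerator);
}

// Page-level decision: one disclaimer if at least one result is ML-generated.
function shouldShowPageDisclaimer(results: EventSearchResult[]): boolean {
  return results.some(isMlResult);
}

// Per-result decision: ids of the results that should get their own disclaimer.
function mlResultIds(results: EventSearchResult[]): string[] {
  return results.filter(isMlResult).map((result) => result.id);
}
```

The page-level check keeps the layout simple, while the per-result list lets each event card carry its own notice when sources are mixed.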

I am open to receiving any advice.

evamaxfield commented 2 years ago

Hey @Shak2000, these look good! I have one suggestion, but before I get to it, could you try deploying: https://github.com/CouncilDataProject/cdp-frontend/blob/main/CONTRIBUTING.md#deploying-your-storybook-docs-site-or-example-app

Specifically:

npm run build:app
npm run deploy:app

Then we can simply go to your page and check it out.

Shak2000 commented 2 years ago

Here it is: https://shak2000.github.io/cdp-frontend/#/

evamaxfield commented 2 years ago

Thanks! I think I like the event search results location more than the transcript search results location.

Feel free to remove the transcript search results one.

I am also tempted to ask you to place this in the footer so it's always present.

evamaxfield commented 2 years ago

I may have some general rewording suggestions later.

Shak2000 commented 2 years ago

I removed the transcript search location and added the disclaimer to the footer, next to the copyright notice. I can move it into the links in the area above if that works better.

evamaxfield commented 2 years ago

Wanna open a PR?

Four things:

  1. for the text on the event search results page: can you make the font size smaller? It looks like it is currently rendered at 1.125rem. Personally, I think 1rem looks better.
  2. for the text on the event search results page: can you change it to: "To provide event transcripts at low cost, Council Data Project uses Google Speech-to-Text for transcription. Event transcripts may include errors."
  3. for the text on the event search results page: can you move the disclaimer to below the body + date filters and the sort options?
  4. for the text in the footer: can you change it to: "In many cases, Council Data Project utilizes a fine-tuned Google Speech-to-Text model for generation of event transcripts. We understand that transcripts may include errors. If you are a machine learning expert and wish to help improve our system for generating transcripts, please reach out to us on GitHub."
Shak2000 commented 2 years ago

I implemented all of the requests.

Shak2000 commented 2 years ago

Can you please assign it to me? I have a PR waiting.