Automatic transcription for audio

WPFilmmaker commented 3 years ago

Is your feature request related to a problem? Please describe.
If I have a long audio or video file, I need to transcribe it manually, use third-party software, or pay for a monthly subscription that not everyone can afford (and these services often require internet access).

Describe the solution you'd like
It would be great if QualCoder could automatically transcribe audio and video. Of course, transcription is never 100% accurate, but just checking the final result instead of doing everything by hand makes a big difference. After some googling, I also found out that speech-to-text is complex, so for Colin alone this would be a lot of work.

BUT I found an open-source Python 3 API that could be integrated into QualCoder and make transcription as simple as pressing a few buttons: https://alphacephei.com/vosk/

Here is a demo from the terminal: https://www.youtube.com/watch?v=Itic1lFc4Gg&feature=emb_title&disable_polymer=true

Below are the main features (from their website):

Vosk is a speech recognition toolkit. The best things in Vosk are:

Supports 16 languages and dialects - English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi. More to come.
Works offline, even on lightweight devices - Raspberry Pi, Android, iOS
Installs with a simple pip3 install vosk
Portable per-language models are only 50 MB each, but there are much bigger server models available.
Provides streaming API for the best user experience (unlike popular speech-recognition python packages)
There are bindings for different programming languages, too - Java/C#/JavaScript etc.
Allows quick reconfiguration of vocabulary for best accuracy.
Supports speaker identification beside simple speech recognition.
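To illustrate what the "streaming API" in the list above means for an integration, here is a minimal Python sketch. It assumes Vosk's Python binding, where a `KaldiRecognizer` exposes `AcceptWaveform(data)` (returning True once an utterance is complete), `Result()` and `FinalResult()`, each returning a JSON string with a `"text"` field. The helper name `transcribe_stream` and the 4000-byte chunk size are hypothetical choices for this example, and the recognizer is passed in duck-typed so the loop can be tried without Vosk or a model installed.

```python
import json
from typing import BinaryIO, List, Protocol


class Recognizer(Protocol):
    """Duck-typed interface: the methods this sketch assumes vosk.KaldiRecognizer provides."""

    def AcceptWaveform(self, data: bytes) -> bool: ...
    def Result(self) -> str: ...
    def FinalResult(self) -> str: ...


def transcribe_stream(stream: BinaryIO, recognizer: Recognizer, chunk_size: int = 4000) -> List[str]:
    """Hypothetical helper: feed raw 16-bit mono PCM to a recognizer chunk by chunk.

    Returns the non-empty text of each recognized utterance, in order.
    """
    texts: List[str] = []
    while True:
        data = stream.read(chunk_size)
        if not data:
            break  # end of the audio stream
        # A True return means the recognizer has finished an utterance.
        if recognizer.AcceptWaveform(data):
            texts.append(json.loads(recognizer.Result()).get("text", ""))
    # Flush whatever audio is still buffered inside the recognizer.
    texts.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return [t for t in texts if t]
```

With Vosk installed, the recognizer would typically be created as `KaldiRecognizer(Model(model_path), 16000)`, where `model_path` points at a downloaded language model. Reading the audio in small chunks is what lets a GUI show partial progress instead of blocking until a whole interview has been processed.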

Vosk seems to work without internet, which is a great plus. The only downside I can see is that the models are 50 MB each, so adding a few languages could balloon the size of QualCoder. English could be included by default and other languages added by the user, or no language could be included by default, leaving users free to add what they need.

I think the main point is to make things as user-friendly as possible. Compared to other software, QualCoder is very intuitive, and I think it should stay that way.

ccbogel commented 3 years ago

There are some additional limitations: the A/V format has to match Vosk's requirements. From the Vosk site: 'When using your own audio file make sure it has the correct format - PCM 16khz 16bit mono'. So this may also require users to convert their files using ffmpeg.
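As a sketch of what that format requirement means in practice, the check below uses only Python's standard `wave` module to test whether a WAV file is mono, 16-bit, uncompressed PCM at 16 kHz. The function name `is_vosk_ready_wav` is hypothetical, and the 16000 Hz default simply follows the quote above.

```python
import wave


def is_vosk_ready_wav(path: str, sample_rate: int = 16000) -> bool:
    """Hypothetical check: True if the WAV file is mono, 16-bit PCM at sample_rate."""
    with wave.open(path, "rb") as wf:
        return (
            wf.getnchannels() == 1  # mono
            and wf.getsampwidth() == 2  # 2 bytes per sample = 16-bit
            and wf.getcomptype() == "NONE"  # uncompressed PCM
            and wf.getframerate() == sample_rate
        )
```

A file that fails this check would need converting first, for example with `ffmpeg -i input.mp4 -ar 16000 -ac 1 -c:a pcm_s16le output.wav`, which is exactly the extra step for users that is being weighed here.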

There are many online audio-to-text services, such as otter.ai and Google Chrome's audio-to-text. So while the idea is good, I am not keen on adding this functionality when other tools already do it well.

WPFilmmaker commented 3 years ago

I wasn't aware of such limitations. If users have to convert files with ffmpeg, then I think it is a big no. Yes, I have heard of otter.ai and their 600 free minutes; I just imagined something offline and more privacy-friendly, that's all ;)

But with such limitations I am closing this. If these limitations are removed, I might reopen it in the future if Colin decides it is worth it.

glocalglocal commented 3 years ago

I am not saying QualCoder should do speech-to-text, but cloud-based tools are becoming increasingly problematic. Uploading an interview to a website (especially one based in the US) won't always get research ethics approval, may be a legal minefield in certain jurisdictions (e.g. under the GDPR in the EU), and is frowned upon by funders, and rightly so.

ccbogel commented 3 years ago

Yes, sure. Perhaps I can add suggestions to the manual for people to make use of otter.ai and Vosk. But they will have to work those tools out themselves.

WPFilmmaker commented 3 years ago

@glocalglocal I absolutely agree with what you said.

But I also understand Colin. I think adding it to the manual could be a good compromise. If Vosk removes the format limitation and there is a way for QualCoder to convert audio to text in a few clicks, I would be glad. But it is up to Colin to decide whether this feature is necessary or worth his time. And thank you again, Colin, for your great work :)

glocalglocal commented 3 years ago

But I also understand Colin.

As I said earlier, 'I am not saying QualCoder should do speech-to-text'. Personally, I am not all that keen on speech-to-text because I have yet to find a reliable tool that doesn't require too much time checking and correcting. In the end, I decided that transcribing interviews manually myself saves me time. If QualCoder can simply interface with other tools with little overhead and leave users to choose for themselves, I suppose that's the best of both worlds.

ccbogel commented 3 years ago

@glocalglocal Yes, I am not keen for a little project such as this to become a behemoth linked to many other services, with all the problems that would entail. That's also why I am somewhat staying away from the 'R interoperability' issue someone raised, as that is also an ambiguous request. I need clear, actionable requests to work on.

WPFilmmaker commented 3 years ago

@ccbogel

Colin, this is the reply I got from Vosk:

As a library, Vosk doesn't convert formats, to reduce dependencies and software complexity.

If your project requires different formats, you can simply integrate with an existing library like ffmpeg, libav or gstreamer to convert data before processing, as demonstrated here:

https://github.com/alphacep/vosk-api/blob/master/python/example/test_ffmpeg.py

Is this something you think could be done easily, or do you think it isn't worth it / you have other priorities?
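A minimal Python sketch of the conversion approach Vosk suggests: run ffmpeg as a subprocess and read raw 16-bit mono PCM from its stdout, so the user's original file is never modified. The function names `ffmpeg_pcm_command` and `open_pcm_stream` are hypothetical; the ffmpeg options used (`-loglevel`, `-i`, `-ar`, `-ac`, `-f s16le`, `-` for stdout) are standard, and the linked `test_ffmpeg.py` appears to follow a similar pattern. This still assumes the ffmpeg binary is installed and on the user's PATH, which is the extra dependency weighed in this thread.

```python
import subprocess
from typing import List


def ffmpeg_pcm_command(input_path: str, sample_rate: int = 16000) -> List[str]:
    """Hypothetical helper: build an ffmpeg command that writes raw 16-bit mono PCM to stdout."""
    return [
        "ffmpeg",
        "-loglevel", "quiet",  # suppress ffmpeg's console output
        "-i", input_path,  # any format ffmpeg can decode (mp3, mp4, m4a, ...)
        "-ar", str(sample_rate),  # resample to the target rate
        "-ac", "1",  # downmix to mono
        "-f", "s16le",  # raw signed 16-bit little-endian PCM
        "-",  # write to stdout instead of a file
    ]


def open_pcm_stream(input_path: str, sample_rate: int = 16000) -> subprocess.Popen:
    """Hypothetical helper: start ffmpeg; converted audio is read from .stdout of the result."""
    return subprocess.Popen(ffmpeg_pcm_command(input_path, sample_rate), stdout=subprocess.PIPE)
```

For example, `proc = open_pcm_stream("interview.mp4")` followed by repeated `proc.stdout.read(4000)` calls until empty bytes are returned; each chunk could then be passed to a recognizer's `AcceptWaveform`.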

ccbogel commented 3 years ago

Hi Marcus,

It's a nice idea. But I would have the same problem as Vosk: "to reduce dependencies and software complexity."

Yes, I also have other priorities, like getting REFI-QDA working, links to files, etc. I know people such as yourself want to add extra functions and capabilities, but I prefer to keep it fairly simple and, for now, refine it so that it works well, then gradually add things. But the things to add have to be really beneficial. It is not worth spending hours of my free time on functions that have little benefit or benefit very few people. For example, someone wants whole paragraphs automatically coded now; that is not going to benefit many people, and it will be a pain to implement.

Regards, Colin

From: WPFilmmaker notifications@github.com Sent: Friday, 13 November 2020 7:40 AM To: ccbogel/QualCoder QualCoder@noreply.github.com Cc: Colin Curtain ccbogel@hotmail.com; Mention mention@noreply.github.com Subject: Re: [ccbogel/QualCoder] Automatic transcription for audio (#284)


WPFilmmaker commented 3 years ago

I had exactly the same thought :D Both projects want to reduce dependencies and complexity :D

Unfortunately, as a simple user I can only provide feedback (feature requests, bugs and user experiences), such as when I mentioned an easy way to install QualCoder on Windows. Having priorities definitely makes sense and is normal. If I had to choose between speech-to-text and an easy install on Windows, I would implement the latter, as more people are likely to encounter that issue.

Also, for a project developed in spare time, it makes sense to prioritize features that will be used by many people :)

Once again, thanks for your hard work. I always recommend QualCoder to people 👍

PS: Hopefully this weekend I will be able to check the new files for Qt Linguist and let you know if they work, so that next week I can send you the Italian translation. I totally understand.

ccbogel commented 3 years ago

Haha, yes. But thank you for your feedback. I might not use all of it, but I do need it, as it helps me work out the important features, fix errors, and make QualCoder easier to use.

From: WPFilmmaker notifications@github.com Sent: Friday, 13 November 2020 8:45 AM To: ccbogel/QualCoder QualCoder@noreply.github.com Cc: Colin Curtain ccbogel@hotmail.com; Mention mention@noreply.github.com Subject: Re: [ccbogel/QualCoder] Automatic transcription for audio (#284)


ccbogel commented 3 years ago

And thank you for translating 🙂
