ankidroid / Anki-Android

AnkiDroid: Anki flashcards on Android. Your secret trick to achieve superhuman information retention.
GNU General Public License v3.0

Voice control #1717

Closed hssm closed 2 years ago

hssm commented 9 years ago

Originally reported on Google Code with ID 815

Hello,

I’m not planning to do something like this, but wouldn’t this be nice?

If I am driving a car and pair my phone with the hands-free equipment, it would be
nice if there were a voice-controlled Ankidroid. For the cards, it is no problem (just
make voice files for every vocabulary), but the selection would be difficult: One cannot
press “easy” without touching the screen.

So it would be nice if one could speak “easy“ “again” “difficult” etc., and Ankidroid
would understand it and press the appropriate button. This would also be nice for other
situations, where one cannot use the hands (while making food for example, or when
lying in the warm bed and not wanting to put the hands outside etc).

Reported by gerritsangel on 2011-10-09 20:53:38

hssm commented 9 years ago
Voice control would be nice indeed!
Thanks for the idea :-)

Note: I would never review flashcards while driving...
Driving requires full attention, and reviewing flashcards is quite an intensive activity.

Reported by nicolas.raoul on 2011-10-09 23:16:24

hssm commented 9 years ago
Well, I don’t know how it is with driving - I also guess it would be quite dangerous.
But I guess if it is just repetition of cards you know pretty well, it shouldn’t be
that bad - usually one also talks when driving etc.

But other activities where you cannot use your hands (or where it is a nuisance) are
quite suited for that - cooking when you have your hands dirty, or jogging, bicycle,
etc.

I guess the main problem would then be if you have to create voice files for every
card, or if one could use a screen reader (isn’t there something built into Android?)...

Reported by gerritsangel on 2011-10-11 19:12:51

hssm commented 9 years ago
Indeed that would be useful when cooking or strength training!

Text-to-speech is available, and pure-audio decks are also a possibility.

The only difficult part would be voice recognition for "easy" "difficult" etc, I guess
that would require the user to say those words a few times.

Everyone:
Is there any speech recognition library available for Android? (preferably open source)

Reported by nicolas.raoul on 2011-10-12 01:43:36

hssm commented 9 years ago
An open source speech recognition library that works on Android: http://cmusphinx.sourceforge.net/
See http://stackoverflow.com/questions/4396046

Reported by nicolas.raoul on 2011-10-30 09:44:47

hssm commented 9 years ago
Just wondering - are there keyboard shortcuts for the Easy/Difficult buttons (e.g. "E" and
"D")? Using those might make the recognition much easier (it would therefore become
a very small grammar indeed).

Reported by stivbennett on 2013-03-26 08:59:05

hssm commented 9 years ago
This is something that I would be interested in using, not while driving but while exercising
or doing other things with my hands.

Couldn't this just be done with the built-in android speech recognition api? It would
be useful even if it took a little while for the network traffic.

Reported by wrsaunde on 2013-08-31 02:13:38

hssm commented 9 years ago
I would find this very useful. I understand that there is quite a well developed Voice
Command system on Android for controlling quite a few applications and writing messages.
There are also other apps that act as personal assistants. Would it be possible for
these to be connected in to Ankidroid?

Reported by gosnell@ctgos.com on 2014-02-19 19:07:26

hssm commented 9 years ago
I'd love to have some kind of voice control in AnkiDroid. Doing reviews when you simply
cannot hold your phone in your hands would be extremely helpful.
Lying in a bathtub, walking/running, exercising etc...
Even simple control with regular/bluetooth headphones would help so much. At least
"fail" and "good" buttons as volume up/down or skip forward/backward.
Please have a look into this :)

Reported by glwisnia on 2014-11-15 12:01:10

hssm commented 9 years ago
+1. So many situations where voice control would be useful.

Reported by antony.gelberg on 2014-11-29 09:55:44

hssm commented 9 years ago
Looks like this thread is quite old ... did anyone find a way to use voice just for
learning cards? All that would be needed seems to be some keywords for easy, good ...
and maybe mark, discard ...

Reported by ralf@dierenbach.ch on 2015-01-21 22:05:46

hssm commented 9 years ago
Voice control would absolutely rock!

Reported by dmt.lsv on 2015-01-22 10:37:43

hssm commented 9 years ago
Apparently there is a handheld version of CMU Sphinx called PocketSphinx, usable on
Android.
http://www.speech.cs.cmu.edu/pocketsphinx/
https://github.com/cmusphinx/pocketsphinx-android
https://github.com/cmusphinx/pocketsphinx-android-demo
https://softwarerecs.stackexchange.com/questions/13797/fast-voice-command-library-on-android-open-source-works-offline

Reported by nicolas.raoul on 2015-04-10 07:23:05

hssm commented 9 years ago
I started looking at this a few weeks ago, and just adapted the example app they have
to do continuous recognition of a few keywords (one, two, three, four for levels
of difficulty, and "next" to flip the card), but I couldn't get accuracy good enough
to be usable for my needs unless I held the microphone of the phone at a specific angle
to my voice. If anyone knows more about tuning to get better accuracy, it doesn't
seem like it would be that hard to integrate; I just gave up because I wasn't getting
good enough results.

Reported by agjohnst on 2015-04-10 15:06:12

hssm commented 9 years ago
I take that back - the poor recognition accuracy was due to my old phone being broken.
On a new phone this works great - I have it integrated into AnkiDroid, and AbstractFlashcardViewer
now takes basic voice commands. The code is ugly, though (commands are hardcoded
and have to match the text in a certain assets file). I'll submit a pull request once
I've cleaned up the code / repo.

Reported by agjohnst on 2015-04-13 22:22:55
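
On the point that commands are hardcoded: one possible way around that is a small lookup table from recognized keyword to reviewer action. The sketch below is only an illustration, not the code from this integration. The class name `VoiceCommandTable` and the `Action` names are hypothetical; the keywords are the ones mentioned earlier in the thread ("one" to "four" for difficulty, "next" to flip the card).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical table-driven lookup from a recognized keyword to a reviewer action. */
public class VoiceCommandTable {

    /** Illustrative action names; these are not AnkiDroid identifiers. */
    public enum Action { SHOW_ANSWER, EASE_1, EASE_2, EASE_3, EASE_4 }

    private final Map<String, Action> commands = new LinkedHashMap<>();

    /** Registers a keyword (case-insensitive) and returns this table for chaining. */
    public VoiceCommandTable register(String keyword, Action action) {
        commands.put(normalize(keyword), action);
        return this;
    }

    /** Returns the action for the recognized text, or empty if it is not a known keyword. */
    public Optional<Action> lookup(String recognizedText) {
        if (recognizedText == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(commands.get(normalize(recognizedText)));
    }

    private static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT);
    }

    /** Keyword set from the thread: "one".."four" for difficulty, "next" to flip. */
    public static VoiceCommandTable defaults() {
        return new VoiceCommandTable()
                .register("next", Action.SHOW_ANSWER)
                .register("one", Action.EASE_1)
                .register("two", Action.EASE_2)
                .register("three", Action.EASE_3)
                .register("four", Action.EASE_4);
    }

    public static void main(String[] args) {
        VoiceCommandTable table = defaults();
        System.out.println(table.lookup(" Three ")); // prints Optional[EASE_3]
        System.out.println(table.lookup("banana"));  // prints Optional.empty
    }
}
```

With a table like this, the keyword list could in principle be loaded from preferences or a localized resource instead of being compiled in, although the recognizer's own keyword file would still have to contain the same words.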

hssm commented 9 years ago
Really exciting! I don't find time to use Anki with my current schedule, but I do spend
a lot of time in traffic jams... :)

Reported by antony.gelberg on 2015-04-13 22:41:43

hssm commented 9 years ago
Issue 403 has been merged into this issue.

Reported by perceptualchaos2 on 2015-06-01 03:29:56

hssm commented 9 years ago
Issue 480 has been merged into this issue.

Reported by perceptualchaos2 on 2015-06-01 04:47:21

Antiec commented 8 years ago

Hi!

Did this go anywhere, was it pulled? Is there a way to get the code? I'm very interested in this functionality! But I'm not skilled enough to find the code :( (Skilled enough to compile it, though.)

baitisj commented 8 years ago

I created a module for Anki proper that implements voice control: https://ankiweb.net/shared/info/1646263898 Hopefully, this might be useful to the mobile team.

aegray commented 8 years ago

Sorry completely forgot about this for a while - I never cleaned it up, and haven't updated for a current git version, but for now here is a patch off commit 43a8ad459f5223b855ea0ce1760c5f6592613c25 - if I get a chance I'll update, but I don't know enough about localization right now in anki to make this flexible. Also, it looks like I had to do the following:

1) copy the pocketsphinx models dir into AnkiDroid/src/main/assets/sync (create it if it doesn't exist), and add AnkiDroid/src/main/assets/sync/assets.lst with contents:

models/dict/cmu07a.dic
models/grammar/digits.gram
models/grammar/menu.gram
models/grammar/menu.gram.back
models/hmm/en-us-semi/README
models/hmm/en-us-semi/feat.params
models/hmm/en-us-semi/mdef
models/hmm/en-us-semi/means
models/hmm/en-us-semi/noisedict
models/hmm/en-us-semi/sendump
models/hmm/en-us-semi/transition_matrices
models/hmm/en-us-semi/variances

2) edit AnkiDroid/src/main/assets/sync/models/grammar/menu.gram to contain:

okay
one
two
three
four
yes
no

3) copy pocketsphinx-android-0.8-nolib.jar into AnkiDroid/libs

4) copy libpocketsphinx_jni.so into the correct arch directory in either AnkiDroid/libs or AnkiDroid/jniLibs (I'm not sure which it was specifically because I copied to both).

5) apply patch:

diff --git a/AnkiDroid/src/main/java/com/ichi2/anki/AbstractFlashcardViewer.java b/AnkiDroid/src/main/java/com/ichi2/anki/AbstractFlashcardViewer.java
index f8704c9..0434341 100644
--- a/AnkiDroid/src/main/java/com/ichi2/anki/AbstractFlashcardViewer.java
+++ b/AnkiDroid/src/main/java/com/ichi2/anki/AbstractFlashcardViewer.java
@@ -19,6 +19,8 @@

 package com.ichi2.anki;

+import static edu.cmu.pocketsphinx.SpeechRecognizerSetup.defaultSetup;
+
 import android.annotation.SuppressLint;
 import android.app.Activity;
 import android.content.BroadcastReceiver;
@@ -106,6 +108,11 @@ import java.util.Set;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

+import edu.cmu.pocketsphinx.Assets;
+import edu.cmu.pocketsphinx.Hypothesis;
+import edu.cmu.pocketsphinx.RecognitionListener;
+import edu.cmu.pocketsphinx.SpeechRecognizer;
+
 import timber.log.Timber;

 public abstract class AbstractFlashcardViewer extends NavigationDrawerActivity {
@@ -197,6 +204,8 @@ public abstract class AbstractFlashcardViewer extends NavigationDrawerActivity {
     private boolean mPrefFixArabic;
     // Android WebView
     private boolean mSpeakText;
+       private boolean mUseVoiceCommands;
+       private float mVoiceThresh;
     protected boolean mDisableClipboard = false;
     protected boolean mInvertedColors = false;
     protected boolean mNightMode = false;
@@ -423,6 +432,107 @@ public abstract class AbstractFlashcardViewer extends NavigationDrawerActivity {
         }
     };

+       class MyRecognitionListener implements RecognitionListener 
+       {
+       private SpeechRecognizer recognizer = null;
+       private static final String KWS_SEARCH = "keywords";
+       public static final String KW_FAIL = "no";
+       public static final String KW_OK = "yes";
+       public static final String KW_FAIL2 = "one";
+       public static final String KW_HARD = "two";
+       public static final String KW_MID = "three";
+       public static final String KW_EASY = "four";
+       public static final String KW_NEXT = "okay";
+
+               @Override
+               public void onBeginningOfSpeech() {
+               }
+
+               @Override
+               public void onEndOfSpeech() {
+               }
+
+       
+               public void stop() 
+               {
+                       if (recognizer != null)
+                       {
+                               recognizer.stop();
+                               recognizer = null;
+                       }
+               }
+
+               public void toggle()
+               {
+                       if (recognizer != null)
+                       {
+                               stop();
+                               Themes.showThemedToast(AbstractFlashcardViewer.this, "Disabled voice recognizer", true);
+                       }
+                       else
+                       {
+                               init();
+                       }
+               }
+
+               public void init()
+               {
+                       try {
+                               Assets assets = new Assets(AbstractFlashcardViewer.this);
+                               File assetDir = assets.syncAssets();
+                               setupRecognizer(assetDir);
+                               initSearch();
+                               Themes.showThemedToast(AbstractFlashcardViewer.this, "Started voice recognizer: " + Double.toString(mVoiceThresh), true);
+                       } catch (IOException e) {
+                       Themes.showThemedToast(AbstractFlashcardViewer.this, "Failed to init voice recognizer", true);
+                       }
+               }
+               private void setupRecognizer(File assetsDir) {
+                       File modelsDir = new File(assetsDir, "models");
+                       recognizer = defaultSetup()
+                                       .setAcousticModel(new File(modelsDir, "hmm/en-us-semi"))
+                                       .setDictionary(new File(modelsDir, "dict/cmu07a.dic"))
+                                       .setRawLogDir(assetsDir).setKeywordThreshold(mVoiceThresh) 
+                                       .getRecognizer();
+                       recognizer.addListener(this);
+
+                       File kwlist = new File(modelsDir, "grammar/menu.gram");
+                       recognizer.addKeywordSearch(KWS_SEARCH, kwlist);
+               }
+               private void initSearch() {
+                       recognizer.stop();
+                       recognizer.startListening(KWS_SEARCH); 
+               }
+               @Override
+               public void onPartialResult(Hypothesis hypothesis) {
+                       if (hypothesis != null)
+                       {
+                               String text = hypothesis.getHypstr();
+                               if (text != null)
+                                       trySRCommand(text);                     
+                               initSearch();
+                       }
+               }
+
+               @Override
+               public void onResult(Hypothesis hypothesis) { }
+
+               private void trySRCommand(String result)
+               {
+            Themes.showThemedToast(AbstractFlashcardViewer.this, result, true);
+                       if (result.equals(KW_FAIL) || result.equals(KW_FAIL2))
+                       { 
+                               executeCommand(GESTURE_ANSWER_EASE1); 
+                       }
+                       if (result.equals(KW_OK) || result.equals(KW_HARD)) { executeCommand(GESTURE_ANSWER_EASE2); }
+                       if (result.equals(KW_MID)) { executeCommand(GESTURE_ANSWER_EASE3); }
+                       if (result.equals(KW_EASY)) { executeCommand(GESTURE_ANSWER_EASE4); }
+                       if (result.equals(KW_NEXT)) { executeCommand(GESTURE_SHOW_ANSWER); }
+               }
+
+       };
+       private MyRecognitionListener mKeywordRecognizer = new MyRecognitionListener();
+
     private View.OnTouchListener mGestureListener = new View.OnTouchListener() {
         @Override
         public boolean onTouch(View v, MotionEvent event) {
@@ -973,6 +1083,7 @@ public abstract class AbstractFlashcardViewer extends NavigationDrawerActivity {

         stopTimer();
         Sound.stopSounds();
+               mKeywordRecognizer.stop();
     }

@@ -985,6 +1096,8 @@ public abstract class AbstractFlashcardViewer extends NavigationDrawerActivity {
         // Reset the activity title
         setTitle();
         updateScreenCounts();
+               if (mUseVoiceCommands)
+                       mKeywordRecognizer.init();
     }

@@ -1708,6 +1821,11 @@ public abstract class AbstractFlashcardViewer extends NavigationDrawerActivity {
         mInputWorkaround = preferences.getBoolean("inputWorkaround", false);
         mPrefFixArabic = preferences.getBoolean("fixArabicText", false);
         mSpeakText = preferences.getBoolean("tts", false);
+        mUseVoiceCommands = preferences.getBoolean("voice", false);
+               mVoiceThresh = (float)Math.pow(10.0, preferences.getInt("voiceThresh", 2000)/10000.0 * 50.0 - 40.0);
+
+       
+
         mPrefSafeDisplay = preferences.getBoolean("safeDisplay", false);
         mPrefUseTimer = preferences.getBoolean("timeoutAnswer", false);
         mWaitAnswerSecond = preferences.getInt("timeoutAnswerSeconds", 20);
@@ -2733,6 +2851,7 @@ public abstract class AbstractFlashcardViewer extends NavigationDrawerActivity {

         @Override
         public boolean onDoubleTap(MotionEvent e) {
+                       mKeywordRecognizer.toggle();
             if (mGesturesEnabled) {
                 executeCommand(mGestureDoubleTap);
             }
diff --git a/AnkiDroid/src/main/java/com/ichi2/anki/Preferences.java b/AnkiDroid/src/main/java/com/ichi2/anki/Preferences.java
index 9f3e2b6..6c3b944 100644
--- a/AnkiDroid/src/main/java/com/ichi2/anki/Preferences.java
+++ b/AnkiDroid/src/main/java/com/ichi2/anki/Preferences.java
@@ -107,7 +107,7 @@ public class Preferences extends PreferenceActivity implements OnSharedPreferenc
     private static String[] sListNumericCheck = {"minimumCardsDueForNotification"};
     private static String[] sShowValueInSummSeek = { "relativeDisplayFontSize", "relativeCardBrowserFontSize",
             "relativeImageSize", "answerButtonSize", "whiteBoardStrokeWidth", "swipeSensitivity",
-            "timeoutAnswerSeconds", "timeoutQuestionSeconds", "backupMax", "dayOffset" };
+            "timeoutAnswerSeconds", "timeoutQuestionSeconds", "backupMax", "dayOffset", "voiceThresh" };
     private static String[] sShowValueInSummEditText = { "deckPath" };
     private static String[] sShowValueInSummNumRange = { "timeLimit", "learnCutoff" };
     private TreeMap<String, String> mListsToUpdate = new TreeMap<>();
diff --git a/AnkiDroid/src/main/res/values/10-preferences.xml b/AnkiDroid/src/main/res/values/10-preferences.xml
index d5619d1..60c7c4d 100644
--- a/AnkiDroid/src/main/res/values/10-preferences.xml
+++ b/AnkiDroid/src/main/res/values/10-preferences.xml
@@ -87,6 +87,10 @@
     <string name="swipe_sensitivity_summ">XXX</string>
     <string name="tts">Text to speech</string>
     <string name="tts_summ">Reads out question and answer if no sound file is included</string>
+    <string name="voice">Voice commands</string>
+    <string name="voice_summ">Use voice commands</string>
+    <string name="voice_thresh">Voice command recognition threshold</string>
+    <string name="voice_thresh_summ">Threshold value to use for voice commands</string>
     <string name="sync_fetch_missing_media">Fetch media on sync</string>
     <string name="sync_fetch_missing_media_summ">Automatically fetch missing media when syncing.</string>
     <string name="sync_account">AnkiWeb account</string>
@@ -220,4 +224,4 @@
     <string name="deck_conf_cram_reschedule_summ">Reschedule cards based on my answers in this deck</string>
     <string name="deck_conf_cram_steps">Custom steps</string>
     <string name="deck_conf_cram_steps_summ">Define custom steps</string>
-</resources>
\ No newline at end of file
+</resources>
diff --git a/AnkiDroid/src/main/res/xml/preferences.xml b/AnkiDroid/src/main/res/xml/preferences.xml
index 9d17637..e838b29 100644
--- a/AnkiDroid/src/main/res/xml/preferences.xml
+++ b/AnkiDroid/src/main/res/xml/preferences.xml
@@ -397,6 +397,22 @@
                 android:key="tts"
                 android:summary="@string/tts_summ"
                 android:title="@string/tts" />
+                       <CheckBoxPreference             
+                               android:defaultValue="false"
+                               android:key="voice"
+                               android:summary="@string/voice_summ"
+                               android:title="@string/voice"/>
+                       <com.hlidskialf.android.preference.SeekBarPreference
+                               android:defaultValue="2000"
+                               android:dependency="voice"
+                               android:dialogMessage="@string/voice_thresh_summ"
+                               android:key="voiceThresh"
+                               android:max="9999"
+                               android:summary="@string/voice_thresh_summ"
+                               android:text=""
+                               android:title="@string/voice_thresh"
+                               app:interval="1"
+                               app:min="0" />
             <ListPreference
                 android:defaultValue="0"
                 android:entries="@array/dictionary_labels"
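
A note on the `voiceThresh` preference in the patch above: the slider value (0 to 9999, default 2000) is mapped to a keyword threshold on a logarithmic scale by `Math.pow(10.0, value/10000.0 * 50.0 - 40.0)`. The following standalone sketch reproduces only that arithmetic so the range is easier to see. The class and method names are made up for illustration; the formula and the slider bounds are taken from the patch.

```java
/** Reproduces the slider-to-threshold arithmetic from the patch (names are illustrative). */
public class VoiceThreshold {

    /** Upper slider bound, from android:max="9999" in the patch's preferences.xml. */
    public static final int SLIDER_MAX = 9999;

    /** Base-10 exponent for a slider position: 0 maps to -40, 10000 would map to +10. */
    public static double exponentFor(int slider) {
        return slider / 10000.0 * 50.0 - 40.0;
    }

    /** Threshold as computed in the patch, including the cast to float. */
    public static float thresholdFor(int slider) {
        return (float) Math.pow(10.0, exponentFor(slider));
    }

    public static void main(String[] args) {
        // Print the mapping at the minimum, the default, the point where the
        // threshold is about 1, and the maximum slider position.
        for (int slider : new int[] {0, 2000, 8000, SLIDER_MAX}) {
            System.out.printf("slider=%d exponent=%.3f threshold=%e%n",
                    slider, exponentFor(slider), thresholdFor(slider));
        }
    }
}
```

So the slider spans roughly 50 orders of magnitude, from 1e-40 up to just under 1e10, and the default of 2000 corresponds to about 1e-30. Which part of that range gives usable recognition is something that would have to be tuned on a real device.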

elgalu commented 7 years ago

Note this can also be achieved outside of the AnkiDroid app (at the Android level) by installing Google's Voice Access app, which is currently in beta.

This is what it looks like. Note that I need to say the numbers out loud so Voice Access knows what to "click". For example, I say "twelve" for AGAIN, "thirteen" for GOOD, and so on (voice commands).

Here are the installation instructions. Perhaps this can also be achieved in a similar way with utter, but I haven't tried that yet.

I guess it can also be done inside the app, perhaps by using the Google Cloud Speech API and following this speech/grpc example code.

Guys, thanks for Anki!!! It's awesome!!!

antgel commented 7 years ago

@aegray Any idea what happened here (why your patch wasn't applied)? Do you have a patch against current master?

timrae commented 7 years ago

There are several factors here...

1) a pull request was never submitted so we never reviewed the code. If the author of the patch doesn't have time to make a pull request, they probably don't have time to see the feature through beta testing bug fixes to release quality either. For us it just adds more maintenance burden

2) another user gave a solution to achieve the same result without modifying AnkiDroid.

Given 2) above we would need a very persuasive argument to include the patch even if 1) were to be "solved"

antgel commented 7 years ago

That's why I pinged the patch author.

As for the "persuasive argument", that's a bit surprising. The external app (Voice Access) is clunky at best, and I think most users would see that as an unwieldy workaround. To have this in the app would take the UX up a notch or two.

jimmitt commented 7 years ago

I agree that the voice access method is clunky. I tried it for a bit but ultimately uninstalled it because enabling and disabling it were annoying and I didn't like having to look at the screen to figure out which number to say. It would be much better if it were built in and I could just say "again", "good", "easy", etc.

ajm949 commented 7 years ago

Many thanks elgalu for advice that Voice Access can be used to provide hands free ankidroid access. It's even easier to set up now:

  1. Download and install Voice Access from play store
  2. Open it
  3. Start ankidroid
  4. All touchscreen options can be operated by reading the number written next to them

So pleased to have this

Kirikus commented 6 years ago

Currently Voice Access is not downloadable: Early access program is currently full, space may open up later.

I got the patch for Linux PC up and running, but is there any working solution for mobile?

renehamburger commented 6 years ago

You can download Google Voice Access from APKMirror, which seems to be the only safe site to download APKs directly outside the Play Store. Using it with AnkiDroid works like a charm!

github-actions[bot] commented 4 years ago

Hello 👋, this issue has been opened for more than 2 months with no activity on it. If the issue is still here, please keep in mind that we need community support and help to fix it! Just comment something like still searching for solutions and if you found one, please open a pull request! You have 7 days until this gets closed automatically

david-allison commented 4 years ago

Massive number of +1s, let's keep this open.

svenmeier commented 3 years ago

For testing of voice-control I've built a solution here https://github.com/svenmeier/Anki-Android/tree/voice-control

Maybe someone wants to try this out.

mytestingsolution commented 3 years ago

Hi, I'm really a neophyte when it comes to all of this. I'm trying to give your voice-control example a shot, but I can't seem to figure out how to download the folder to put on my device. I'm sure I'm missing something very obvious, but this would be an extremely helpful feature to have!

svenmeier commented 3 years ago

I gave up on using voice control, because the PocketSphinx speech recognizer didn't work reliably enough :(.

mytestingsolution commented 3 years ago

Ahh...thank you so much for taking the time to give an update! Wishing you the best.

WiliTest commented 2 years ago

Yes, we could use Google Voice Access, but it's very invasive and doesn't work well without internet access.

EDIT: Google Voice Access doesn't work for us non-native speakers (my friend has the same problem). It doesn't even understand the difference between "again", "hard", "good", "easy". E.g. when we say "good", it either doesn't understand it or it clicks on "again" (my accent isn't bad enough to confuse them, and Google Voice is even worse in my native language).

**Instead of implementing complex voice recognition, it might be easier to use some very basic sounds, like a high-pitched sound or several repetitions (e.g. 3 low-pitched sounds).** The user would simply have to record (or choose) a basic sound she wants to match with a specific button (e.g. 2 low- or high-pitched sounds could be attached to the "again" button while reviewing). The app would "simply" have to calculate whether the recorded sound matches (very approximately) the sound previously attributed to the button.
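
To make the idea above a bit more concrete, here is a minimal sketch of telling a low-pitched sound from a high-pitched one. It swaps in a deliberately crude technique, counting upward zero crossings to estimate the frequency, and runs on synthesized sine waves rather than microphone input. The class name, the 500 Hz cutoff, and the 16 kHz sample rate are all assumptions chosen for illustration; real recordings are noisy and would likely need something more robust.

```java
/** Crude low/high tone classifier based on zero-crossing counting (illustrative only). */
public class ToneClassifier {

    public enum Tone { LOW, HIGH }

    /** Estimates the frequency in Hz by counting upward zero crossings in the buffer. */
    public static double estimateFrequency(double[] samples, int sampleRate) {
        int crossings = 0;
        for (int i = 1; i < samples.length; i++) {
            if (samples[i - 1] < 0 && samples[i] >= 0) {
                crossings++;
            }
        }
        double seconds = samples.length / (double) sampleRate;
        return crossings / seconds;
    }

    /** Classifies the buffer as HIGH if the estimated frequency reaches the cutoff. */
    public static Tone classify(double[] samples, int sampleRate, double cutoffHz) {
        return estimateFrequency(samples, sampleRate) >= cutoffHz ? Tone.HIGH : Tone.LOW;
    }

    /** Synthesizes a pure sine wave, standing in for a recorded sound. */
    public static double[] sine(double freqHz, int sampleRate, double seconds) {
        int n = (int) Math.round(sampleRate * seconds);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = Math.sin(2 * Math.PI * freqHz * i / sampleRate);
        }
        return out;
    }

    public static void main(String[] args) {
        int sampleRate = 16000;   // assumed sample rate
        double cutoffHz = 500.0;  // assumed boundary between "low" and "high"
        System.out.println(classify(sine(200, sampleRate, 0.5), sampleRate, cutoffHz));  // prints LOW
        System.out.println(classify(sine(1000, sampleRate, 0.5), sampleRate, cutoffHz)); // prints HIGH
    }
}
```

Counting repetitions (for example three low sounds in a row) would additionally require splitting the recording into bursts separated by silence, which this sketch does not attempt.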

Arthur-Milchior commented 2 years ago

A majority of the current maintainers has reached a decision regarding voice control. It's detailed on https://github.com/ankidroid/Anki-Android/wiki/Absence-of-voice-control-feature (TL;DR: the risk of accident is too high with it). For ethical reasons, we decided not to accept such a feature.

euu2021 commented 2 years ago

Interesting! The final decision is bitter to me, but I admire the developers for being proactive and bold about ethical matters 👏👏👏

EDIT: I forgot to say that my family will also be grateful. Listening to me talking to Anki all day would be pure hell.

antgel commented 2 years ago

So very bizarre. Users can also use the touch-screen interface when driving. Let's disable that as well. Forget the idea that voice control enables users to concentrate more on the road than having to look at a screen. :man_shrugging:

mikehardy commented 2 years ago

@antgel you can deliberately misinterpret the reasoning all you like, the reasoning is still sound.

As a vulnerable road user, that is, a cyclist, I can tell you that there is a shocking amount of stupidity amongst drivers who seem to think that staring at their phone and doing things like Candy Crush or Whatsapp or even AnkiDroid is acceptable but at least the laws are clear on the subject and if they're not cretins they'll know deep down they're endangering themselves and everyone else they share the road with. So your argument is a strawman and a lame one at that as it's directly made implausible by a vast body of law

On the other hand voice control is a touchier area. And what our introspection on it resulted in is the decision - paternalistic or not - that we won't support any use of AnkiDroid that enables this sort of risky behavior, which we've become aware is apparently prevalent. Sadly, as mentioned above, there are enough selfish idiots (being blunt there but we're talking about multi-ton metal-block pilots deciding they know better than the law how their brain works) using AnkiDroid already with touchscreen, that voice would be like tacitly condoning dangerous behavior.

Interpret that how you like

Jdodo45 commented 2 years ago

Too bad for people who don't drive (I don't), and wanted to use it in other conditions. And even worse (following your rather weird argumentation): too bad for those who will die because they will be crushed by a driver who was staring at her Anki screen to press the right button. (Have you even considered that the current feature could cause more death than the one requested here?)

zaz commented 2 years ago

Summary: It is not okay to block accessibility features (it's discrimination) and there is no evidence that adding voice control would decrease safety (in fact, I believe @antgel is right that safety is added by providing a safer substitute to using Anki by touching it).

Sorry for not weighing in earlier, as I realize that a decision has already been made on this and there is a large inhibition to admitting a mistake and reversing a decision; however, I have spent significant time considering and researching this matter, and I sincerely hope you will take the time to read my response and reconsider your opinion on this matter in the light of peer-reviewed driving distraction studies and the effect your decision will have on your users (especially the potential for having a serious impact on the blind community).

Generally, free software should respect the user's freedom. You making moral decisions on behalf of your users is antithetical to this principle. This is because you are assuming that you are in a better position than your user to make an ethical decision, in spite of the fact that you don't know what their intended use case is, nor what their local laws are. Instead, you have become hyper-focused on the use case of Anki while driving, and have asserted, without evidence, that adding a voice-control feature will increase accidents. In reality, all that is being proposed is an accessibility feature that would allow more people to use the program, and allow current users to use it in more circumstances.

Specifically, although a few comments here mention driving, there are many other uses: exercising, walking, or simply lying down with your eyes closed to better focus entirely on the language. Most importantly, this feature would make Anki accessible to blind people. This is particularly important to consider as they are completely underrepresented in this discussion because they currently can't use Anki, and Anki would be particularly useful to them as they have fewer options for learning languages than other people.

Legally, in America, if you were a commercial entity, you would be in violation of the ADA for blocking such a feature. While I doubt anyone is going to successfully sue you, it is an indication that what you are doing is wrong. Additionally, in most states, driving using hands-free features is legal, while touching your phone is not. This is consistent with evidence that visual and manual distractions are much less safe than cognitive or audible distractions.

Safety: You make the assertion that voice control would decrease safety, but you provide no evidence for this. Here is a rigorous analysis of whether adding voice control would increase safety:

Define the excess accident rate as the rate at which Anki users get into car accidents above the rate at which the general population does.

| | increase in number of people using ___ while driving | × excess accident rate while using | = excess number of accidents while using |
|---|---|---|---|
| Anki voice control | positive | low (possibly negative) | possibly negative |
| Anki touch | negative | high | negative |
| Total | positive | likely negative | increase in total accidents: likely negative |

Consider that there are currently some users who use Anki touch while driving and that if Anki voice control is released, two things will happen: Some users who don't use Anki while driving will start using Anki voice. Some users who use Anki touch while driving will start using Anki voice instead.

We know, because of this study, that the excess accident rate while performing cognitive tasks hands-free with no visual distraction is low (this rate is positive when compared with undistracted driving, but is actually slightly negative compared with the general population due to the fact that some people are engaging in more dangerous activities such as touch-controlled apps, daydreaming, etc).

Let $n$ be the increase in number of people using ___ and $x$ be the excess accident rate above the general population, with the subscripts $v$ and $t$ denoting Anki voice control and Anki touch, respectively. The change in the number of accidents caused by releasing voice control for Anki will be:

$$ n_v \times x_v + n_t \times x_t $$

Assuming Anki voice does indeed introduce some positive excess risk and that the number of drivers using Anki touch will decrease due to switching to Anki voice, the first summand is positive while the second is negative. This lets us work backwards algebraically from the assumption that introducing this feature will increase the accident rate:

$$ n_v \times x_v + n_t \times x_t > 0 \iff n_v \times x_v > -n_t \times x_t \iff \frac{n_v \times x_v}{-n_t \times x_t} > 1 \iff \frac{n_v}{-n_t} \times \frac{x_v}{x_t} > 1 $$

So whether we are increasing the number of accidents depends on whether the above inequality holds. That in turn depends on the product of two ratios: the number of people who start using Anki voice relative to the number who stop using Anki touch (recall that $n_t$ is negative, so $-n_t$ is positive), and the excess accident rate using voice relative to the excess accident rate using touch. From Table 4 of that study (available on SciHub), the latter ratio is very low (cognitive distractions are far less dangerous than visual and manual distractions), so for voice control to increase risk, the ratio $n_v/(-n_t)$ would have to be very high.

How high? Well, according to the aforementioned study, engaging in a purely cognitive task, including talking, singing, or even talking on a handheld cell phone, actually reduced the accident rate compared to the general population (possibly because it means people are not daydreaming or performing other risky activities). In this case $x_v$ is negative, so safety is increased no matter how high $n_v/(-n_t)$ is. Conversely, cell phone texting/browsing/dialing, which is close to using Anki touch, was associated with an odds ratio (OR) of 1.48 for crashes and 2.19 for serious crashes.

That study did not examine Anki or Anki-like apps specifically, but it does show that at worst, cognitive distractions cause a negligible increase in accident rate (due to eyes still being on the road and hands still being on the wheel), and at best, could actually decrease accident rate by keeping drivers awake and alert.

If you wanted to find a study that examined a cognitive distraction more closely resembling Anki and come up with a way to estimate $n_v/(-n_t)$, you could use the inequality above to calculate whether you're increasing or decreasing risk by adding voice control, but I think you'd get the same results.
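As a minimal sketch of that calculation, the two functions below evaluate the sum $n_v \times x_v + n_t \times x_t$ and the equivalent ratio test. The function names are hypothetical, and the numbers in the usage lines are made-up placeholders chosen only to show the signs; they are not figures from the study.

```python
def excess_accidents(n_v, x_v, n_t, x_t):
    """Change in excess accidents after releasing voice control.

    n_v: increase in people using Anki voice while driving (positive)
    x_v: excess accident rate while using Anki voice
    n_t: increase in people using Anki touch while driving (negative)
    x_t: excess accident rate while using Anki touch
    """
    return n_v * x_v + n_t * x_t


def voice_increases_accidents(n_v, x_v, n_t, x_t):
    """Ratio form of the same test: (n_v / -n_t) * (x_v / x_t) > 1.

    The rearrangement divides by -n_t * x_t, so it is only valid
    when n_t < 0 and x_t > 0 (the assumptions stated in the text).
    """
    if not (n_t < 0 and x_t > 0):
        raise ValueError("ratio form assumes n_t < 0 and x_t > 0")
    return (n_v / -n_t) * (x_v / x_t) > 1


# Placeholder inputs (NOT study data): many new voice users with a small
# excess rate, fewer touch users switching away from a large excess rate.
change = excess_accidents(n_v=1000, x_v=0.01, n_t=-100, x_t=0.5)
print(change < 0)  # negative change means fewer accidents overall
print(voice_increases_accidents(n_v=1000, x_v=0.01, n_t=-100, x_t=0.5))
```

With a negative $x_v$, the ratio test can never exceed 1, which matches the argument that safety improves regardless of how many people start using voice control.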

There is certainly no reason to think that adding voice control increases risk. The evidence we have suggests that adding voice control will actually decrease overall risk.

Benefit of the doubt:

In light of the fact that there's no actual evidence that adding voice control will decrease safety, the benefit of the doubt should be given to the user to make that determination themselves (as it is with almost every other app) so that, among many other things, we are not precluding blind people from using Anki.

zaz commented 2 years ago

@mikehardy, I don't think anyone is trying to antagonize you or your team. In fact, I think the point @antgel is trying to make is related to the one you yourself make:

> Sadly, as mentioned above, there are enough selfish idiots (being blunt there but we're talking about multi-ton metal-block pilots deciding they know better than the law how their brain works) using AnkiDroid already with touchscreen, that voice would be like tacitly condoning dangerous behavior.

Interpret that how you like

@antgel's interpretation (and mine also) is that by adding voice control, we are not condoning this dangerous behavior, but providing an alternative to it. This is because cognitive+audio distractions are not at all like manual+visual distractions: they are much safer, as backed up by scientific evidence (see my post above), and are allowed by law in many places.

The evidence does not suggest Anki voice would be more dangerous while driving than what the average person does (chats, daydreams, etc.), and even if it were slightly more dangerous, it would still be worth adding voice control so that these idiots already using AnkiDroid with touchscreen have a much safer alternative.

"Distraction" is not the real killer. Doing flashcards is not going to kill you any more than daydreaming, or talking, or whatever else you already do in the car. Looking away from the road for a second or two while going 60+ mph is what is going to kill you.

timrae commented 2 years ago

FWIW, I'm tending to side with @zaz on this one. I don't feel the argument for blocking the feature is compelling enough when weighed against the principles of inclusivity and freedom, though I'm open to a strong counterargument. I wonder if showing a pop-up dialog every time the feature is engaged could be an acceptable middle ground?

euu2021 commented 2 years ago

> Looking away from the road for a second or two while going 60+ mph is what is going to kill you.

The problem is that users will still keep looking at their phones, even if they are using voice control. Reviewing in Anki is very different from the activities analyzed in the studies you mentioned. At least that is my personal experience using Anki "hands-free", walking in my bedroom (language learning):

So, making it more convenient to use Anki hands-free will make more people have the terrible idea of using Anki while driving, and a certain number (most?) of those people will use the thing in the way I described above, i.e., LOOKING AT THE SCREEN TO READ THE ANSWER.

So, the main question here is: how many people complete a review session without looking at the screen? I think it's a small minority. The big majority will still keep looking at the screen, and accidents will INCREASE.

mikehardy commented 2 years ago

Agree to disagree. Coding hard beats commenting hard, and the folks coding have decided. We provide several APIs that can serve as extension points, so anyone sufficiently motivated can build a separate app that does this.

timrae commented 2 years ago

Well I can't argue with that... I do feel though, that the conversation could potentially be reopened if all of the following conditions were met:

  1. A sizeable number of users were identified who were personally experiencing hardship due to legitimate accessibility issues
  2. There were solid reasons why using the API would lead to an inferior solution
  3. Someone was willing to submit and champion a high quality PR (and follow through with the code review and maintenance)

However, AFAICT, as it currently stands, none of these conditions have been met.

wantsomegetsome commented 2 years ago

Lol. No. 7 years.

Reality matters. It's not a safety issue you are worried about. You very well know how to solve the driving problem with voice control functionality.

Sigh. Let me help the people who don't know the truth... Keep in mind that it's been 7 years. If you are worried about driving, you could simply disable voice control if moving faster than a jogger - say 6 mph. Surely you thought of this, right? Driving problem solved!!! Come on. Tell the truth. After 7 years! It's really not a safety thing, but more of a you-don't-want-to-code-it-and-maintain-it thing. Right?
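A rough sketch of that speed cutoff, for illustration only. The function name, the metres-per-second input, and the choice to allow voice control when no speed reading is available are all assumptions, not AnkiDroid code; only the 6 mph figure comes from the comment above.

```python
MPS_PER_MPH = 0.44704  # 1 mph is exactly 0.44704 m/s


def voice_control_allowed(speed_mps, limit_mph=6.0):
    """Return True if voice control may stay enabled at this speed.

    speed_mps: current speed in metres per second (hypothetical input,
               e.g. taken from the phone's location service), or None
               if no reading is available.
    limit_mph: cutoff in miles per hour; 6 mph is the "faster than a
               jogger" figure suggested above.
    """
    if speed_mps is None:
        # Assumption: with no speed reading, do not restrict the user.
        return True
    speed_mph = speed_mps / MPS_PER_MPH
    return speed_mph <= limit_mph


print(voice_control_allowed(1.5))   # walking pace, about 3.4 mph
print(voice_control_allowed(13.4))  # about 30 mph
```

A real implementation would also have to decide what to do on trains or as a passenger, where the speed is high but the user is not driving.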

So, you are screwing the people who have vision disabilities, people who want to review while exercising, house cleaning, yard work, etc.

Sigh. No. It will not stop people who want to review while driving. They will just look at it. You are actually making it more dangerous by not having voice control.

zaz commented 2 years ago

@euu2021: I would never use it that way in a vehicle, but if that is a concern, you could blank the screen during voice-control mode and that would eliminate the problem of people looking at their screen. This would make Anki safer and more inclusive.

@mikehardy I'm not disrespecting your authority as the contributors to AnkiDroid, nor am I demanding you write a patch. But I've shown evidence that the refusal to accept a patch is having the opposite-to-intended effect on safety and I do think that is something that warrants a re-discussion of this issue in light of that new evidence. Safety-enhancing methods such as blanking the screen or a pop-up warning could be used.

@timrae: #1 is a bit of a Catch 22: You don't have any blind users yet because AnkiDroid is inaccessible to them, so of course you are not going to have blind people commenting that it's inaccessible, because of extreme sampling bias. #3 is also somewhat of a Catch 22 now, as the team has stated they won't accept a PR, so what rational person would put in the effort of writing one?

As for #2, I do believe we should make voice control work out of the box to reduce the number of people using AnkiDroid visually+manually while driving. I'm ignorant of the internals here, but I am also guessing that using the API would be more complex (e.g. things like localization may have to be duplicated).


Also: Thank you all for your hard work maintaining AnkiDroid. Regardless of my strong opinions on this issue, I do appreciate your work every day.

timrae commented 2 years ago

@zaz

> #3 is also somewhat of a Catch 22 now, as the team has stated they won't accept a PR, so what rational person would put in the effort of writing one?

All I'm saying is that I'd potentially be open to facilitating a reopening of that conversation (as former head maintainer of AnkiDroid) if all the conditions were met. That's not a promise that the reopening of the conversation would lead to a change in the outcome, just that the conversation could potentially be reopened. Basically you need some research on accessibility and safety, a design proposal, and a developer that's willing and able to see the feature through to the end.

eginhard commented 2 years ago

The scope and discussion of this issue were not really about accessibility in the first place; the aim was to have a hands-free, voice-controlled reviewer as a convenience feature. As @mikehardy mentioned, you should be able to create an external app for this using the available API if you really want it. The accessibility issues are mostly unrelated and much harder to solve, both for technical reasons and because of the lack of users and experience among the core contributors.

Blind people don't need voice control (though they may also benefit from a hands-free convenience feature in the same way); they use phones through the touchscreen and TalkBack. The main challenge for this is probably the interaction between TalkBack and AnkiDroid's TTS for card content. But it requires the entire app, not just the reviewer, to work well with TalkBack, and is tracked separately in #7913.

Voice control is an accessibility feature for people with motor impairments who can't use touchscreens. For this, Google provides the Voice Access app mentioned above. As with TalkBack, a single service that you can use across all apps is much better than each app implementing its own thing. And again, the task here is to ensure the entire app is compatible with it, not just that you can say "good", "hard" etc. in the reviewer. Physical switches are also a common alternative input method for such users.

I'm not strictly against providing a voice control convenience feature, but I support this decision. However, it's misleading to argue that it would solve all accessibility issues or, even worse, that it must be implemented to prevent car accidents by people illegally using the app while driving.