AssemblyAI-Community / realtime-voice-command-recognition

Build your own real-time voice command recognition model with TensorFlow

Does not recognise voice commands #1

Open ScouTy747 opened 2 years ago

ScouTy747 commented 2 years ago

Hello!

I really liked your project and wanted to test it. I followed the TensorFlow tutorial and your YouTube video. Everything was working perfectly until I ran the "main.py" program - the arrow keeps moving to one side and the console keeps printing "Predicted label: Down" without me saying anything... Do you have any idea where I might have made a mistake? I've been through the whole tutorial several times and I still have the same problem...

Thank you! :)

EDIT: the whole time I had a problem with the microphone despite several tests... I've solved it now, thank you!

raziurtotha commented 1 year ago

@ScouTy747 Hi, could you please explain what the issue was? I have been trying this project for weeks. I have tried all the similar code from the official TensorFlow website and other projects, but I keep running into the same issue as yours. I would really appreciate your response. Thanks a lot.

ScouTy747 commented 1 year ago

@WarMilitant Hi, I had to turn on the microphone permission in Windows. I can now move the arrow in different directions, but it doesn't wait for my voice command - as long as I don't say anything, it just keeps going straight. I still don't know where the problem is...

Second problem - even though I created my own dataset of 1-second recordings, I can't train on it... I can train on the dataset from the tutorial but not on my own... I'm trying to fix this as well.

raziurtotha commented 1 year ago

@ScouTy747 Hey, thanks a lot for replying. My microphone permission is already turned on. By arrow, I guess you mean the turtle command's arrow. Yeah, I have the same problem: the model keeps outputting the first class's label even though I am not saying anything into the mic. I suppose the room has to be really quiet... What issue are you having while training on your own dataset? I tried it and had no issue. The model works well on a single input loaded from a file but behaves strangely when the input comes from the microphone.
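
One way to narrow down whether the problem is the microphone capture or the model itself: record a single clip from the mic, save it as a WAV file, and run it through the same file-based code that already predicts correctly. A minimal sketch, assuming 1-second, 16 kHz mono clips and the sounddevice/scipy packages (the project's own capture code may differ):

    import sounddevice as sd
    from scipy.io import wavfile

    SAMPLE_RATE = 16000      # assumed: the model was trained on 1-second, 16 kHz clips
    DURATION_SECONDS = 1

    # Record one mono clip from the default microphone.
    audio = sd.rec(int(SAMPLE_RATE * DURATION_SECONDS),
                   samplerate=SAMPLE_RATE, channels=1, dtype='int16')
    sd.wait()

    # Write it to disk, then feed 'mic_check.wav' to the file-based prediction
    # code that already works. If this clip is also misclassified, the problem
    # is in the data/model rather than in the live microphone stream.
    wavfile.write('mic_check.wav', SAMPLE_RATE, audio.squeeze())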

ScouTy747 commented 1 year ago

@WarMilitant Yes, we have the same problem. :(

I have created a Python script with which I can make 1-second recordings through the microphone. When I try to start training on my recordings, it gives me the error below. Do you have any idea how I could fix it? I've been trying to fix it for a week and nothing works... I've looked at different forums and solutions but still without success. Thank you!

    tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__IteratorGetNext_output_types_2_device_/job:localhost/replica:0/task:0/device:CPU:0}} Can not squeeze dim[2], expected a dimension of 1, got 2 [[{{node Squeeze}}]] [Op:IteratorGetNext]
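
That InvalidArgumentError usually means the recorded WAV files have two channels (stereo), while the preprocessing step squeezes a channel dimension it expects to be 1 (mono). A minimal sketch of a workaround, assuming train_ds/val_ds were built with tf.keras.utils.audio_dataset_from_directory as in the simple_audio tutorial - either re-record the clips as mono, or downmix them in the pipeline instead of squeezing:

    import tensorflow as tf

    def to_mono(audio, labels):
        # audio arrives as (batch, samples, channels); averaging the channel axis
        # handles both mono and stereo files, unlike tf.squeeze(audio, axis=-1),
        # which fails when channels == 2.
        return tf.reduce_mean(audio, axis=-1), labels

    # Assumes train_ds and val_ds already exist, loaded as in the tutorial.
    train_ds = train_ds.map(to_mono, tf.data.AUTOTUNE)
    val_ds = val_ds.map(to_mono, tf.data.AUTOTUNE)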

raziurtotha commented 1 year ago

@ScouTy747 The issue is that you don't have a "background noise" class in your dataset. Also make sure you have enough data for the background noises. For example, if you look at Google's Teachable Machine audio project (https://teachablemachine.withgoogle.com/train/audio), a "background noise" class is already there; you cannot delete or disable it.
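
For reference, a minimal sketch of what a dataset with such a class can look like when loaded the way the simple_audio tutorial does it - the folder names here are just examples; the point is that 1-second clips of "nothing" (room noise, fan, keyboard, silence) get their own class folder next to the command words:

    # Example layout (hypothetical folder names):
    #
    # data/
    #   up/           <- 1-second WAV clips of the word "up"
    #   down/
    #   left/
    #   right/
    #   background/   <- 1-second clips of room noise / silence
    import tensorflow as tf

    train_ds, val_ds = tf.keras.utils.audio_dataset_from_directory(
        directory='data',
        batch_size=64,
        validation_split=0.2,
        seed=0,
        output_sequence_length=16000,   # pad/trim every clip to 16000 samples (1 s at 16 kHz)
        subset='both')

    print(train_ds.class_names)   # 'background' shows up as a regular label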

I tested both the code provided in TensorFlow's GitHub example (https://github.com/tensorflow/examples/blob/master/lite/examples/sound_classification/raspberry_pi/classify.py) and the one on TensorFlow's website (https://www.tensorflow.org/tutorials/audio/simple_audio). Both predict well as long as your dataset has enough background noise samples for the particular environment you are testing in.

I made slight changes to the TensorFlow GitHub code to output the category name and confidence score:

    while True:
      # Wait until at least interval_between_inference seconds has passed since
      # the last inference.
      now = time.time()
      diff = now - last_inference_time
      if diff < interval_between_inference:
        time.sleep(pause_time)
        continue
      last_inference_time = now

      # Load the input audio and run classify.
      tensor_audio.load_from_audio_record(audio_record)
      result = classifier.classify(tensor_audio)
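      # Print every predicted category with its confidence score.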
      for category in result.classifications[0].categories:
        print(category.category_name, category.score)

Hope it's helpful for playing around with similar projects.

jophex commented 12 months ago

I'm facing the same problem. I also enabled the microphone and still nothing. I tried reducing the frames, but that didn't help either. The model is also making poor predictions - even after the update it performs poorly: the label is NO and the prediction is GO. Has anyone seen this? Please help. Thanks in advance.