Closed: eyalis closed this issue 2 years ago
Hi @eyalis,
To help me better assist with your problem, can you provide more information?
Thank you @Caldarie for your quick answer, here it goes:
Describe the problem: When I try the example it always matches the first word of the category file.
Thanks for your help!
Hmm, in that case:
Thanks, so:
Have you tried adjusting your bufferSize? For example, 22050, 5000, etc.
Yes, I tried 22050, 5000, 7000, 30000, etc.
Can you tell me your sample rate?
I'm using the one that comes with the example, which is 44100.
Do you still get the same result if you run the “decodedWav” model in the example?
No, that one works pretty well!
Looking at your answers, we can narrow it down to one possibility.
I think the most likely reason you’re not getting a response is that your model isn’t trained well enough to distinguish your desired sound from other sounds or background noise.
Make sure that you increase the number of training examples and remove any prolonged silences. (The goal is to make the sound as distinct as possible.) Also, for your background noise, you can add variations such as white noise, ambient noise, etc. Try not to mix your voice into the background noise.
As for why the GTM model in the example is not working for you: to be honest, it was hastily put together with minimal training examples, so unless you say the words very distinctly, it will not register.
Thank you for your answer. My model works well in Teachable Machine (I have tested it with the tool they provide on the platform), so I think it may be another problem, maybe related to my device configuration or something like that.
Thank you anyway, I'll keep trying and if I find what was the problem/solution, I'll let you know!!!
Ah, I see. In that case, maybe you can tinker with the sample rate. I heard that some people had better results with 16000.
Hi @eyalis,
I've updated the package (0.1.7) where you can now adjust the detection threshold of the model. I believe this should help fix the issue you are having. For example:
If your model is frequently matching the first category, simply adjust detectionThreshold to a lower value.
Example on how to implement this:
final String model = 'assets/google_teach_machine_model.tflite';
final String label = 'assets/google_teach_machine_label.txt';
final String inputType = 'rawAudio';
final int sampleRate = 44100;
final int recordingLength = 44032;
final int bufferSize = 22050;
final int numOfInferences = 1;
final double detectionThreshold = 0.4; // Adjust this value here
Let me know if this fixes your problem
Hi @Caldarie thank you so much, I'll try it asap and let you know!
@Caldarie the same is happening to me. The inference always returns the first category 😕 I tested with both versions 0.6.2 and 0.7.1
Hi @cmalbuquerque,
For testing purposes, can you tell me if you still see the same problem after lowering your detectionThreshold to maybe 0?
@Caldarie yes, I am using the following settings:
result = TfliteAudio.startAudioRecognition(
  numOfInferences: 1,
  inputType: 'rawAudio',
  sampleRate: 16000,
  recordingLength: 44032,
  bufferSize: 2000,
  detectionThreshold: 0,
);
My sample rate is low, as is the bufferSize, but I need to use these values in order to increase the recording time. I will try another model with more distinct sounds and then leave feedback here.
@cmalbuquerque
No problems at all, let me know the results of your other model.
Just be aware that the detection threshold ignores any prediction whose probability doesn’t exceed the set value. For example:
If your detection threshold is set to 0.5 (50%) and your model predicts the audio label as “yes” at 0.4 (40%) and “no” at 0.1 (10%), the prediction will be ignored. This is what causes the result to match the first label.
As mentioned in comment #10 to @eyalis, it’s highly probable that the model isn’t trained well enough and you might need to train it with more data to rectify this issue.
Alternatively, you can lower your detection threshold, but I don’t advise going any lower than 0.30, as you would be sacrificing your model’s performance.
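To make the threshold explanation above concrete, here is a minimal sketch of the behaviour as described in this thread. It is not the package's actual implementation: `pickLabel`, the label list, and the scores are hypothetical, and the fallback to index 0 is an assumption based on the description that an ignored prediction shows up as the first label.

```dart
/// Hypothetical helper, not part of the tflite_audio API.
/// Returns the index of the highest score if that score exceeds
/// [threshold]; otherwise falls back to index 0 (the first label).
int pickLabel(List<double> scores, double threshold) {
  var best = 0;
  for (var i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return scores[best] > threshold ? best : 0;
}

void main() {
  // Made-up labels and scores, for illustration only.
  final labels = ['background', 'yes', 'no'];
  final scores = [0.35, 0.45, 0.20];

  // The best score (0.45) does not exceed 0.5, so the first label is returned.
  print(labels[pickLabel(scores, 0.5)]); // background

  // With a lower threshold the same scores yield "yes".
  print(labels[pickLabel(scores, 0.4)]); // yes
}
```

If the same input flips between the first label and the expected one as you move the threshold, the model's confidence is the limiting factor rather than the recording settings.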
@Caldarie,
Many thanks for the help and the detailed explanation! I used another model with detectionThreshold: 0.4 and it works perfectly 😁
@cmalbuquerque
I’m very glad to hear that! 😁
It has come to my attention that this problem may also be due to bufferSize.
It's possible that a higher bufferSize doesn't allow your device enough time to record, hence the problem where it always matches the first category. To fix this, simply adjust the bufferSize to a lower value.
Likewise, if your bufferSize is too low, the recording length will be too long and your model may register it as background noise. In that case, adjust the bufferSize to a higher value.
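When tuning these values it can help to work out how much audio they actually cover. The sketch below is plain arithmetic on the settings quoted in this thread; `secondsOf` and `buffersNeeded` are hypothetical helpers, and `buffersNeeded` assumes the recorder fills the recording one buffer at a time, which this thread does not confirm.

```dart
/// Duration in seconds covered by [samples] audio samples at [sampleRate] Hz.
double secondsOf(int samples, int sampleRate) => samples / sampleRate;

/// Number of buffer reads needed to collect [recordingLength] samples,
/// assuming (hypothetically) that the recording is filled buffer by buffer.
int buffersNeeded(int recordingLength, int bufferSize) =>
    (recordingLength / bufferSize).ceil();

void main() {
  // Settings from the example earlier in the thread.
  print(secondsOf(44032, 44100)); // just under 1 second
  print(buffersNeeded(44032, 22050)); // 2

  // Settings reported by @cmalbuquerque.
  print(secondsOf(44032, 16000)); // about 2.75 seconds
  print(buffersNeeded(44032, 2000)); // 23
}
```

A GTM model trained on 1-second samples expects roughly one second of audio, so it may be worth checking that recordingLength divided by sampleRate stays close to that.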
@Caldarie Thank you for this awesome package. I am facing the same issue. I tried all of the above solutions and none of them worked. I think I might have made a mistake while training the model on GTM. Can you suggest what the "delay" and "duration" should be for each class while training?
Hi @amardeep-singh97,
To assist you with your problem, would it be possible for you to share your model?
@Caldarie Yes, here you go. The model should return class 2 when you say "Help". converted_tflite (1).zip
Hi @amardeep-singh97
I ran through your example, and it seems that you were right about your model's accuracy (even after tuning the parameters). However, I have noticed that your model does at times pick up the sound after a slight delay, but it performs poorly when the word is spoken too quickly or too slowly.
Let me know how this works for you.
@Caldarie I think so too. I trained it with 60 background noise samples and 40 "help" samples. The sample size is 1 second on GTM. Can you suggest improvements?
@amardeep-singh97
Let me run some tests with the source code. As mentioned above, it seems that many people are experiencing the same problem as you.
Oddly enough, this seems to only be a problem with GTM models.
@Caldarie Okay. Thank you for your support. Let us know what you find.
Hi @amardeep-singh97,
I have tested my own GTM model provided in this repository, and had much better results with the following values:
final double detectionThreshold = 0.3;
final int averageWindowDuration = 0;
final int minimumTimeBetweenSamples = 0;
final int suppressionTime = 0;
final int minimumCount = 0;
Unfortunately, I haven't had much success with the model you provided. It seems that your model (on average) scores most of the voice input "help" at around 20% confidence. I think the most likely reason your model has difficulty telling it apart from background noise is that there's a lot of empty space in the recording length. For example:
[Image: audio input takes up a small portion of the recording length]
[Image: background noise]
As you can see, the first picture is very similar to the second, apart from the small portion of recorded input. Have you tried trimming this space?
@Caldarie I am training a new model with more variation in the data. I will let you know if I face any issues. I'll take the arguments in your solution into account.
@amardeep-singh97 keep us updated! :)
I have the same problem, but only on one device. I created a model with Google Teachable Machine and tested it on two devices:
Samsung Galaxy S9 Plus: The first label is always detected here. The logged raw scores are: [NaN, NaN, NaN]
Samsung Galaxy S20: The detection works perfectly here.
Both were tested under the same conditions.
Any suggestions?
Hi @fabian-rump,
That’s a very peculiar problem.
With the S20, does it output NaNs from time to time?
As for S9, does this device always output NaNs? Or does it, from time to time, output a score?
The S20 outputs a NaN in one of hundreds of cases. The S9 always outputs NaN. I haven't yet been able to get a score on the S9.
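The thread does not state why NaN scores surface as the first label. One plausible reading, offered here only as an assumption, is that a straightforward arg-max never moves off index 0 because every comparison against NaN is false. The sketch below illustrates that effect and a simple guard; `argMax` and `hasNaN` are hypothetical helpers, not taken from the package source.

```dart
/// Hypothetical arg-max over raw scores (not the package's code).
/// Any comparison involving NaN evaluates to false, so `best` stays at 0
/// when all scores are NaN.
int argMax(List<double> scores) {
  var best = 0;
  for (var i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return best;
}

/// True if any score is NaN, meaning the inference output should not be trusted.
bool hasNaN(List<double> scores) => scores.any((s) => s.isNaN);

void main() {
  // Raw scores as logged on the Galaxy S9 Plus.
  final scores = [double.nan, double.nan, double.nan];
  print(argMax(scores)); // 0
  print(hasNaN(scores)); // true
}
```

Logging a check like `hasNaN` next to the recognition result makes it easier to separate a numerical problem on the device from a genuinely low-confidence prediction.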
Hey guys, I have found the cause and fixed the problem. Seems like it was right under my nose. I will now close the issue.
How can I get the raw score and the recognition result from tfliteaudio at the same time?
When I try it, it always interprets the sound as the first category listed in the file