Closed xiaocao666tzh closed 2 years ago
Please upload the model.h file somewhere so I can see it.
Here is the model.h file, thanks a lot.
#pragma once
#include <stdarg.h>
namespace Eloquent {
    namespace ML {
        namespace Port {
            class RandomForest {
                public:
                    /**
                    * Predict class for features vector
                    */
                    int predict(float *x) {
                        uint8_t votes[1] = { 0 };
                        // tree #1
                        votes[0] += 1;
                        // tree #2
                        votes[0] += 1;
                        // tree #3
                        votes[0] += 1;
                        // tree #4
                        votes[0] += 1;
                        // tree #5
                        votes[0] += 1;
                        // tree #6
                        votes[0] += 1;
                        // tree #7
                        votes[0] += 1;
                        // tree #8
                        votes[0] += 1;
                        // tree #9
                        votes[0] += 1;
                        // tree #10
                        votes[0] += 1;
                        // tree #11
                        votes[0] += 1;
                        // tree #12
                        votes[0] += 1;
                        // tree #13
                        votes[0] += 1;
                        // tree #14
                        votes[0] += 1;
                        // tree #15
                        votes[0] += 1;
                        // tree #16
                        votes[0] += 1;
                        // tree #17
                        votes[0] += 1;
                        // tree #18
                        votes[0] += 1;
                        // tree #19
                        votes[0] += 1;
                        // tree #20
                        votes[0] += 1;
                        // tree #21
                        votes[0] += 1;
                        // tree #22
                        votes[0] += 1;
                        // tree #23
                        votes[0] += 1;
                        // tree #24
                        votes[0] += 1;
                        // tree #25
                        votes[0] += 1;
                        // tree #26
                        votes[0] += 1;
                        // tree #27
                        votes[0] += 1;
                        // tree #28
                        votes[0] += 1;
                        // tree #29
                        votes[0] += 1;
                        // tree #30
                        votes[0] += 1;
                        // return argmax of votes
                        uint8_t classIdx = 0;
                        float maxVotes = votes[0];
                        for (uint8_t i = 1; i < 1; i++) {
                            if (votes[i] > maxVotes) {
                                classIdx = i;
                                maxVotes = votes[i];
                            }
                        }
                        return classIdx;
                    }
                    /**
                    * Predict readable class name
                    */
                    const char* predictLabel(float *x) {
                        return idxToLabel(predict(x));
                    }
                    /**
                    * Convert class idx to readable name
                    */
                    const char* idxToLabel(uint8_t classIdx) {
                        switch (classIdx) {
                            case 0:
                                return "shang1";
                            default:
                                return "Houston we have a problem";
                        }
                    }
                protected:
            };
        }
    }
}
predict() is there, so the actual problem lies somewhere else. Please upload your .ino file.
If you want to do KWS (keyword spotting), either 1) use EdgeImpulse or similar software, or 2) record many samples of yourself saying something different from the keyword you want to recognize, or many sounds different from the one of interest (for example street noise, TV background, any sound from YouTube). Then it may work, but there is no guarantee.
Here is the Arduino project file, thanks again:
#include <arduinoFFT.h>
// uncomment when doing classification
#include "model.h"

#define MIC A0
#define NUM_SAMPLES 64
#define SAMPLING_FREQUENCY 1024
#define INTERVAL 5
#define SOUND_THRESHOLD 3

unsigned int samplingPeriod;
unsigned long microSeconds;
int32_t backgroundSound;
double features[NUM_SAMPLES];
arduinoFFT fft;

void setup() {
    Serial.begin(115200);
    pinMode(MIC, INPUT);
    samplingPeriod = round(1000000*(1.0/SAMPLING_FREQUENCY));
    calibrate();
}

void loop() {
    if (!soundDetected()) {
        delay(10);
        return;
    }
    captureWord();
    printFeatures();
    // uncomment when doing classification
    Serial.print("You said ");
    Serial.println(classIdxToName(predict(features)));
    delay(1000);
}

/**
   Get analog readings
   @return
*/
int16_t readMic() {
    return analogRead(MIC);
    return (analogRead(MIC) - 512) >> 2;
}

/**
   Get "ambient" volume
*/
void calibrate() {
    for (int i = 0; i < 200; i++)
        backgroundSound += readMic();
    backgroundSound /= 200;
    Serial.print("Threshold set at ");
    Serial.println(backgroundSound);
}

bool soundDetected() {
    return abs(readMic() - backgroundSound) >= SOUND_THRESHOLD;
}

void captureWord() {
    for (uint16_t i = 0; i < NUM_SAMPLES; i++) {
        microSeconds = micros();
        features[i] = readMic();
        while(micros() < (microSeconds + samplingPeriod));
    }
    fft.Windowing(features, NUM_SAMPLES, FFT_WIN_TYP_HAMMING, FFT_FORWARD);
}

void printFeatures() {
    const uint16_t numFeatures = sizeof(features) / sizeof(double);
    for (int i = 0; i < numFeatures; i++) {
        Serial.print(features[i]);
        Serial.print(i == numFeatures - 1 ? '\n' : ',');
    }
}
OK, you're reading an old blog post; the code has changed. Here it is.
#include <arduinoFFT.h>
// uncomment when doing classification
#include "model.h.cpp"

#define MIC A0
#define NUM_SAMPLES 64
#define SAMPLING_FREQUENCY 1024
#define INTERVAL 5
#define SOUND_THRESHOLD 3

unsigned int samplingPeriod;
unsigned long microSeconds;
int32_t backgroundSound;
double features[NUM_SAMPLES];
arduinoFFT fft;
// add this
Eloquent::ML::Port::RandomForest classifier;

void setup() {
    Serial.begin(115200);
    pinMode(MIC, INPUT);
    samplingPeriod = round(1000000*(1.0/SAMPLING_FREQUENCY));
    calibrate();
}

void loop() {
    if (!soundDetected()) {
        delay(10);
        return;
    }
    captureWord();
    printFeatures();
    // uncomment when doing classification
    Serial.print("You said ");
    // replace this
    Serial.println(classifier.predictLabel(features));
    delay(1000);
}

/**
   Get analog readings
   @return
*/
int16_t readMic() {
    return analogRead(MIC);
    return (analogRead(MIC) - 512) >> 2;
}

/**
   Get "ambient" volume
*/
void calibrate() {
    for (int i = 0; i < 200; i++)
        backgroundSound += readMic();
    backgroundSound /= 200;
    Serial.print("Threshold set at ");
    Serial.println(backgroundSound);
}

bool soundDetected() {
    return abs(readMic() - backgroundSound) >= SOUND_THRESHOLD;
}

void captureWord() {
    for (uint16_t i = 0; i < NUM_SAMPLES; i++) {
        microSeconds = micros();
        features[i] = readMic();
        while(micros() < (microSeconds + samplingPeriod));
    }
    fft.Windowing(features, NUM_SAMPLES, FFT_WIN_TYP_HAMMING, FFT_FORWARD);
}

void printFeatures() {
    const uint16_t numFeatures = sizeof(features) / sizeof(double);
    for (int i = 0; i < numFeatures; i++) {
        Serial.print(features[i]);
        Serial.print(i == numFeatures - 1 ? '\n' : ',');
    }
}
Modifications are at the lines marked "// add this" and "// replace this".
Again, as it is now, the classifier will always print "shang1": you have to train it on more than one word or, in your case, train on "shang1" vs "everything but shang1".
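To see why a single-class model can only ever answer "shang1", here is a minimal sketch of the argmax step from the generated predict(), pulled out into a standalone function. The name argmaxVotes and the vote counts in the note below are hypothetical and not part of micromlgen's output.

```cpp
#include <stdint.h>
#include <stddef.h>

// Hypothetical helper: returns the index of the largest vote count,
// mirroring the "return argmax of votes" loop in the generated model.h.
uint8_t argmaxVotes(const uint8_t *votes, size_t numClasses) {
    uint8_t classIdx = 0;
    uint8_t maxVotes = votes[0];
    // With numClasses == 1 this loop never runs, so the result is always 0.
    for (size_t i = 1; i < numClasses; i++) {
        if (votes[i] > maxVotes) {
            classIdx = (uint8_t) i;
            maxVotes = votes[i];
        }
    }
    return classIdx;
}
```

With one class (votes = {30}) the answer is always index 0, whatever the input was. With a second class, for example votes = {12, 18}, the same function returns 1, so the trees actually get a say.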
That's very kind of you! I'll try. Actually, we want to spot an animal sound in a wild environment, to help find the animals (for example wolves). So we only need some animal sound samples recorded against various background noises to train on, is that right? Could we talk on Twitter or Telegram if we have more questions?
You will actually need every sound that may occur where you're going to deploy your system. Will you deploy in a forest? Go into a forest and record the birds, the wind, the woods, other animals and so on. You may try to replicate this kind of noise with a known dataset (for example Urban sound), aggregating all of its classes under a "non shang1" label and training the classifier on that. Honestly, I don't know how well it will perform.
Of course, remember to 1) balance your dataset (since you will have many more samples of the "negative" class than of "shang1"), or 2) properly set the class_weight parameter of the random forest, or 3) use any other technique for handling imbalanced datasets.
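As a rough sketch of the relabeling and weighting described above: the snippet collapses every label other than the keyword into one negative class, then computes a "balanced" weight as n_samples / (n_classes * count_of_class), which is the heuristic scikit-learn documents for class_weight="balanced". The function names and the sample counts in the note are made up for illustration.

```cpp
#include <stddef.h>
#include <string.h>

// Hypothetical: map any dataset label to 1 ("shang1") or 0 (everything else).
int toBinaryLabel(const char *label) {
    return strcmp(label, "shang1") == 0 ? 1 : 0;
}

// Balanced weight for one class: nSamples / (nClasses * classCount).
// Rare classes get a weight above 1, frequent ones a weight below 1.
double balancedWeight(size_t nSamples, size_t nClasses, size_t classCount) {
    return (double) nSamples / ((double) nClasses * (double) classCount);
}
```

For example, with 900 negative recordings and 100 "shang1" recordings, balancedWeight(1000, 2, 100) gives 5.0 for the keyword and balancedWeight(1000, 2, 900) gives about 0.56 for the negative class, so each keyword sample counts roughly nine times as much during training.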
Yes, you can reach me on Twitter. I warn you that I'm not always available though.
Today I tried again, and a new compile error appeared:
C:\Users\zhangkunyi\Desktop\arduinoSoundClassifier\main\main.ino: In function 'void loop()':
main:40:52: error: no matching function for call to 'Eloquent::ML::Port::RandomForest::predictLabel(double [32])'
Serial.println(classifier.predictLabel(features));
^
In file included from C:\Users\zhangkunyi\Desktop\arduinoSoundClassifier\main\main.ino:3:0:
C:\Users\zhangkunyi\Desktop\arduinoSoundClassifier\main\model.h:4361:33: note: candidate: const char* Eloquent::ML::Port::RandomForest::predictLabel(float*)
const char* predictLabel(float *x) {
^~~~
C:\Users\zhangkunyi\Desktop\arduinoSoundClassifier\main\model.h:4361:33: note: no known conversion for argument 1 from 'double [32]' to 'float*'
I changed all the float types to double, and it works.
Here is my code: code
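An alternative to editing the generated model.h, sketched here under the assumption that predict() and predictLabel() keep their float* signature: copy the double FFT buffer into a float buffer right before classification. toFloatBuffer is a hypothetical helper, not part of arduinoFFT or micromlgen.

```cpp
#include <stddef.h>

// Hypothetical helper: narrow each double sample to float so the buffer
// can be passed to a method declared as predictLabel(float *x).
void toFloatBuffer(const double *src, float *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (float) src[i];
    }
}
```

In the sketch this would amount to declaring float x[NUM_SAMPLES], calling toFloatBuffer(features, x, NUM_SAMPLES), and then passing x to classifier.predictLabel(). It costs one extra buffer, but model.h can be regenerated later without redoing the manual edit.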
I modified the model.h file and it works! But the result is disappointing: it can't correctly classify even two words, "shang" and "xia". Orz...
This method is suitable for simple sounds, for example the sound of a machine, but not good for speech (maybe I didn't do it the right way). Anyway, thank you very much, Simone!
You're correct. I will make it clearer on the blog that this is only for simple detection. I was able to distinguish "play" from "stop".
Dear man: After I recorded the sound data and trained on it, I used micromlgen to output a "model.h" file. But when I included it in my project and tried to compile, I found that "cstdarg.h" was missing. I changed it to "stdarg.h" and compiled again, and got an error: 'predict' was not declared in this scope. Is there anything wrong?