Open gamename opened 9 months ago
@gamename Going by the size of the model, I believe it is already a quantised model. If not, I would suggest you quantise it to int8 weights; that will reduce the model to roughly a quarter of its size. Have you tried it with `.cc` first? Converting to `.cc` doesn't really increase the size of the model when the file gets embedded into the application, if that is your concern. The `.cc` file looks larger, but the array it compiles to is still only the model's byte count.
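To illustrate that point (a sketch of my own, not from the original reply): `xxd -i` turns each model byte into several characters of C source, so the `.cc` text file is much larger than the binary, but the array the compiler emits still occupies exactly the model's byte count in flash.

```python
def to_c_array(data: bytes, name: str = "g_model_data") -> str:
    """Render bytes as a C array, roughly what `xxd -i model.tflite` emits."""
    rows = []
    for i in range(0, len(data), 12):
        chunk = data[i:i + 12]
        rows.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    return (f"const unsigned char {name}[] = {{\n" + "\n".join(rows) + "\n};\n"
            f"const unsigned int {name}_len = {len(data)};\n")

model = bytes(range(256)) * 16          # stand-in for a 4 KiB model file
source = to_c_array(model)
# The text form is several times the binary size, but the array the
# compiler emits is still exactly len(model) bytes.
print(len(model), len(source))
```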
About the SD card: unfortunately, I have not tried this approach. It is definitely worth a try IMO. When loading from the SD card, however, it makes sense not to convert to `.cc`, as you suggest.
Let me know how it goes. If you need further help or want me to try, do let me know.
@vikramdattu
What process did you use to build the `yes_micro_features_data.cc` file? I'm not referring to the `xxd` conversion; I'm referring to everything up to that. :)
The reason I ask is that the C array in `yes_micro_features_data.cc` is tiny. I would like to replicate that size for my cat-meow identification too.
Thanks, -T
Hi @gamename, this is test data that I took long back from Google's tflite-micro. Currently, feature generation happens via a different model in this file, and the features are then fed to the detection model.
The tools here can help you train your own model, evaluate it, and convert it.
Thank you, sir.
@vikramdattu
Is your pre-processor model taken from here?
The reason I ask is that the pre-processor should work for "meow" as well as human speech. It just generates spectrograms, which is payload-agnostic (i.e., it makes a spectrogram of a sound and doesn't care what the sound is). Correct?
Thanks, -T
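As a quick illustration of why the front end is payload-agnostic (my own sketch using NumPy; the actual micro_speech front end is a separate C implementation): a spectrogram is just windowed FFT magnitudes per frame, computed identically for any audio source.

```python
import numpy as np

def spectrogram(samples, frame_len=256, hop=128):
    # Slice the signal into overlapping frames and take the magnitude
    # of each frame's FFT. Nothing here depends on what made the sound:
    # a spoken "yes", a meow, or a sine wave all go through the same math.
    window = np.hanning(frame_len)
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, hop)]
    return np.array([np.abs(np.fft.rfft(f * window)) for f in frames])

# A pure 440 Hz tone stands in for the audio payload.
t = np.arange(16000) / 16000.0
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (frames, frame_len // 2 + 1)
```

The energy concentrates around bin 440 * 256 / 16000 ≈ 7, regardless of what produced the tone.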
@gamename that's right, the model is taken from that particular location.
Perfect. Thanks.
@vikramdattu
For the micro_speech example, what is the purpose of having `yes_micro_features_data.cc/h` and `no_micro_features_data.cc/h` in the directory? Are they there for reference? They don't seem to be used, or am I missing something?
@gamename you are correct. Those were there from the old days, added for testing, and are not currently used. You may ignore them.
Thanks!
@vikramdattu
This concerns building the actual model. I am using a script here that is just a compilation of the steps outlined here.
Here is what my input dir with samples looks like:
```
tree ./samples
./samples
├── _background_noise_
│   ├── README.md
│   ├── doing_the_dishes.wav
│   ├── dude_miaowing.wav
│   ├── exercise_bike.wav
│   ├── pink_noise.wav
│   ├── running_tap.wav
│   └── white_noise.wav
└── meow
    ├── cat0001.wav
    ├── cat0002.wav
    ...
```
(there are 77 cat `.wav` files in total)
I'm confused about what needs to be in there. Do I need to add `silence` and `unknown` subdirectories (with contents) as well?
Thanks, -T
@vikramdattu
Another question. :)
Looking at this construct:
```cpp
constexpr int kCategoryCount = 4;
constexpr const char* kCategoryLabels[kCategoryCount] = {
    "silence",
    "unknown",
    "yes",
    "no",
};
```
...how do you know what the order of the labels ("silence", "unknown", etc) should be? How is that set?
Hello TennisSmith,
That completely depends on the model trained. The category names cannot be inferred from the model; only the number of categories can be known, from the output tensor size.
Thanks, Vikram
> That completely depends on the model trained. It cannot be inferred from the model what categories are. Only the number of categories can be known from output tensor size.
That's not quite what I am asking. :)
My question is this: how do I know the order of the labels, as they are used in Python, after the model has been created?
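For what it's worth: in the standard TensorFlow speech_commands training pipeline (which the micro_speech tooling is based on), the label order is fixed at training time rather than stored in the model: index 0 is silence, index 1 is unknown, and the remaining indices follow the order of the words you trained on. A hedged Python sketch of that convention (this mirrors `input_data.prepare_words_list` from that pipeline; verify against the exact training code you used):

```python
def prepare_words_list(wanted_words):
    # Mirrors the speech_commands convention (an assumption here, not
    # taken from this thread): silence and unknown always occupy
    # indices 0 and 1, then the wanted words in the order given.
    return ["_silence_", "_unknown_"] + list(wanted_words)

# Training on the words yes,no yields this output-tensor order,
# which matches kCategoryLabels: silence, unknown, yes, no.
labels = prepare_words_list(["yes", "no"])
print(labels)
```

For a single-word "meow" model, the same convention would give `["_silence_", "_unknown_", "meow"]`.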
Hi,
I'm using an esp32-s3-eye v2.2. It has 8 MB each of flash and PSRAM. Is it possible to use `yamnet.tflite` on an esp32-s3-eye v2.2 for sound identification? The `yamnet.tflite` file is about 3.9 MB in size. The chip has an SD card slot, so I can use it to load the model file (i.e., no need to convert it to a `.cc` file with `xxd`).
Thoughts?
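A back-of-envelope sizing check (my own arithmetic; the tensor-arena figure below is an assumed placeholder, since the real arena requirement depends on the graph and has to be measured on-device):

```python
MiB = 1024 * 1024

psram_bytes = 8 * MiB          # esp32-s3-eye v2.2 PSRAM
model_bytes = int(3.9 * MiB)   # yamnet.tflite loaded from the SD card
arena_bytes = 2 * MiB          # hypothetical arena size: measure on-device

headroom = psram_bytes - (model_bytes + arena_bytes)
print(headroom // MiB)  # rough MiB left over for everything else
```

Under that assumption the model plus arena would fit in PSRAM with a couple of MiB to spare, but the arena number is the unknown that decides it.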