Deelvin / ck

This cross-platform tool helps to make software projects more portable, modular, reusable and reproducible across continuously changing software, hardware and data. It is being developed by the open MLCommons taskforce to reduce development, benchmarking, optimization and deployment time for ML and AI systems.
http://bit.ly/mlperf-edu-wg
Apache License 2.0

[Micro] Run model for `Keyword Spotting` on Arduino using `TFLite` #2

Open KJlaccHoeUM9l opened 1 year ago

KJlaccHoeUM9l commented 1 year ago

There are several categories in MLPerf where performance results can be submitted: https://github.com/mlcommons/tiny/tree/master/benchmark

For an initial dive into this Microcontrollers submission, it is proposed to run a model for Keyword Spotting on Arduino: DS-CNN.

It is necessary to run this model and document the steps needed to run it.

KJlaccHoeUM9l commented 1 year ago

@KJlaccHoeUM9l

Red-Caesar commented 1 year ago

My steps to solve this task:

  1. Found the trained model in `kws_model_data.cpp`.
  2. Started adapting the TensorFlow Lite `micro_speech` example to our new model. First, in `micro_features_model.cpp` I replaced the default `g_model` with ours and changed `g_model_len` to 53936.
  3. After that, I hit a problem which I couldn't solve for a long time:

Image

So, how to solve it:

  1. Go to `micro_speech.ino` and delete the code below:

    static tflite::MicroMutableOpResolver<4> micro_op_resolver;
    if (micro_op_resolver.AddDepthwiseConv2D() != kTfLiteOk) {
      return;
    }
    if (micro_op_resolver.AddFullyConnected() != kTfLiteOk) {
      return;
    }
    if (micro_op_resolver.AddSoftmax() != kTfLiteOk) {
      return;
    }
    if (micro_op_resolver.AddReshape() != kTfLiteOk) {
      return;
    }

    Use instead:

    static tflite::MicroMutableOpResolver<6> micro_op_resolver;
    if (micro_op_resolver.AddDepthwiseConv2D() != kTfLiteOk) {
      return;
    }
    if (micro_op_resolver.AddFullyConnected() != kTfLiteOk) {
      return;
    }
    if (micro_op_resolver.AddSoftmax() != kTfLiteOk) {
      return;
    }
    if (micro_op_resolver.AddReshape() != kTfLiteOk) {
      return;
    }
    if (micro_op_resolver.AddConv2D() != kTfLiteOk) {
      return;
    }
    if (micro_op_resolver.AddAveragePool2D() != kTfLiteOk) {
      return;
    }
  2. Delete the check below:

    if ((model_input->dims->size != 2)
        || (model_input->dims->data[0] != 1)
        || (model_input->dims->data[1] != (kFeatureSliceCount * kFeatureSliceSize))
        || (model_input->type != kTfLiteInt8)) {
      MicroPrintf("Bad input tensor parameters in model");
      return;
    }

    Use instead:

    if ((model_input->dims->size != 4)
        || (model_input->dims->data[0] != 1)
        || (model_input->dims->data[1] != kFeatureSliceCount)
        || (model_input->dims->data[2] != kFeatureSliceSize)
        || (model_input->dims->data[3] != 1)
        || (model_input->type != kTfLiteInt8)) {
      MicroPrintf("Bad input tensor parameters in model");
      return;
    }
  3. Go to `micro_features_micro_model_settings.h` and change the constants to this:

    constexpr int kFeatureSliceSize = 10;
    constexpr int kFeatureSliceCount = 49;
    constexpr int kSilenceIndex = 10;
    constexpr int kUnknownIndex = 11;
    constexpr int kCategoryCount = 12;

  4. Go to `micro_features_micro_model_settings.cpp` and change the old array like this:

    const char* kCategoryLabels[kCategoryCount] = {
        "down", "go", "left", "no", "off", "on",
        "right", "stop", "up", "yes", "silence", "unknown",
    };
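As a quick sanity check (a Python sketch on my side, not part of the Arduino sources), the constants from steps 3 and 4 should stay mutually consistent: the flat feature buffer holds 49 * 10 = 490 int8 values, and the label array must have exactly `kCategoryCount` entries:

```python
# Mirror of the C++ constants above, for a consistency check only.
kFeatureSliceSize = 10    # MFCC coefficients per time slice
kFeatureSliceCount = 49   # time slices per inference window
kCategoryCount = 12       # 10 keywords + "silence" + "unknown"

kCategoryLabels = ["down", "go", "left", "no", "off", "on",
                   "right", "stop", "up", "yes", "silence", "unknown"]

kFeatureElementCount = kFeatureSliceSize * kFeatureSliceCount
print(kFeatureElementCount)                    # 490 int8 values in the input
print(len(kCategoryLabels) == kCategoryCount)  # True
```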


After these changes you will have a running model, but it does not work correctly.

That is my next issue, which I will describe in the next comment.
Red-Caesar commented 1 year ago

First of all, I was given three topics to explore:

  1. Find the place in the code where input from the mic is transformed into the input tensor
  2. Understand how the model uses the input axes
  3. Find the place where the data is postprocessed

For myself, I drew the following scheme:

Image

Honestly, I can't fully answer each question, but these are my assumptions:

  1. Input from the mic arrives all the time. The board waits for signals from the PDM mono @ 16 kHz system. In the function `PopulateFeatureData()` in `feature_provider.cpp` we split the audio into samples by time and transform them appropriately.
  2. I can't find it. The previous model uses a 1x1960 tensor, while the current model uses a 49x10 tensor. But `feature_buffer`, which we use to feed data into the model, is a 1-d array. So I don't understand how to answer this question; maybe I misunderstand something.
  3. It happens in the function `ProcessLatestResults()` in `recognize_commands.cpp`. It uses the 1x12 output tensor. Here we compute scores for each prediction and choose the best one.
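On point 2, my working assumption (a numpy sketch, not verified against the TFLite Micro sources) is that the 1-d `feature_buffer` simply stores the 49x10 feature matrix in row-major order, so slice `s`, coefficient `c` lives at index `s * kFeatureSliceSize + c`:

```python
import numpy as np

kFeatureSliceCount, kFeatureSliceSize = 49, 10

# A fake 49x10 feature matrix: one row per time slice, 10 MFCCs each.
features_2d = np.arange(kFeatureSliceCount * kFeatureSliceSize,
                        dtype=np.int16).reshape(kFeatureSliceCount,
                                                kFeatureSliceSize)

# Row-major flattening: this is how a 1-d feature_buffer can still feed a
# model whose input tensor is declared as 1x49x10x1.
feature_buffer = features_2d.reshape(-1)

s, c = 7, 3  # slice 7, coefficient 3
print(feature_buffer[s * kFeatureSliceSize + c] == features_2d[s, c])  # True
```

If that holds, the shape mismatch between the 1-d buffer and the 4-d input tensor is only a matter of interpretation, not of data layout.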

Also, I found test input data and tried to feed it to the model. I think the problem really is in the input data @KJlaccHoeUM9l

Red-Caesar commented 1 year ago

I had a task to build boxplots for:

  1. unprepared data
  2. data after MFCC
  3. data from the variable `dat` (`eval_quantized_model.py`)
  4. data from `dat_q` (`eval_quantized_model.py`)

Here is my notebook: https://github.com/Red-Caesar/data-analysis-for-project/blob/main/data_analysis.ipynb
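For reference, the statistics a boxplot is built from can be computed without any plotting library; this is a generic numpy sketch (the `fake_mfcc` array is a random stand-in, not data from the notebook):

```python
import numpy as np

def five_number_summary(data):
    """The statistics a boxplot is drawn from: quartiles plus 1.5*IQR whiskers."""
    q1, med, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    lo = data[data >= q1 - 1.5 * iqr].min()  # lower whisker
    hi = data[data <= q3 + 1.5 * iqr].max()  # upper whisker
    return lo, q1, med, q3, hi

rng = np.random.default_rng(0)
fake_mfcc = rng.normal(size=1000)  # stand-in for one of the four data sets
summary = five_number_summary(fake_mfcc)
print(summary)
```

Comparing these five numbers across the four data stages shows where the distributions diverge, which is the point of the comparison.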

Red-Caesar commented 1 year ago

@KJlaccHoeUM9l

Red-Caesar commented 1 year ago

I forgot to add a snapshot of the data preparation in the Arduino example. Maybe it will be useful for comparison too:

Image

Red-Caesar commented 1 year ago

Our example: https://github.com/Red-Caesar/MLPerf-Tiny/blob/master/benchmark/training/keyword_spotting/get_dataset.py
Old example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/microfrontend/lib/frontend.c

Red-Caesar commented 1 year ago

Input with unprepared data:

yes_input_preproc.txt

Image

frontend_input = data from the file above
input_size = 480 (duration_ms * (kAudioSampleFrequency / 1000), where duration_ms = kFeatureSliceDurationMs = 30 and kAudioSampleFrequency = 16000)
num_samples_read = 320 (it is 66221 once in 12 times)
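The numbers above follow from the audio settings; here is the arithmetic as a small Python check (the 20 ms stride is my assumption to explain `num_samples_read = 320`, it is not stated in the example):

```python
# Names mirror the micro_speech constants; the stride value is an assumption.
kAudioSampleFrequency = 16000  # PDM mono mic @ 16 kHz
kFeatureSliceDurationMs = 30   # window length per slice
kFeatureSliceStrideMs = 20     # assumed hop between consecutive slices

input_size = kFeatureSliceDurationMs * (kAudioSampleFrequency // 1000)
stride_samples = kFeatureSliceStrideMs * (kAudioSampleFrequency // 1000)
print(input_size, stride_samples)  # 480 320
```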

vvchernov commented 1 year ago

Raw signal preprocessing on MLPerf side (model_settings['feature_type'] == "mfcc"):

  1. cast to float32
  2. calculate the max value and normalize by it
  3. pad the end of the signal to the desired sample length and fill the tail with zeros
  4. create a copy of the signal for foreground scaling
  5. pad the foreground copy with 2 samples on both sides and fill them with zeros
  6. extract a slice from the foreground copy, starting at offset 2, of size desired_samples. In this case it is the same as the original signal
  7. the sliced foreground copy is processed by a short-time Fourier transform (STFT) with parameters: frame_length = model_settings['window_size_samples'], frame_step = model_settings['window_stride_samples'], window_fn = Hann
  8. abs and length are calculated from the STFT output
  9. calculate the matrix for transforming the STFT spectrogram into a mel-spectrogram
  10. calculate the mel-spectrogram with that matrix
  11. correct the mel-spectrogram shape to the new number of bins
  12. calculate the stabilized natural log of the mel-spectrogram: log(mel_spectrograms + 1e-6)
  13. calculate MFCCs from the logged mel-spectrogram and cut their number to model_settings['dct_coefficient_count']
  14. reshape the final result to (model_settings['spectrogram_length'], model_settings['dct_coefficient_count'], 1)

Important: steps 4 to 6 can be skipped, because the output of step 3 can be used in step 7 without any processing.

Notes: the reference https://kite.com/python/docs/tensorflow.contrib.slim.rev_block_lib.contrib_framework_ops.audio_ops.mfcc is given here with a description of the default parameters used for MFCC calculation. See also the pipeline for MFCC calculation here: https://www.tensorflow.org/api_docs/python/tf/signal/mfccs_from_log_mel_spectrograms
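To make the 14 steps concrete, here is a simplified numpy-only sketch of the pipeline (steps 3-6, 11 and 14 are folded away; the 40 ms window, 20 ms stride and 40 mel bins are my assumed KWS settings, not values read from `eval_quantized_model.py`):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_from_signal(signal, sample_rate=16000, window_size=640,
                     window_stride=320, num_mel_bins=40, num_mfcc=10):
    """Simplified sketch of steps 1-2, 7-10 and 12-13 of the pipeline above."""
    # steps 1-2: cast to float32 and normalize by the absolute maximum
    x = signal.astype(np.float32)
    x = x / max(np.max(np.abs(x)), 1e-12)
    # step 7: frame the signal, apply a Hann window, take the FFT (STFT)
    n_frames = 1 + (len(x) - window_size) // window_stride
    win = np.hanning(window_size)
    frames = np.stack([x[i * window_stride:i * window_stride + window_size] * win
                       for i in range(n_frames)])
    # step 8: magnitude spectrum of each frame
    spec = np.abs(np.fft.rfft(frames, axis=1))
    # steps 9-10: build a triangular mel filterbank and apply it
    n_bins = spec.shape[1]
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(20.0),
                                   hz_to_mel(sample_rate / 2.0),
                                   num_mel_bins + 2))
    bin_pts = np.floor((n_bins - 1) * hz_pts / (sample_rate / 2.0)).astype(int)
    fbank = np.zeros((num_mel_bins, n_bins))
    for m in range(1, num_mel_bins + 1):
        left, center, right = bin_pts[m - 1], bin_pts[m], bin_pts[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    mel_spec = spec @ fbank.T
    # step 12: stabilized natural log
    log_mel = np.log(mel_spec + 1e-6)
    # step 13: DCT-II, keeping the first num_mfcc coefficients
    n = np.arange(num_mel_bins)
    dct = np.cos(np.pi / num_mel_bins * (n[:, None] + 0.5) * np.arange(num_mfcc))
    return log_mel @ dct  # shape: (n_frames, num_mfcc)

rng = np.random.default_rng(0)
audio = rng.normal(size=16000)  # 1 s of fake audio at 16 kHz
out = mfcc_from_signal(audio)
print(out.shape)  # (49, 10)
```

With 1 s of 16 kHz audio this yields a 49x10 MFCC matrix, matching the model input shape discussed above.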

vvchernov commented 1 year ago

cc @Red-Caesar @FlexingJelly @KJlaccHoeUM9l

Red-Caesar commented 1 year ago

Scheme of preprocessing steps in frontend.c:

Image

And the repo, just in case: https://github.com/Red-Caesar/frontend-TensorFlow

Red-Caesar commented 1 year ago

Notes about tensorflow functions: https://quiver-brace-02a.notion.site/Tensorflow-18f6cf2d0c854254b7f89f822cf7bc4f