espressif / esp-tflite-micro

TensorFlow Lite Micro for Espressif Chipsets

LoadProhibited error with Strided Slice Operation during Tensor Allocation (TFMIC-8) #53

Open · areabow opened this issue 1 year ago

areabow commented 1 year ago

Good day. I have the following model that I am using for inference on an ESP32. The model is a minimal, forward-only TensorFlow definition of an early time-series classification model that I trained in PyTorch, so the TensorFlow version is only suitable for inference. I have successfully transferred the weights from Torch to TensorFlow and verified on my validation set that the predictions match. Further, I am able to convert this model to TFLite without issue, and I confirmed the converted model using the Python inference API.

import tensorflow as tf
from tensorflow.keras import layers

class Controller(tf.keras.Model):
    def __init__(self, noutputs):
        super(Controller, self).__init__()
        self.fc = layers.Dense(noutputs, activation='sigmoid')

    @tf.function
    def call(self, h):
        # Predict one probability per class
        h = layers.Flatten()(h)
        probs = self.fc(h)

        # mimic bernoulli distribution
        action = (tf.ones(shape = tf.shape(probs), dtype=tf.float32, name='action_sample') * 0.5) < probs
        action = tf.cast(action, tf.float32)
        return action

# --- Keras Inferencing Model ---
class RHC(tf.keras.Model):
    def __init__(self, nhid, nclasses, n_states):
        super(RHC, self).__init__()
        self.nhid = nhid
        self._N_CLASSES = nclasses
        self.T = tf.constant(n_states, dtype=tf.int32, name = 'time_steps')

        # --- Submodules ---
        self.Controller = Controller(nclasses)

        self.RNN = layers.LSTM(nhid, 
                               return_sequences=True, 
                               return_state=True,
                               unit_forget_bias=False,
                               unroll = True)
        self.out = layers.Dense(nclasses, activation='sigmoid')

    @tf.function()
    def call(self, X, training = False):
        B = tf.shape(X, name = 'shape_batch')[0]
        y_bar = tf.zeros((B, 1, self._N_CLASSES), name='y_bar')  # indicator vector
        hidden = [tf.zeros([B,self.nhid], name = 'h_0'),
                  tf.zeros([B,self.nhid], name = 'c_0')]
        predictions = tf.zeros((B, 1, self._N_CLASSES), name='predictions') # Record predicted values

        # --- for each timestep, select a set of actions ---
        t = 0
        while t < self.T:
            RNN_in = tf.concat((tf.expand_dims(tf.gather(X, t, axis=1), axis=1), y_bar), axis=2, name = "concat_rnn_in")
            state, h_t, c_t = self.RNN(inputs = (RNN_in), initial_state = hidden)
            hidden = [h_t, c_t]
            flatten_state = layers.Flatten()(state)
            y_hat = self.out(flatten_state)
            y_hat = tf.expand_dims(y_hat, axis = 1, name = 'expand_y_hat')
            time = tf.ones(shape = (B, 1, 1), dtype=tf.float32, name = 'time') * tf.cast(t, dtype=tf.float32)  # collect timestep
            c_in = tf.concat((state, y_hat, time), axis=2, name='controller_input')
            a_t = self.Controller(c_in) #minimal controller
            a_t = tf.expand_dims(a_t, axis = 1, name = 'expand_actions')
            predictions = tf.where((a_t == 1) & (predictions == 0), y_hat, predictions)
            y_bar = tf.where((a_t == 1) & (y_bar == 0), tf.ones_like(y_bar), y_bar)
            t += 1
        y_hat = tf.squeeze(tf.where(predictions == 0.0, y_hat, predictions))  # If it never stopped to predict, use final prediction
        return y_hat
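
For reference, the conversion and Python-side verification looked roughly like this (a sketch only: the constructor arguments and input shape below are placeholders, not the real values):

import numpy as np
import tensorflow as tf

# Build the inference-only model (placeholder sizes).
model = RHC(nhid=16, nclasses=4, n_states=10)

# Trace call() with a fixed (batch, time, features) signature.
concrete_fn = model.call.get_concrete_function(
    tf.TensorSpec([1, 10, 8], tf.float32))

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [concrete_fn], model)
tflite_model = converter.convert()

# Sanity-check the converted model with the Python inference API.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp['index'], np.zeros(inp['shape'], dtype=np.float32))
interpreter.invoke()
print(interpreter.get_tensor(interpreter.get_output_details()[0]['index']))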

I have implemented a minimal example to run inference on the ESP32; however, I am having some trouble with the tensor allocation process:

#include <stdio.h>
#include <string.h>
#include "sdkconfig.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"
#include "rhc.h"
#include "tensorflow/lite/core/c/common.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_log.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "model_settings.h"

//-----Globals, used for tfLITE------//

const tflite::Model* model = nullptr;
TfLiteTensor* model_input = nullptr;
tflite::MicroInterpreter* interpreter = nullptr;

uint8_t tensor_arena[kTensorArenaSize];

namespace {
    using RHCOpResolver = tflite::MicroMutableOpResolver<24>;
    TfLiteStatus RegisterOps(RHCOpResolver& op_resolver)
    {
        if (op_resolver.AddAdd() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddCast() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddConcatenation() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddEqual() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddExpandDims() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddFill() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddFullyConnected() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddGather() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddGreater() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddLess() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddLogicalAnd() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddLogistic() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddMul() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddPack() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddReshape() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddSelectV2() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddShape() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddSplit() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddSqueeze() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddStridedSlice() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddTanh() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddTranspose() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddUnpack() != kTfLiteOk) {return kTfLiteError;}
        if (op_resolver.AddWhile() != kTfLiteOk) {return kTfLiteError;}
        return kTfLiteOk;
    }
}  // namespace

extern "C" void app_main(void)
{
    //-------------TF Config---------------//
    model = tflite::GetModel(rhc);
    if (model->version() != TFLITE_SCHEMA_VERSION) 
    {
        MicroPrintf("Model provided is schema version %d not equal to "
                    "supported version %d.",
                    model->version(), TFLITE_SCHEMA_VERSION);
        return;
    }
    // pull in all required operations
    RHCOpResolver micro_op_resolver;
    if (RegisterOps(micro_op_resolver) != kTfLiteOk) 
    {
        MicroPrintf("Registering ops failed");
        return;
    }

    // Build an interpreter to run the model with.
    static tflite::MicroInterpreter static_interpreter(model, 
                                                       micro_op_resolver, 
                                                       tensor_arena, 
                                                       kTensorArenaSize);
    interpreter = &static_interpreter;
    // Allocate memory from the tensor_arena for the model's tensors.

    if (interpreter->AllocateTensors() != kTfLiteOk) 
    {
        MicroPrintf("AllocateTensors() failed");
        return;
    }

    while (1) {vTaskDelay(1);}
}

When I attempt to run the model on the ESP32, I hit the following panic during tensor allocation (from idf.py monitor):

Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.

With backtrace:

Core  0 register dump:
PC      : 0x400e4532  PS      : 0x00060e30  A0      : 0x800e46a8  A1      : 0x3ffb8460  
0x400e4532: tflite::(anonymous namespace)::BuildStridedSliceParams(tflite::(anonymous namespace)::StridedSliceContext*) at /Users/alvinreabow/projects/gm-classifier-fw/components/tflite-lib/tensorflow/lite/micro/kernels/strided_slice.cc:79
A2      : 0x3ffb84e4  A3      : 0x3ffb8538  A4      : 0x3ffb89a8  A5      : 0x00000004  
A6      : 0x00000000  A7      : 0x00000001  A8      : 0x3ffb84e4  A9      : 0x00000003  
A10     : 0x00000000  A11     : 0x3f404acc  A12     : 0x00000000  A13     : 0x3ffb8538  
A14     : 0x3ffb64cc  A15     : 0x00000005  SAR     : 0x0000001a  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000000  LBEG    : 0x4000c46c  LEND    : 0x4000c477  LCOUNT  : 0x00000000  
Backtrace: 0x400e452f:0x3ffb8460 0x400e46a5:0x3ffb8480 0x400f0279:0x3ffb85d0 0x400d8460:0x3ffb8600 0x400d804b:0x3ffb8630 0x40106d27:0x3ffb8a20 0x4008829d:0x3ffb8a50
0x400e452f: tflite::(anonymous namespace)::BuildStridedSliceParams(tflite::(anonymous namespace)::StridedSliceContext*) at /projects/gm-classifier-fw/components/tflite-lib/tensorflow/lite/micro/kernels/strided_slice.cc:77
0x400e46a5: tflite::(anonymous namespace)::Prepare(TfLiteContext*, TfLiteNode*) at projects/gm-classifier-fw/components/tflite-lib/tensorflow/lite/micro/kernels/strided_slice.cc:143 (discriminator 2)
0x400f0279: tflite::MicroGraph::PrepareSubgraphs() at /projects/gm-classifier-fw/components/tflite-lib/tensorflow/lite/micro/micro_graph.cc:102
0x400d8460: tflite::MicroInterpreter::AllocateTensors() at projects/gm-classifier-fw/components/tflite-lib/tensorflow/lite/micro/micro_interpreter.cc:206 (discriminator 2)
0x400d804b: app_main at /projects/gm-classifier-fw/main/main.cpp:137
0x40106d27: main_task at /esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/port_common.c:131 (discriminator 2)
0x4008829d: vPortTaskWrapper at /esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:154

The ops in my model are (exported with TensorFlow's visualize.py):

ADD, CAST, CONCATENATION, EQUAL, EXPAND_DIMS, FILL, FULLY_CONNECTED, GATHER, GREATER, LESS, LOGICAL_AND, LOGISTIC, MUL, PACK, RESHAPE, SELECT_V2, SHAPE, SPLIT, SQUEEZE, STRIDED_SLICE, TANH, TRANSPOSE, UNPACK, WHILE
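
Incidentally, the same op list can be dumped without visualize.py via the TFLite model analyzer (a sketch, assuming TF >= 2.9 where this API is available; the filename is a placeholder):

import tensorflow as tf

# Prints the ops and tensors of the converted model.
tf.lite.experimental.Analyzer.analyze(model_path='rhc.tflite')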

I have also tried increasing the main task stack, with no luck there either.
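
For reference, the stack increase was done through sdkconfig; a sketch (on IDF v5.x the option is CONFIG_ESP_MAIN_TASK_STACK_SIZE, on v4.x it is CONFIG_MAIN_TASK_STACK_SIZE):

# sdkconfig fragment: idf.py menuconfig -> Component config -> ESP System Settings
CONFIG_ESP_MAIN_TASK_STACK_SIZE=16384

Any assistance is appreciated.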

SaketNer commented 1 year ago

Hey, I am facing a similar issue. Did you get any resolution?

NavodPeiris commented 10 months ago

@areabow @SaketNer You can allocate the memory for your tensor arena in PSRAM instead. This should work:

if (tensor_arena == NULL) {
    // allocate memory for the tensor arena in PSRAM
    tensor_arena = (uint8_t *) ps_malloc(kTensorArenaSize);
}
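
Note that ps_malloc() is the Arduino-ESP32 helper. On plain ESP-IDF, a rough equivalent is heap_caps_malloc() (a sketch, assuming PSRAM is enabled in menuconfig and tensor_arena is declared as a pointer rather than a static array; the helper function name here is made up for illustration):

#include <stdint.h>
#include "esp_heap_caps.h"
#include "model_settings.h"  // provides kTensorArenaSize, as in the code above

static uint8_t *tensor_arena = NULL;

static void alloc_tensor_arena(void)
{
    if (tensor_arena == NULL) {
        // Request the arena from external PSRAM (requires CONFIG_SPIRAM).
        tensor_arena = (uint8_t *) heap_caps_malloc(kTensorArenaSize,
                                                    MALLOC_CAP_SPIRAM);
    }
}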

Also, check out this GitHub repo where I used PSRAM for my tensor_arena: https://github.com/Navodplayer1/ESP32_PSRAM_Person_Detection