google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

internal : Failed to initialize session: %s% INTERNAL: CalculatorGraph::Run() failed #5724

Open Arya-Hari opened 6 days ago

Arya-Hari commented 6 days ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No

OS Platform and Distribution

Linux Ubuntu 16.04

Mobile device if the issue happens on mobile device

Pixel 7a

Browser and version if the issue happens on browser

No response

Programming Language and version

Python

MediaPipe version

No response

Bazel version

No response

Solution

LLM Inference

Android Studio, NDK, SDK versions (if issue is related to building in Android environment)

No response

Xcode & Tulsi version (if issue is related to building for iOS)

No response

Describe the actual behavior

I created a .tflite file for the Llama 3.2 1B model using ai-edge-torch and created the Task Bundle as instructed in the documentation. I then pushed the .task file to the device and modified the MediaPipe example (which was written for Gemma) to use Llama. When I run it, I get the error below.
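
(For context, the Task Bundle step mentioned above follows the MediaPipe documentation; the sketch below shows roughly what that looks like with the mediapipe.tasks.python.genai bundler. The file paths and token strings are placeholders, not the exact values used here.)

# Rough sketch of the Task Bundle creation step, per the MediaPipe docs.
# All paths and token values below are placeholders.
from mediapipe.tasks.python.genai import bundler

config = bundler.BundleConfig(
    tflite_model="/path/to/llama3_1b_quantized.tflite",  # converted model
    tokenizer_model="/path/to/tokenizer.model",          # tokenizer file
    start_token="<|begin_of_text|>",                     # example Llama 3 start token
    stop_tokens=["<|eot_id|>"],                          # example Llama 3 stop token
    output_filename="/path/to/llama.task",
    enable_bytes_to_unicode_mapping=False,
)
bundler.create_bundle(config)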

Describe the expected behaviour

I should be able to run inference without any issue.

Standalone code/steps you may have used to try to get what you need

# InferenceModel.kt after modification
package com.google.mediapipe.examples.llminference

import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference
import java.io.File
import kotlinx.coroutines.channels.BufferOverflow
import kotlinx.coroutines.flow.MutableSharedFlow
import kotlinx.coroutines.flow.SharedFlow
import kotlinx.coroutines.flow.asSharedFlow

class InferenceModel private constructor(context: Context) {
    private var llmInference: LlmInference

    private val modelExists: Boolean
        get() = File(MODEL_PATH).exists()

    private val _partialResults = MutableSharedFlow<Pair<String, Boolean>>(
        extraBufferCapacity = 1,
        onBufferOverflow = BufferOverflow.DROP_OLDEST
    )
    val partialResults: SharedFlow<Pair<String, Boolean>> = _partialResults.asSharedFlow()

    init {
        if (!modelExists) {
            throw IllegalArgumentException("Model not found at path: $MODEL_PATH")
        }

        val options = LlmInference.LlmInferenceOptions.builder()
            .setModelPath(MODEL_PATH)
            .setMaxTokens(1024)
            .setResultListener { partialResult, done ->
                _partialResults.tryEmit(partialResult to done)
            }
            .build()

        llmInference = LlmInference.createFromOptions(context, options)
    }

    fun generateResponseAsync(prompt: String) {
        // The Gemma prompt prefix from the original example has been removed;
        // for Llama the prompt is passed through unchanged.
        val gemmaPrompt = prompt
        llmInference.generateResponseAsync(gemmaPrompt)
    }

    companion object {
        // NB: Make sure the filename is *unique* per model you use!
        // Weight caching is currently based on filename alone.
        private const val MODEL_PATH = "/data/local/tmp/llm/llama.task"
        private var instance: InferenceModel? = null

        fun getInstance(context: Context): InferenceModel {
            return if (instance != null) {
                instance!!
            } else {
                InferenceModel(context).also { instance = it }
            }
        }
    }
}

Other info / Complete Logs

Error -
internal: Failed to initialize session: %sINTERNAL: CalculatorGraph::Run() failed: Calculator::Open() for node "odml.infra.TfLitePrefillDecodeRunnerCalculator" failed; RET_CHECK failure (external/odml/odml/infra/genai/inference/utils/tflite_utils/tflite_llm_utils.cc:59) std::find_if(signature_keys.begin(), signature_keys.end(), [&](const std::string* key) { return *key == required_key; }) != signature_keys.end()
talumbau commented 3 days ago

The error indicates that your TF Lite model does not have the two required signatures: "prefill" and "decode". Thus, I think something went wrong in the step where you "created .tflite file using ai-edge-torch for Llama 3.2 1B". Our typical conversion scripts enforce the creation of those two signatures for a converted language model. Can you double check that the TF Lite file has those signatures and also post the conversion code you used?
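
(Side note: a quick way to check which signatures a converted .tflite file exposes is the TF Lite interpreter's get_signature_list(); the model path below is a placeholder.)

# Minimal signature check; only the model path is an assumption.
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="/path/to/llama3_1b_quantized.tflite")
print(interpreter.get_signature_list().keys())  # should include 'prefill' and 'decode'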

kuaashish commented 3 days ago

Hi @talumbau,

Could you please review the information above, verify it according to the suggestions, and update us on the status?

Thank you!!

talumbau commented 2 days ago

Hi @kuaashish I think you meant to tag @Arya-Hari in your comment, since you are providing a link to my comment above where I identify the reason behind the error message. Please confirm if you meant your message to be for @Arya-Hari. Thanks!

Arya-Hari commented 2 days ago

Hi. So it seems the prefill and decode signatures are indeed missing from the file. The code I used for conversion was the example.py script from the AI Edge Torch repository, which I changed slightly for Llama. The changed script is below -

# Copyright 2024 The AI Edge Torch Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

import ai_edge_torch
from ai_edge_torch.generative.examples.llama import llama
from ai_edge_torch.generative.layers import kv_cache as kv_utils
from ai_edge_torch.generative.quantize import quant_recipes
from ai_edge_torch.generative.utilities import model_builder
import numpy as np
import torch

def main():
  # Build a PyTorch model as usual
  config = llama.get_fake_model_config()
  model = model_builder.DecoderOnlyModel(config).eval()
  idx = torch.from_numpy(np.array([[1, 2, 3, 4]]))
  tokens = torch.full((1, 10), 0, dtype=torch.int, device="cpu")
  tokens[0, :4] = idx
  input_pos = torch.arange(0, 10, dtype=torch.int)
  kv = kv_utils.KVCache.from_model_config(config)

  # Create a quantization recipe to be applied to the model
  quant_config = quant_recipes.full_int8_dynamic_recipe()
  print(quant_config)

  # Convert with quantization
  edge_model = ai_edge_torch.convert(
      model, (tokens, input_pos, kv), quant_config=quant_config
  )
  edge_model.export("/tmp/llama3_1b_quantized.tflite")

if __name__ == "__main__":
  main()
talumbau commented 1 day ago

Ah, I see. That is our fault for not updating that example file. Sorry about that. In order to use the LLM Inference API, please convert using the convert_*.py files in generative/examples/*/. For example, this convert_gemma2_to_tflite.py file:

https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/generative/examples/gemma/convert_gemma2_to_tflite.py

We will update the example.py file to be consistent with the multi-signature conversion.

The multi-signature method for conversion looks like this:

https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/generative/utilities/converter.py#L63

Should be pretty easy to update your code to use that method.
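
For reference, a rough sketch of what the updated conversion could look like, based on the linked convert_gemma2_to_tflite.py pattern. The builder name (llama.build_1b_model), the checkpoint path, and the exact convert_to_tflite keyword arguments are assumptions, so check the linked files for the current API:

# Hedged sketch adapting the convert_gemma2_to_tflite.py pattern to Llama.
# build_1b_model and the convert_to_tflite arguments are assumptions taken
# from the linked examples; verify against the repo before using.
from ai_edge_torch.generative.examples.llama import llama
from ai_edge_torch.generative.utilities import converter

def main():
  # Build the PyTorch model from a local checkpoint (placeholder path).
  pytorch_model = llama.build_1b_model(
      "/path/to/llama_3.2_1b_checkpoint", kv_cache_max_len=1024
  )
  # The multi-signature converter exports both the 'prefill' and 'decode'
  # signatures, which is what the LLM Inference runtime checks at session init.
  converter.convert_to_tflite(
      pytorch_model,
      tflite_path="/tmp/llama3_1b_quantized.tflite",
      prefill_seq_len=1024,
      quantize=True,
  )

if __name__ == "__main__":
  main()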

Arya-Hari commented 11 hours ago

Okay thank you!