argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License
3.17k stars 267 forks

Experiencing crash on iPad8,8. #144

Closed: iamyoungjo closed 3 months ago

iamyoungjo commented 4 months ago
No solution yet? I'm experiencing the same issue on iPad8,8. I tried the "logProbThreshold: 0" tip above, but no luck.
[WhisperKit] Running on iPad8,8
[WhisperKit] Loading models from /var/mobile/Containers/Data/Application/826DAB14-7584-42C5-995D-DB610EC2F560/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-base.en with prewarmMode: true
[WhisperKit] Loading feature extractor
[WhisperKit] Loaded feature extractor
[WhisperKit] Loading audio encoder

[WhisperKit] Loaded audio encoder
[WhisperKit] Loading text decoder
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
doUnloadModel:options:qos:error:: model=_ANEModel: { modelURL=file:///var/mobile/Containers/Data/Application/826DAB14-7584-42C5-995D-DB610EC2F560/Library/Caches/PG.AutoDoc/com.apple.e5rt.e5bundlecache/21E236/FC9E22833F2E559D5C91AFDFA46A92DD405A337FA2FEEEDE3D7B1FB0404809A9/2AFB581F639499133FE0F915E992E19CD88B045C19D172DD4F1DAA30EA54D09D.bundle/H11G.bundle/main/main_eir/ : sourceURL= (null) : key={"isegment":2,"inputs":{"denom_15_cast_fp16_ctx_tx_default__2":{"shape":[1,1,1,1,1]},"zero_mean_15_cast_fp16_ctx_tx_default__2":{"shape":[1,1,1,512,1]},"encoder_output_embeds_eir":{"shape":[1500,1,1,512,1]}},"outputs":{"query_11_cast_fp16":{"shape":[1,1,1,512,1]},"key_11_cast_fp16":{"shape":[1500,1,1,512,1]},"value_11_cast_fp16":{"shape":[1500,1,1,512,1]}}} : identifierSource=0 : cacheURLIdentifier=35CA4612A76C1BF9B086EADFED6E5004AD9F6DBC52F394148B6617D7FFCA8A4F_1E58A2423EF9EAAC837AC75C1D878F925524A1ACFF44DBDFB0EA68EEC3613D0B : string_id=0x00000000 : program=_ANEProgramForEvaluation: { programHandle=42126760470 : intermediateBufferHandle=42126814457 : queueDepth=127 } : state=3 : programHandle=42126760470 : intermediateBufferHandle=42126814457 : queueDepth=127 : attr={
    ANEFModelDescription =     {
        ANEFModelInput16KAlignmentArray =         (
            1,
            1,
            1
        );
        ANEFModelOutput16KAlignmentArray =         (
            1,
            1,
            1
        );
        ANEFModelProcedures =         (
                        {
                ANEFModelInputSymbolIndexArray =                 (
                    0,
                    1,
                    2
                );
                ANEFModelOutputSymbolIndexArray =                 (
                    0,
                    1,
                    2
                );
                ANEFModelProcedureID = 0;
            }
        );
        kANEFModelInputSymbolsArrayKey =         (
            242597c4eae0168622c8ad87213eca2b,
            cd370f0cd20ef8258ccb4fd6beb1d568,
            "encoder_output_embeds_eir"
        );
        kANEFModelOutputSymbolsArrayKey =         (
            "key_11_cast_fp16@output",
            "query_11_cast_fp16@output",
            "value_11_cast_fp16@output"
        );
        kANEFModelProcedureNameToIDMapKey =         {
            "net_2" = 0;
        };
    };
    NetworkStatusList =     (
                {
            LiveInputList =             (
                                {
                    BatchStride = 64;
                    Batches = 1;
                    Channels = 1;
                    Depth = 1;
                    DepthStride = 64;
                    Height = 1;
                    Interleave = 1;
                    Name = 242597c4eae0168622c8ad87213eca2b;
                    PlaneCount = 1;
                    PlaneStride = 64;
                    RowStride = 64;
                    Symbol = 242597c4eae0168622c8ad87213eca2b;
                    Type = Float16;
                    Width = 1;
                },
                                {
                    BatchStride = 32768;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 32768;
                    Height = 1;
                    Interleave = 1;
                    Name = cd370f0cd20ef8258ccb4fd6beb1d568;
                    PlaneCount = 512;
                    PlaneStride = 64;
                    RowStride = 64;
                    Symbol = cd370f0cd20ef8258ccb4fd6beb1d568;
                    Type = Float16;
                    Width = 1;
                },
                                {
                    BatchStride = 1540096;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 1540096;
                    Height = 1;
                    Interleave = 1;
                    Name = "encoder_output_embeds_eir";
                    PlaneCount = 512;
                    PlaneStride = 3008;
                    RowStride = 3008;
                    Symbol = "encoder_output_embeds_eir";
                    Type = Float16;
                    Width = 1500;
                }
            );
            LiveOutputList =             (
                                {
                    BatchStride = 1540096;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 1540096;
                    Height = 1;
                    Interleave = 1;
                    Name = "key_11_cast_fp16@output";
                    PlaneCount = 512;
                    PlaneStride = 3008;
                    RowStride = 3008;
                    Symbol = "key_11_cast_fp16@output";
                    Type = Float16;
                    Width = 1500;
                },
                                {
                    BatchStride = 32768;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 32768;
                    Height = 1;
                    Interleave = 1;
                    Name = "query_11_cast_fp16@output";
                    PlaneCount = 512;
                    PlaneStride = 64;
                    RowStride = 64;
                    Symbol = "query_11_cast_fp16@output";
                    Type = Float16;
                    Width = 1;
                },
                                {
                    BatchStride = 1540096;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 1540096;
                    Height = 1;
                    Interleave = 1;
                    Name = "value_11_cast_fp16@output";
                    PlaneCount = 512;
                    PlaneStride = 3008;
                    RowStride = 3008;
                    Symbol = "value_11_cast_fp16@output";
                    Type = Float16;
                    Width = 1500;
                }
            );
            Name = "net_2";
        }
    );
} : perfStatsMask=0}  was not loaded by the client.
doUnloadModel:options:qos:error:: model=_ANEModel: { modelURL=file:///var/mobile/Containers/Data/Application/826DAB14-7584-42C5-995D-DB610EC2F560/Library/Caches/PG.AutoDoc/com.apple.e5rt.e5bundlecache/21E236/FC9E22833F2E559D5C91AFDFA46A92DD405A337FA2FEEEDE3D7B1FB0404809A9/2AFB581F639499133FE0F915E992E19CD88B045C19D172DD4F1DAA30EA54D09D.bundle/H11G.bundle/main/main_eir/ : sourceURL= (null) : key={"isegment":1,"inputs":{"zero_mean_9_cast_fp16_ctx_tx_default__1":{"shape":[1,1,1,512,1]},"encoder_output_embeds_eir":{"shape":[1500,1,1,512,1]},"denom_9_cast_fp16_ctx_tx_default__1":{"shape":[1,1,1,1,1]}},"outputs":{"key_7_cast_fp16":{"shape":[1500,1,1,512,1]},"value_7_cast_fp16":{"shape":[1500,1,1,512,1]},"query_7_cast_fp16":{"shape":[1,1,1,512,1]}}} : identifierSource=0 : cacheURLIdentifier=35CA4612A76C1BF9B086EADFED6E5004AD9F6DBC52F394148B6617D7FFCA8A4F_DB53ACF841ED8B23A05AE898F9B7413FCEA9552468D5A4F5E21F5093C216CABC : string_id=0x00000000 : program=_ANEProgramForEvaluation: { programHandle=42119499582 : intermediateBufferHandle=42119552918 : queueDepth=127 } : state=3 : programHandle=42119499582 : intermediateBufferHandle=42119552918 : queueDepth=127 : attr={
    ANEFModelDescription =     {
        ANEFModelInput16KAlignmentArray =         (
            1,
            1,
            1
        );
        ANEFModelOutput16KAlignmentArray =         (
            1,
            1,
            1
        );
        ANEFModelProcedures =         (
                        {
                ANEFModelInputSymbolIndexArray =                 (
                    0,
                    1,
                    2
                );
                ANEFModelOutputSymbolIndexArray =                 (
                    0,
                    1,
                    2
                );
                ANEFModelProcedureID = 0;
            }
        );
        kANEFModelInputSymbolsArrayKey =         (
            055d8d371a86daad353b3fde75fdd997,
            d9a0409d949391bc8fa8e96671e9c79b,
            "encoder_output_embeds_eir"
        );
        kANEFModelOutputSymbolsArrayKey =         (
            "key_7_cast_fp16@output",
            "query_7_cast_fp16@output",
            "value_7_cast_fp16@output"
        );
        kANEFModelProcedureNameToIDMapKey =         {
            "net_1" = 0;
        };
    };
    NetworkStatusList =     (
                {
            LiveInputList =             (
                                {
                    BatchStride = 64;
                    Batches = 1;
                    Channels = 1;
                    Depth = 1;
                    DepthStride = 64;
                    Height = 1;
                    Interleave = 1;
                    Name = 055d8d371a86daad353b3fde75fdd997;
                    PlaneCount = 1;
                    PlaneStride = 64;
                    RowStride = 64;
                    Symbol = 055d8d371a86daad353b3fde75fdd997;
                    Type = Float16;
                    Width = 1;
                },
                                {
                    BatchStride = 32768;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 32768;
                    Height = 1;
                    Interleave = 1;
                    Name = d9a0409d949391bc8fa8e96671e9c79b;
                    PlaneCount = 512;
                    PlaneStride = 64;
                    RowStride = 64;
                    Symbol = d9a0409d949391bc8fa8e96671e9c79b;
                    Type = Float16;
                    Width = 1;
                },
                                {
                    BatchStride = 1540096;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 1540096;
                    Height = 1;
                    Interleave = 1;
                    Name = "encoder_output_embeds_eir";
                    PlaneCount = 512;
                    PlaneStride = 3008;
                    RowStride = 3008;
                    Symbol = "encoder_output_embeds_eir";
                    Type = Float16;
                    Width = 1500;
                }
            );
            LiveOutputList =             (
                                {
                    BatchStride = 1540096;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 1540096;
                    Height = 1;
                    Interleave = 1;
                    Name = "key_7_cast_fp16@output";
                    PlaneCount = 512;
                    PlaneStride = 3008;
                    RowStride = 3008;
                    Symbol = "key_7_cast_fp16@output";
                    Type = Float16;
                    Width = 1500;
                },
                                {
                    BatchStride = 32768;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 32768;
                    Height = 1;
                    Interleave = 1;
                    Name = "query_7_cast_fp16@output";
                    PlaneCount = 512;
                    PlaneStride = 64;
                    RowStride = 64;
                    Symbol = "query_7_cast_fp16@output";
                    Type = Float16;
                    Width = 1;
                },
                                {
                    BatchStride = 1540096;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 1540096;
                    Height = 1;
                    Interleave = 1;
                    Name = "value_7_cast_fp16@output";
                    PlaneCount = 512;
                    PlaneStride = 3008;
                    RowStride = 3008;
                    Symbol = "value_7_cast_fp16@output";
                    Type = Float16;
                    Width = 1500;
                }
            );
            Name = "net_1";
        }
    );
} : perfStatsMask=0}  was not loaded by the client.
[WhisperKit] Loaded text decoder
[WhisperKit] Loading models from /var/mobile/Containers/Data/Application/826DAB14-7584-42C5-995D-DB610EC2F560/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-base.en with prewarmMode: false
[WhisperKit] Loading feature extractor
[WhisperKit] Loaded feature extractor
[WhisperKit] Loading audio encoder
[WhisperKit] Loaded audio encoder
[WhisperKit] Loading text decoder
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
[WhisperKit] Loaded text decoder
[WhisperKit] Loading tokenizer for base.en
[WhisperKit] Loaded tokenizer
[WhisperKit] Loaded models for whisper size: base.en

[WhisperKit] Current audio size: 32000 samples, most recent buffer: 1600 samples, most recent energy: (0.063490346, 0.00058010797, 0.0021451442, 1.3345561e-07)

[WhisperKit] Current audio size: 64000 samples, most recent buffer: 1600 samples, most recent energy: (0.038249217, 0.0006892931, 0.0022904596, 7.053459e-07)
[WhisperKit] Decoder init time: 0.012899041175842285
[WhisperKit] Prefill time: 0.0006909370422363281
[WhisperKit] Prefill prompt: ["<|startoftranscript|>", "<|0.00|>"]
[WhisperKit] Decoding Seek: 0
[WhisperKit] Current audio size: 96000 samples, most recent buffer: 1600 samples, most recent energy: (0.0035468133, 0.0005462394, 0.0016842313, 7.582712e-07)

[WhisperKit] Decoding 0.0s - 5.1s
[WhisperKit] Decoding with tempeartures [0.0, 0.2, 0.4, 0.5996, 0.8, 1.0]
[WhisperKit] Decoding Temperature: 0.0
[WhisperKit] Running main loop for a maximum of 223 iterations, starting at index 0
[WhisperKit] Forcing token 50257 at index 0 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length:  0 Input Token: 50257
[WhisperKit] Key Cache | Val Cache | Align Cache | Update Mask | Decoder Mask | Position
[WhisperKit]  0.000000 |  0.000000 |  0.000000 |           1 |            0 | 0
[WhisperKit]  0.000000 |  0.000000 |  0.000000 |           0 |       -10000 | 1
[WhisperKit]  0.000000 |  0.000000 |  0.000000 |           0 |       -10000 | 2
[WhisperKit]  0.000000 |  0.000000 |  0.000000 |           0 |       -10000 | 3
[WhisperKit] Current audio size: 128000 samples, most recent buffer: 1600 samples, most recent energy: (0.009261748, 0.00054546853, 0.0017254573, 3.4167897e-08)
[WhisperKit] tokenIndex: 0, token: 50361, word: <|nocaptions|>
[WhisperKit] Forcing token 50363 at index 1 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length:  1 Input Token: 50363
[WhisperKit] Key Cache | Val Cache | Align Cache | Update Mask | Decoder Mask | Position
[WhisperKit]  0.476074 | -0.014931 |  0.000000 |           0 |            0 | 0
[WhisperKit]  0.000000 |  0.000000 |  0.024368 |           1 |            0 | 1
[WhisperKit]  0.000000 |  0.000000 |  0.000000 |           0 |       -10000 | 2
[WhisperKit]  0.000000 |  0.000000 |  0.000000 |           0 |       -10000 | 3
[WhisperKit] tokenIndex: 1, token: 357, word:  (
[WhisperKit] Early stopping
[WhisperKit] Fallback #1.0 (logProbThreshold)
[WhisperKit] Decoding Temperature: 0.2
[WhisperKit] Running main loop for a maximum of 223 iterations, starting at index 0
[WhisperKit] Forcing token 50257 at index 0 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length:  0 Input Token: 50257
[WhisperKit] Key Cache | Val Cache | Align Cache | Update Mask | Decoder Mask | Position
[WhisperKit]  0.476074 | -0.014931 |  0.000000 |           1 |            0 | 0
[WhisperKit]  0.482666 |  0.555664 |  0.024368 |           0 |       -10000 | 1
[WhisperKit]  0.000000 |  0.000000 |  0.032623 |           0 |       -10000 | 2
[WhisperKit]  0.000000 |  0.000000 |  0.000000 |           0 |       -10000 | 3

Originally posted by @iamyoungjo in https://github.com/argmaxinc/WhisperKit/issues/10#issuecomment-2126574781
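
A note for readers of the log above: the "must be aligned on 64 bytes" message can be compared with the strides the same log reports. The sketch below is arithmetic only. It assumes each Float16 row (2 bytes per element) is padded up to the next multiple of 64 bytes and that the batch stride is the row stride times the channel count. That padding rule is inferred from the logged numbers, not from any Apple documentation, and the helper name `aligned_row_stride` is hypothetical. It does not explain why validation fails on this device.

```python
def aligned_row_stride(width: int, bytes_per_element: int = 2, alignment: int = 64) -> int:
    """Round a row's byte length up to the next multiple of `alignment`.

    Assumption: the padding rule is inferred from the logged strides,
    not taken from any official specification.
    """
    raw_bytes = width * bytes_per_element
    # Ceiling division, then scale back up to a multiple of `alignment`.
    return -(-raw_bytes // alignment) * alignment


# Values taken from the log: Width = 1500 or 1, Channels = 512, Type = Float16.
wide_row = aligned_row_stride(1500)   # 3000 raw bytes, padded to 3008
narrow_row = aligned_row_stride(1)    # 2 raw bytes, padded to 64

print(wide_row, wide_row * 512)       # compare with RowStride / BatchStride for Width = 1500
print(narrow_row, narrow_row * 512)   # compare with RowStride / BatchStride for Width = 1
```

Under these assumptions the computed values match the logged `RowStride = 3008`, `BatchStride = 1540096`, `RowStride = 64` and `BatchStride = 32768`. The only tensor in the log with `Channels = 1` is the first input in each `LiveInputList`.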

atiorh commented 3 months ago

https://github.com/argmaxinc/WhisperKit/issues/10#issuecomment-2185589116

atiorh commented 3 months ago

Please feel free to reopen the issue if the comment I posted above doesn't resolve your issue!