iamyoungjo closed this 3 months ago
No solution yet? I'm experiencing the same issue on an iPad8,8. I tried the "logProbThreshold: 0" tip from above, but no luck.
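For context on that tip: the `Fallback #1.0 (logProbThreshold)` line in the log suggests WhisperKit re-decodes a segment at the next temperature when the average token log-probability falls below `logProbThreshold`. The Swift sketch below is a simplified, hypothetical model of that kind of check, not WhisperKit's code. `FallbackConfig`, `temperatureSchedule`, `needsLogProbFallback` and `decodeWithFallback` are illustrative names. The defaults are assumptions: start at 0.0, step 0.2 and five retries match the temperatures printed in the log, and a threshold of -1.0 is the usual Whisper default.

```swift
// Simplified, hypothetical model of temperature fallback.
// Names and defaults are illustrative assumptions, not WhisperKit's API.
struct FallbackConfig {
    var startTemperature: Float = 0.0
    var temperatureIncrementOnFallback: Float = 0.2
    var temperatureFallbackCount: Int = 5   // retries after the first attempt
    var logProbThreshold: Float? = -1.0     // nil disables the check
}

/// Temperatures tried in order: the first attempt plus one per fallback.
func temperatureSchedule(_ config: FallbackConfig) -> [Float] {
    precondition(config.temperatureFallbackCount >= 0)
    return (0...config.temperatureFallbackCount).map { step in
        config.startTemperature + Float(step) * config.temperatureIncrementOnFallback
    }
}

/// Mean log-probability of the decoded tokens (0 for an empty segment in this sketch).
func averageLogProb(_ tokenLogProbs: [Float]) -> Float {
    guard !tokenLogProbs.isEmpty else { return 0 }
    return tokenLogProbs.reduce(0, +) / Float(tokenLogProbs.count)
}

/// True when the segment should be re-decoded at the next temperature.
func needsLogProbFallback(_ tokenLogProbs: [Float], threshold: Float?) -> Bool {
    guard let threshold = threshold else { return false }
    return averageLogProb(tokenLogProbs) < threshold
}

/// Runs `decode` at each scheduled temperature until one passes the check and
/// returns that attempt (or the last one). `decode` stands in for a real decoder pass.
func decodeWithFallback(
    config: FallbackConfig,
    decode: (Float) -> [Float]
) -> (temperature: Float, logProbs: [Float]) {
    var last: (temperature: Float, logProbs: [Float]) = (config.startTemperature, [])
    for temperature in temperatureSchedule(config) {
        let logProbs = decode(temperature)
        last = (temperature: temperature, logProbs: logProbs)
        if !needsLogProbFallback(logProbs, threshold: config.logProbThreshold) {
            break
        }
    }
    return last
}
```

In this sketch, setting `logProbThreshold` to `nil` skips the check, so the first (temperature 0.0) attempt is always kept.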
```
[WhisperKit] Running on iPad8,8
[WhisperKit] Loading models from /var/mobile/Containers/Data/Application/826DAB14-7584-42C5-995D-DB610EC2F560/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-base.en with prewarmMode: true
[WhisperKit] Loading feature extractor
[WhisperKit] Loaded feature extractor
[WhisperKit] Loading audio encoder
[WhisperKit] Loaded audio encoder
[WhisperKit] Loading text decoder
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
doUnloadModel:options:qos:error:: model=_ANEModel: { modelURL=file:///var/mobile/Containers/Data/Application/826DAB14-7584-42C5-995D-DB610EC2F560/Library/Caches/PG.AutoDoc/com.apple.e5rt.e5bundlecache/21E236/FC9E22833F2E559D5C91AFDFA46A92DD405A337FA2FEEEDE3D7B1FB0404809A9/2AFB581F639499133FE0F915E992E19CD88B045C19D172DD4F1DAA30EA54D09D.bundle/H11G.bundle/main/main_eir/ : sourceURL= (null) : key={"isegment":2,"inputs":{"denom_15_cast_fp16_ctx_tx_default__2":{"shape":[1,1,1,1,1]},"zero_mean_15_cast_fp16_ctx_tx_default__2":{"shape":[1,1,1,512,1]},"encoder_output_embeds_eir":{"shape":[1500,1,1,512,1]}},"outputs":{"query_11_cast_fp16":{"shape":[1,1,1,512,1]},"key_11_cast_fp16":{"shape":[1500,1,1,512,1]},"value_11_cast_fp16":{"shape":[1500,1,1,512,1]}}} : identifierSource=0 : cacheURLIdentifier=35CA4612A76C1BF9B086EADFED6E5004AD9F6DBC52F394148B6617D7FFCA8A4F_1E58A2423EF9EAAC837AC75C1D878F925524A1ACFF44DBDFB0EA68EEC3613D0B : string_id=0x00000000 : program=_ANEProgramForEvaluation: { programHandle=42126760470 : intermediateBufferHandle=42126814457 : queueDepth=127 } : state=3 : programHandle=42126760470 : intermediateBufferHandle=42126814457 : queueDepth=127 : attr={ ANEFModelDescription = { ANEFModelInput16KAlignmentArray = ( 1, 1, 1 ); ANEFModelOutput16KAlignmentArray = ( 1, 1, 1 ); ANEFModelProcedures = ( { ANEFModelInputSymbolIndexArray = ( 0, 1, 2 ); ANEFModelOutputSymbolIndexArray = ( 0, 1, 2 ); ANEFModelProcedureID = 0; } ); kANEFModelInputSymbolsArrayKey = ( 242597c4eae0168622c8ad87213eca2b, cd370f0cd20ef8258ccb4fd6beb1d568, "encoder_output_embeds_eir" ); kANEFModelOutputSymbolsArrayKey = ( "key_11_cast_fp16@output", "query_11_cast_fp16@output", "value_11_cast_fp16@output" ); kANEFModelProcedureNameToIDMapKey = { "net_2" = 0; }; }; NetworkStatusList = ( { LiveInputList = ( { BatchStride = 64; Batches = 1; Channels = 1; Depth = 1; DepthStride = 64; Height = 1; Interleave = 1; Name = 242597c4eae0168622c8ad87213eca2b; PlaneCount = 1; PlaneStride = 64; RowStride = 64; Symbol = 242597c4eae0168622c8ad87213eca2b; Type = Float16; Width = 1; }, { BatchStride = 32768; Batches = 1; Channels = 512; Depth = 1; DepthStride = 32768; Height = 1; Interleave = 1; Name = cd370f0cd20ef8258ccb4fd6beb1d568; PlaneCount = 512; PlaneStride = 64; RowStride = 64; Symbol = cd370f0cd20ef8258ccb4fd6beb1d568; Type = Float16; Width = 1; }, { BatchStride = 1540096; Batches = 1; Channels = 512; Depth = 1; DepthStride = 1540096; Height = 1; Interleave = 1; Name = "encoder_output_embeds_eir"; PlaneCount = 512; PlaneStride = 3008; RowStride = 3008; Symbol = "encoder_output_embeds_eir"; Type = Float16; Width = 1500; } ); LiveOutputList = ( { BatchStride = 1540096; Batches = 1; Channels = 512; Depth = 1; DepthStride = 1540096; Height = 1; Interleave = 1; Name = "key_11_cast_fp16@output"; PlaneCount = 512; PlaneStride = 3008; RowStride = 3008; Symbol = "key_11_cast_fp16@output"; Type = Float16; Width = 1500; }, { BatchStride = 32768; Batches = 1; Channels = 512; Depth = 1; DepthStride = 32768; Height = 1; Interleave = 1; Name = "query_11_cast_fp16@output"; PlaneCount = 512; PlaneStride = 64; RowStride = 64; Symbol = "query_11_cast_fp16@output"; Type = Float16; Width = 1; }, { BatchStride = 1540096; Batches = 1; Channels = 512; Depth = 1; DepthStride = 1540096; Height = 1; Interleave = 1; Name = "value_11_cast_fp16@output"; PlaneCount = 512; PlaneStride = 3008; RowStride = 3008; Symbol = "value_11_cast_fp16@output"; Type = Float16; Width = 1500; } ); Name = "net_2"; } ); } : perfStatsMask=0} was not loaded by the client.
doUnloadModel:options:qos:error:: model=_ANEModel: { modelURL=file:///var/mobile/Containers/Data/Application/826DAB14-7584-42C5-995D-DB610EC2F560/Library/Caches/PG.AutoDoc/com.apple.e5rt.e5bundlecache/21E236/FC9E22833F2E559D5C91AFDFA46A92DD405A337FA2FEEEDE3D7B1FB0404809A9/2AFB581F639499133FE0F915E992E19CD88B045C19D172DD4F1DAA30EA54D09D.bundle/H11G.bundle/main/main_eir/ : sourceURL= (null) : key={"isegment":1,"inputs":{"zero_mean_9_cast_fp16_ctx_tx_default__1":{"shape":[1,1,1,512,1]},"encoder_output_embeds_eir":{"shape":[1500,1,1,512,1]},"denom_9_cast_fp16_ctx_tx_default__1":{"shape":[1,1,1,1,1]}},"outputs":{"key_7_cast_fp16":{"shape":[1500,1,1,512,1]},"value_7_cast_fp16":{"shape":[1500,1,1,512,1]},"query_7_cast_fp16":{"shape":[1,1,1,512,1]}}} : identifierSource=0 : cacheURLIdentifier=35CA4612A76C1BF9B086EADFED6E5004AD9F6DBC52F394148B6617D7FFCA8A4F_DB53ACF841ED8B23A05AE898F9B7413FCEA9552468D5A4F5E21F5093C216CABC : string_id=0x00000000 : program=_ANEProgramForEvaluation: { programHandle=42119499582 : intermediateBufferHandle=42119552918 : queueDepth=127 } : state=3 : programHandle=42119499582 : intermediateBufferHandle=42119552918 : queueDepth=127 : attr={ ANEFModelDescription = { ANEFModelInput16KAlignmentArray = ( 1, 1, 1 ); ANEFModelOutput16KAlignmentArray = ( 1, 1, 1 ); ANEFModelProcedures = ( { ANEFModelInputSymbolIndexArray = ( 0, 1, 2 ); ANEFModelOutputSymbolIndexArray = ( 0, 1, 2 ); ANEFModelProcedureID = 0; } ); kANEFModelInputSymbolsArrayKey = ( 055d8d371a86daad353b3fde75fdd997, d9a0409d949391bc8fa8e96671e9c79b, "encoder_output_embeds_eir" ); kANEFModelOutputSymbolsArrayKey = ( "key_7_cast_fp16@output", "query_7_cast_fp16@output", "value_7_cast_fp16@output" ); kANEFModelProcedureNameToIDMapKey = { "net_1" = 0; }; }; NetworkStatusList = ( { LiveInputList = ( { BatchStride = 64; Batches = 1; Channels = 1; Depth = 1; DepthStride = 64; Height = 1; Interleave = 1; Name = 055d8d371a86daad353b3fde75fdd997; PlaneCount = 1; PlaneStride = 64; RowStride = 64; Symbol = 055d8d371a86daad353b3fde75fdd997; Type = Float16; Width = 1; }, { BatchStride = 32768; Batches = 1; Channels = 512; Depth = 1; DepthStride = 32768; Height = 1; Interleave = 1; Name = d9a0409d949391bc8fa8e96671e9c79b; PlaneCount = 512; PlaneStride = 64; RowStride = 64; Symbol = d9a0409d949391bc8fa8e96671e9c79b; Type = Float16; Width = 1; }, { BatchStride = 1540096; Batches = 1; Channels = 512; Depth = 1; DepthStride = 1540096; Height = 1; Interleave = 1; Name = "encoder_output_embeds_eir"; PlaneCount = 512; PlaneStride = 3008; RowStride = 3008; Symbol = "encoder_output_embeds_eir"; Type = Float16; Width = 1500; } ); LiveOutputList = ( { BatchStride = 1540096; Batches = 1; Channels = 512; Depth = 1; DepthStride = 1540096; Height = 1; Interleave = 1; Name = "key_7_cast_fp16@output"; PlaneCount = 512; PlaneStride = 3008; RowStride = 3008; Symbol = "key_7_cast_fp16@output"; Type = Float16; Width = 1500; }, { BatchStride = 32768; Batches = 1; Channels = 512; Depth = 1; DepthStride = 32768; Height = 1; Interleave = 1; Name = "query_7_cast_fp16@output"; PlaneCount = 512; PlaneStride = 64; RowStride = 64; Symbol = "query_7_cast_fp16@output"; Type = Float16; Width = 1; }, { BatchStride = 1540096; Batches = 1; Channels = 512; Depth = 1; DepthStride = 1540096; Height = 1; Interleave = 1; Name = "value_7_cast_fp16@output"; PlaneCount = 512; PlaneStride = 3008; RowStride = 3008; Symbol = "value_7_cast_fp16@output"; Type = Float16; Width = 1500; } ); Name = "net_1"; } ); } : perfStatsMask=0} was not loaded by the client.
[WhisperKit] Loaded text decoder
[WhisperKit] Loading models from /var/mobile/Containers/Data/Application/826DAB14-7584-42C5-995D-DB610EC2F560/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-base.en with prewarmMode: false
[WhisperKit] Loading feature extractor
[WhisperKit] Loaded feature extractor
[WhisperKit] Loading audio encoder
[WhisperKit] Loaded audio encoder
[WhisperKit] Loading text decoder
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
[WhisperKit] Loaded text decoder
[WhisperKit] Loading tokenizer for base.en
[WhisperKit] Loaded tokenizer
[WhisperKit] Loaded models for whisper size: base.en
[WhisperKit] Current audio size: 32000 samples, most recent buffer: 1600 samples, most recent energy: (0.063490346, 0.00058010797, 0.0021451442, 1.3345561e-07)
[WhisperKit] Current audio size: 64000 samples, most recent buffer: 1600 samples, most recent energy: (0.038249217, 0.0006892931, 0.0022904596, 7.053459e-07)
[WhisperKit] Decoder init time: 0.012899041175842285
[WhisperKit] Prefill time: 0.0006909370422363281
[WhisperKit] Prefill prompt: ["<|startoftranscript|>", "<|0.00|>"]
[WhisperKit] Decoding Seek: 0
[WhisperKit] Current audio size: 96000 samples, most recent buffer: 1600 samples, most recent energy: (0.0035468133, 0.0005462394, 0.0016842313, 7.582712e-07)
[WhisperKit] Decoding 0.0s - 5.1s
[WhisperKit] Decoding with tempeartures [0.0, 0.2, 0.4, 0.5996, 0.8, 1.0]
[WhisperKit] Decoding Temperature: 0.0
[WhisperKit] Running main loop for a maximum of 223 iterations, starting at index 0
[WhisperKit] Forcing token 50257 at index 0 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length: 0 Input Token: 50257
[WhisperKit] Key Cache | Val Cache | Align Cache | Update Mask | Decoder Mask | Position
[WhisperKit] 0.000000 | 0.000000 | 0.000000 | 1 | 0 | 0
[WhisperKit] 0.000000 | 0.000000 | 0.000000 | 0 | -10000 | 1
[WhisperKit] 0.000000 | 0.000000 | 0.000000 | 0 | -10000 | 2
[WhisperKit] 0.000000 | 0.000000 | 0.000000 | 0 | -10000 | 3
[WhisperKit] Current audio size: 128000 samples, most recent buffer: 1600 samples, most recent energy: (0.009261748, 0.00054546853, 0.0017254573, 3.4167897e-08)
[WhisperKit] tokenIndex: 0, token: 50361, word: <|nocaptions|>
[WhisperKit] Forcing token 50363 at index 1 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length: 1 Input Token: 50363
[WhisperKit] Key Cache | Val Cache | Align Cache | Update Mask | Decoder Mask | Position
[WhisperKit] 0.476074 | -0.014931 | 0.000000 | 0 | 0 | 0
[WhisperKit] 0.000000 | 0.000000 | 0.024368 | 1 | 0 | 1
[WhisperKit] 0.000000 | 0.000000 | 0.000000 | 0 | -10000 | 2
[WhisperKit] 0.000000 | 0.000000 | 0.000000 | 0 | -10000 | 3
[WhisperKit] tokenIndex: 1, token: 357, word: (
[WhisperKit] Early stopping
[WhisperKit] Fallback #1.0 (logProbThreshold)
[WhisperKit] Decoding Temperature: 0.2
[WhisperKit] Running main loop for a maximum of 223 iterations, starting at index 0
[WhisperKit] Forcing token 50257 at index 0 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length: 0 Input Token: 50257
[WhisperKit] Key Cache | Val Cache | Align Cache | Update Mask | Decoder Mask | Position
[WhisperKit] 0.476074 | -0.014931 | 0.000000 | 1 | 0 | 0
[WhisperKit] 0.482666 | 0.555664 | 0.024368 | 0 | -10000 | 1
[WhisperKit] 0.000000 | 0.000000 | 0.032623 | 0 | -10000 | 2
[WhisperKit] 0.000000 | 0.000000 | 0.000000 | 0 | -10000 | 3
```
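The ANE dump in the log is consistent with the "must be aligned on 64 bytes" rule: a Float16 row with `Width = 1500` holds 3000 bytes, yet `RowStride = 3008`, and `BatchStride = 1540096` equals 3008 * 512. The minimal Swift sketch below reproduces those strides under two assumptions read off the dump rather than taken from Apple documentation: each row is padded up to a multiple of 64 bytes, and with `Height = 1` and `Depth = 1` the planes sit back to back. `alignedUp` and `float16Strides` are hypothetical helpers.

```swift
// Sketch of 64-byte row alignment for Float16 tensors, inferred from the
// RowStride/BatchStride values in the ANE dump (an assumption, not Apple's spec).

/// Rounds `bytes` up to the next multiple of `alignment`.
func alignedUp(_ bytes: Int, to alignment: Int) -> Int {
    precondition(alignment > 0 && bytes >= 0)
    return (bytes + alignment - 1) / alignment * alignment
}

/// Row and batch strides in bytes for a Float16 tensor with Height = Depth = 1,
/// assuming each row is padded to `alignment` bytes and planes are packed back to back.
func float16Strides(width: Int, planeCount: Int, alignment: Int = 64)
    -> (rowStride: Int, batchStride: Int)
{
    let bytesPerElement = 2  // Float16
    let rowStride = alignedUp(width * bytesPerElement, to: alignment)
    return (rowStride: rowStride, batchStride: rowStride * planeCount)
}

// encoder_output_embeds_eir: Width = 1500, PlaneCount = 512
let encoder = float16Strides(width: 1500, planeCount: 512)
print(encoder.rowStride, encoder.batchStride)  // prints "3008 1540096"
```

The same rule gives `RowStride = 64` and `BatchStride = 32768` for the `Width = 1`, 512-channel inputs, and a 64-byte stride for the single-element (`Channels = 1`) inputs, where only 2 bytes are real data. That single-channel, 2-byte case appears to be the one the `Validation failure` lines name.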
Originally posted by @iamyoungjo in https://github.com/argmaxinc/WhisperKit/issues/10#issuecomment-2126574781
https://github.com/argmaxinc/WhisperKit/issues/10#issuecomment-2185589116
Please feel free to reopen the issue if the comment I posted above doesn't resolve your issue!