ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++

Stuck on convert_encoder while converting to CoreML #773

Open clyang opened 1 year ago

clyang commented 1 year ago

I've tried to convert the small model to CoreML format on a Mac M1 by following the CoreML instructions.
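For context, the conversion step in question is roughly the following (a sketch based on the CoreML instructions, assuming the Python dependencies from those instructions are already installed):

```sh
# Generate a CoreML encoder for the small model; this is the step that hangs
./models/generate-coreml-model.sh small
```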

However, the process gets stuck after the 'Running MIL backend_mlprogram pipeline' step. I can see ANECompilerService using 100% CPU in top, but the conversion process never ends.

My environment:

crystoneme commented 1 year ago

Same here. For the base and tiny models it's OK.

But for the small and larger models, the command gets stuck at 'Running MIL backend_mlprogram pipeline: 100%'.

ggerganov commented 1 year ago

If it works for the tiny and base models, then I guess it is just taking a long time to process the bigger models. I don't know what determines how long it takes. On my M1 Pro, the medium model takes more than half an hour to process, but someone reported that it takes less than 2 minutes on their M2 Air: https://github.com/ggerganov/whisper.cpp/pull/566#issuecomment-1509936248

Maybe try restarting the computer, not sure.

ficapy commented 1 year ago

I have successfully executed ./models/generate-coreml-model.sh large on an M1; it took about 50 minutes. It would be better to note an approximate time in the documentation @ggerganov

crystoneme commented 1 year ago

> I have successfully executed ./models/generate-coreml-model.sh large on an M1; it took about 50 minutes. It would be better to note an approximate time in the documentation @ggerganov

Good advice. Maybe it just needs more time to process.

kyteague commented 1 year ago

> I have successfully executed ./models/generate-coreml-model.sh large on an M1; it took about 50 minutes. It would be better to note an approximate time in the documentation @ggerganov

I think it would be better to have some sort of progress update if possible. It just looks like it's hanging.

crystoneme commented 1 year ago

I've successfully converted all the models; the time spent ranged from 1 min to 60 min. No errors, they just need time.

System: MacBook Pro M1 Pro

clyang commented 1 year ago

> I've successfully converted all the models; the time spent ranged from 1 min to 60 min. No errors, they just need time.
>
> System: MacBook Pro M1 Pro

Can you let me know your macOS version?

edwios commented 1 year ago

Something's not quite right here. It took my M1 Max with 64 GB RAM exactly 4 hours to convert base.en and almost 3 hours to convert medium.en (which didn't load, btw). Could someone share details such as their torch and Python versions?

I am using torch 2.1.0-dev20230417 and Python 3.10.10.

ephemer commented 1 year ago

FWIW I'm not entirely convinced that waiting is required or is the full answer here. I waited for multiple hours converting the medium model and it didn't finish, but if I force-quit ANECompilerService after waiting a few minutes, the process appears to complete successfully.

That said, on my Mac I end up with the same issue with the converted model in Xcode, both at runtime and when performance benchmarking the model – it gets stuck compiling the model and never finishes. Sometimes if I'm lucky the compilation appears to happen immediately and I can use the model as usual for that run of the program – mostly only in Debug builds though. Seems to be a bug in the compiler service.

I have the same issues whether I make an mlprogram or an mlmodel, but the mlprogram seems to show the problem more often and more severely.

I am on torch==2.0, Python 3.10, M1 MBP, 16 GB RAM.

RogerPu commented 1 year ago

I have the same problem. The first run finished after a few minutes, but after I updated my system and ran it a second time, ANECompilerService kept running for 10 hours. When I force quit it, the main binary continued and gave me the correct result.

> FWIW I'm not entirely convinced that waiting is required or is the full answer here. I waited for multiple hours converting the medium model and it didn't finish, but if I force-quit ANECompilerService after waiting a few minutes, the process appears to complete successfully.
>
> That said, on my Mac I end up with the same issue with the converted model in Xcode, both at runtime and when performance benchmarking the model – it gets stuck compiling the model and never finishes. Sometimes if I'm lucky the compilation appears to happen immediately and I can use the model as usual for that run of the program – mostly only in Debug builds though. Seems to be a bug in the compiler service.
>
> I have the same issues whether I make an mlprogram or an mlmodel, but the mlprogram seems to show the problem more often and more severely.
>
> I am on torch==2.0, Python 3.10, M1 MBP, 16 GB RAM.

hoonlight commented 1 year ago

I experienced the same problem. As soon as I force-quit ANECompilerService, the conversion completed quickly (model: large, ~2 min on an M1 Pro 14 with 16 GB RAM).

I also encountered the same issue when loading the model, but once again, the model loaded successfully right after I force-quit ANECompilerService.

clyang commented 1 year ago

It's good to know I'm not the only one having this issue.

cnsilvan commented 1 year ago

> I experienced the same problem. As soon as I force-quit ANECompilerService, the conversion completed quickly (model: large, ~2 min on an M1 Pro 14 with 16 GB RAM).
>
> I also encountered the same issue when loading the model, but once again, the model loaded successfully right after I force-quit ANECompilerService.

Same here.

arrowcircle commented 1 year ago

Same here. It ran for 68 minutes, then I found the solution of killing ANECompilerService and it finished.

Erimus-Koo commented 1 year ago

@archive-r @arrowcircle

However, even after killing ANECompilerService, the same issues occur when you run it again.

I have successfully used the base option and experienced no issues during subsequent runs. However, when I tried using the medium option, it became stuck.

janngobble commented 1 year ago

I am having the exact same issue: if I kill ANECompilerService, the CoreML-enabled main continues on and begins to work. What is the issue here? It SEEMS to be recompiling the model each time.

n8henrie commented 1 year ago

Same. I tried generate-coreml-model.sh a few times with both medium.en and large, and even let it run for 8 hours or so overnight; it never completed.

After sudo kill -9 on ANECompilerService (sending SIGTERM didn't work), the process finished almost immediately.

Afterwards, running the model hangs indefinitely at 'whisper_init_state: first run on a device may take a while ...'

If I again send SIGKILL to ANECompilerService, it finishes within seconds and correctly transcribes the audio.
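For anyone wanting to reproduce the workaround described above, here is a minimal sketch (assuming the process really is named ANECompilerService, as reported in this thread):

```sh
# Look up the compiler service PID, then send SIGKILL.
# SIGTERM reportedly has no effect, so -9 is used.
pgrep -x ANECompilerService
sudo pkill -9 -x ANECompilerService
```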

carstenuhlig commented 1 year ago

Killing ANECompilerService with -9 works, but I must do this every single time I start a normal transcription. This can't be normal...

Any suggestions?


eual8 commented 1 year ago

> Killing ANECompilerService with -9 works, but I must do this every single time I start a normal transcription. This can't be normal...
>
> Any suggestions?

The same situation here: MBP M1 Max, 32 GB, macOS 13.5.1.

philk commented 1 year ago

The same thing was happening to me with any of the downloadable models I could find; building it locally on my machine worked fine though. It took ~10 min for small.en on my M1 MBP.

eual8 commented 1 year ago

> The same thing was happening to me with any of the downloadable models I could find; building it locally on my machine worked fine though. It took ~10 min for small.en on my M1 MBP.

How did you do this? Using the convert-pt-to-ggml.py script?

eual8 commented 1 year ago

I was able to convert the Whisper large-v2.pt model (it took less than 1 minute). There are no errors now, thanks for the advice. I did everything as written here: https://github.com/ggerganov/whisper.cpp/blob/master/models/README.md

philk commented 1 year ago

I used the instructions here, under the Usage section: https://github.com/ggerganov/whisper.cpp/pull/566
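For anyone following along, the gist of those local-generation steps (a hedged sketch based on the linked PR and models/README; the model name is just an example):

```sh
# Generate the CoreML encoder locally instead of downloading a prebuilt one
./models/generate-coreml-model.sh small.en

# Rebuild whisper.cpp with CoreML support
make clean
WHISPER_COREML=1 make -j

# Run with the matching ggml model; the CoreML encoder is picked up automatically
./main -m models/ggml-small.en.bin -f samples/jfk.wav
```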

Sogl commented 1 year ago

The process was stuck on the large model for over 10 hours. Force-killing ANECompilerService in Activity Monitor (Memory tab) helped me.