balisujohn / tortoise.cpp

A ggml (C++) re-implementation of tortoise-tts
MIT License
130 stars 7 forks

EXC_BAD_ACCESS on run on macOS CPU only #12

Open jamislike opened 2 weeks ago

jamislike commented 2 weeks ago

Seems to crash on run:

43008 diffusion_model_load: loading model from '../models/ggml-diffusion-model.bin'
diffusion_model_load: ggml tensor size = 368 bytes
diffusion_model_load: backend buffer size = 689.28 MB
diffusion_model_load: using CPU backend

Process 61390 stopped

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x26feb72f8)
    frame #0: 0x00000001843f3b80 libc++.1.dylib`std::__1::ios_base::clear(unsigned int) + 28
libc++.1.dylib`std::__1::ios_base::clear:
->  0x1843f3b80 <+28>: str    w8, [x0, #0x20]
    0x1843f3b84 <+32>: ldr    w9, [x0, #0x24]
    0x1843f3b88 <+36>: tst    w9, w8
    0x1843f3b8c <+40>: b.ne   0x1843f3b98               ; <+52>
Target 0: (tortoise) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x26feb72f8)
  * frame #0: 0x00000001843f3b80 libc++.1.dylib`std::__1::ios_base::clear(unsigned int) + 28
    frame #1: 0x00000001843f7a90 libc++.1.dylib`std::__1::basic_istream<char, std::__1::char_traits<char>>::read(char*, long) + 204
    frame #2: 0x000000010000e774 tortoise`diffusion_model_load(fname="../models/ggml-diffusion-model.bin", model=0x000000016fdfe9b8) at main.cpp:1564:13
    frame #3: 0x0000000100021e08 tortoise`diffusion(trimmed_latents=size=43008) at main.cpp:5631:10
    frame #4: 0x0000000100026650 tortoise`main(argc=1, argv=0x000000016fdff370) at main.cpp:6581:28
    frame #5: 0x00000001841320e0 dyld`start + 2360
balisujohn commented 2 weeks ago

I'm impressed you were able to compile on Mac, since this has never been tested with macOS. Is my understanding correct that the autoregressive model ran correctly, and then it crashed when loading the diffusion model?

My initial shot in the dark is to try adding this to CMakeLists.txt:

add_definitions(-DGGML_MAX_NAME=128)
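(For anyone unsure where to put it: `add_definitions` only affects targets declared after it, so the line has to come before ggml is pulled in. A sketch, assuming the top-level CMakeLists.txt adds the ggml submodule via `add_subdirectory`:)

```cmake
# Must appear before the ggml target is declared so the define
# reaches ggml's own translation units, not just main.cpp.
add_definitions(-DGGML_MAX_NAME=128)
add_subdirectory(ggml)
```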
baileyheading commented 1 week ago

@balisujohn I've also built it on macOS ARM.

I also tried adding the line you suggested to my CMakeLists.txt, though I'm not sure I added it in the right place.

./tortoise --message "based... dr freeman?" --voice "../models/mouse.bin" --seed 0 --output "based?.wav"
gpt_vocab_init: loading vocab from '../models/tokenizer.json'
gpt_vocab_init: vocab size = 255
autoregressive_model_load: loading model from '../models/ggml-model.bin'
autoregressive_model_load: ggml tensor size    = 432 bytes
autoregressive_model_load: backend buffer size = 1889.29 MB
autoregressive_model_load: using CPU backend
autoregressive_model_load: model size  =  1510.54 MB
 44032 diffusion_model_load: loading model from '../models/ggml-diffusion-model.bin'
diffusion_model_load: ggml tensor size    = 432 bytes
diffusion_model_load: backend buffer size = 689.28 MB
diffusion_model_load: using CPU backend
diffusion_model_load: model size  =   688.49 MB
zsh: abort      ./tortoise --message "based... dr freeman?" --voice "../models/mouse.bin"  0 
balisujohn commented 1 week ago

Is the same problem present on both the ARM and Intel macOS builds? Also, definitely try a fresh recursive clone of the repo if you haven't, just in case your ggml submodule is out of date. The more info you can give me about exactly where the error message is coming from, the more likely I can help, though I don't officially support macOS at this time, so no promises.

(I just realized two different people are reporting what appears to be the same error, so it is very likely a bug in either tortoise.cpp or the ggml fork it uses. If someone can solve this, it would be an extremely welcome contribution.)

baileyheading commented 1 week ago

I only tried it on ARM; I can try it on Intel (through Rosetta) later. I'm quite keen to get this working well on Mac in the near future. I'm not familiar with how GGML works yet, but I think there is a lot of support out there for getting this working on Mac.

balisujohn commented 6 days ago

So something strange is happening, because in both cases it looks like compilation succeeded, the autoregressive load succeeded, and inference succeeded, implying there is some difference between the models causing the difference in behavior. Redownloading the diffusion model is a good thing to try, though I think it's unlikely to fix anything, since more than one person has hit this error on macOS.

I can set up a macOS dev env to try to debug myself, but I can't guarantee a timeframe on that. If you figure out the bug before I can, you will gain glory as the first outside tortoise.cpp contributor.

baileyheading commented 6 days ago

I only set up the repo last night, and I used the same GGML as your repo too. I'm not really sure where to start with debugging it. Are there more detailed outputs I could get from this that might help?

balisujohn commented 6 days ago

Assuming you are building with Xcode: is there a way to run in debug mode? On Linux I would use gdb, but I'm not sure if that's compatible with current Xcode versions.

baileyheading commented 6 days ago

I run it in the macOS (ARM) terminal, but I think it's always using Xcode on macOS for this, unless I've misunderstood how it works. Here's the error I got in debug mode using lldb.

gpt_vocab_init: vocab size = 255
autoregressive_model_load: loading model from '../models/ggml-model.bin'
autoregressive_model_load: ggml tensor size    = 432 bytes
autoregressive_model_load: backend buffer size = 1889.29 MB
autoregressive_model_load: using CPU backend
autoregressive_model_load: model size  =  1510.54 MB
 41984 diffusion_model_load: loading model from '../models/ggml-diffusion-model.bin'
diffusion_model_load: ggml tensor size    = 432 bytes
diffusion_model_load: backend buffer size = 689.28 MB
diffusion_model_load: using CPU backend
diffusion_model_load: model size  =   688.49 MB
Process 89278 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x000000018e276a60 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`:
->  0x18e276a60 <+8>:  b.lo   0x18e276a80               ; <+40>
    0x18e276a64 <+12>: pacibsp 
    0x18e276a68 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x18e276a6c <+20>: mov    x29, sp
Target 0: (tortoise) stopped.

And a backtrace

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x000000018e276a60 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x000000018e2aec20 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x000000018e1bbac4 libsystem_c.dylib`__abort + 136
    frame #3: 0x000000018e1ad25c libsystem_c.dylib`__stack_chk_fail + 96
    frame #4: 0x000000010000a204 tortoise`diffusion_model_load(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, diffusion_model&) + 20616
    frame #5: 0x000000010001c700 tortoise`diffusion(std::__1::vector<float, std::__1::allocator<float>>) + 356
    frame #6: 0x0000000100022e30 tortoise`main + 2552
    frame #7: 0x000000018df260e0 dyld`start + 2360

Edit: I'm working through the debugging process currently, learning as I go, since I mostly use Python.

dur-randir commented 6 days ago

It compiles readily without modifications. The macOS allocator is a bit different, so it catches some errors that go unnoticed elsewhere. OpenBSD has one even more paranoid. Here's a patch I made two weeks ago but forgot to share:

diff --git a/main.cpp b/main.cpp
index 40f934b..33f3741 100644
--- a/main.cpp
+++ b/main.cpp
@@ -822,7 +822,9 @@ bool autoregressive_model_load(const std::string &fname,
       }

       int32_t nelements = 1;
-      int32_t ne[2] = {1, 1};
+      int32_t ne[n_dims];
+      memset(ne, 1, n_dims * sizeof(int32_t));
+
       for (int i = 0; i < n_dims; ++i) {
         fin.read(reinterpret_cast<char *>(&ne[i]), sizeof(ne[i]));
         nelements *= ne[i];
@@ -1946,7 +1948,9 @@ bool vocoder_model_load(const std::string &fname, vocoder_model &model) {
       }

       int32_t nelements = 1;
-      int32_t ne[2] = {1, 1};
+      int32_t ne[n_dims];
+      memset(ne, 1, n_dims * sizeof(int32_t));
+
       for (int i = 0; i < n_dims; ++i) {
         fin.read(reinterpret_cast<char *>(&ne[i]), sizeof(ne[i]));
         nelements *= ne[i];
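One caveat with the patch above: `memset` fills bytes, not array elements, so `memset(ne, 1, n_dims * sizeof(int32_t))` sets each `int32_t` to `0x01010101` (16843009) rather than 1; only the dimensions actually read from the file end up with sane values. (Also, `int32_t ne[n_dims]` is a variable-length array, a compiler extension rather than standard C++.) A minimal sketch of the byte-fill behavior:

```cpp
#include <cstdint>
#include <cstring>

// memset fills BYTES, not elements: writing byte 0x01 into all four
// bytes of an int32_t produces 0x01010101 (16843009), not the value 1.
inline int32_t int32_filled_with_byte_one() {
    int32_t v = 0;
    std::memset(&v, 1, sizeof(v));
    return v;
}
```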
dur-randir commented 6 days ago

PS: this also means it could be giving wrong results on any architecture, just uncaught.

baileyheading commented 6 days ago

I've tried those changes, but now I get this error instead:

gpt_vocab_init: loading vocab from '../models/tokenizer.json'
gpt_vocab_init: vocab size = 255
autoregressive_model_load: loading model from '../models/ggml-model.bin'
autoregressive_model_load: ggml tensor size    = 432 bytes
autoregressive_model_load: backend buffer size = 1889.29 MB
autoregressive_model_load: using CPU backend
autoregressive_model_load: tensor 'inference_model.transformer.h.0.ln_1.weight' has wrong shape in model file: got [1024, 1], expected [1024, 0]
autoregressive: failed to load model from '../models/ggml-model.bin'
dur-randir commented 6 days ago

Oh, so that's why I get #13. This used to work some time ago on a previous checkout.

dur-randir commented 6 days ago

At least this should give a direction on where to look.

baileyheading commented 6 days ago

At least this should give a direction on where to look.

Do you still think it relates to that part of the code?

baileyheading commented 6 days ago

While we're at it, how do we update this? It's such a pain during the build, haha:

warning: 'ggml_type_sizef' is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations] ggml_type_sizef(GGML_TYPE_F32);

baileyheading commented 6 days ago

While we're at it, how do we update this? It's such a pain during the build, haha:

warning: 'ggml_type_sizef' is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations] ggml_type_sizef(GGML_TYPE_F32);

It seems we can replace ggml_type_sizef(GGML_TYPE_F32) with ggml_row_size(GGML_TYPE_F32, 1), but since I can't actually run the whole thing yet, it's not ideal for testing. The warning is gone, though.

balisujohn commented 6 days ago

I can fix those warnings, I've been meaning to anyway. Try redownloading the diffusion model if you haven't.

baileyheading commented 6 days ago

I can fix those warnings, I've been meaning to anyway. Try redownloading the diffusion model if you haven't.

I got it from https://huggingface.co/balisujohn/tortoise-ggml/tree/main just last night

balisujohn commented 6 days ago

Could be a bit error in the download. I've actually seen it happen where I redownloaded and it suddenly worked when it previously had a shape error like that.

balisujohn commented 6 days ago
expected [1024, 0]

This is very suspicious and should never happen. Is the memset too short, perhaps? Maybe n_dims is only set to 1 instead of 2?

baileyheading commented 6 days ago
expected [1024, 0]

This is very suspicious and should never happen. Is the memset too short, perhaps? Maybe n_dims is only set to 1 instead of 2?

I did some rough debugging here and simply printed out n_dims just before that memset line:

....
Number of dimensions: 1
Number of dimensions: 1
Number of dimensions: 1
Number of dimensions: 2
Number of dimensions: 1
Number of dimensions: 2
Number of dimensions: 1
Number of dimensions: 1
Number of dimensions: 1
Number of dimensions: 2
Number of dimensions: 1
...
balisujohn commented 6 days ago

Wait, lol, yeah. Looking at the patch @dur-randir applied, it does look kind of inexplicable that it was successfully loading tensors with more than 2 dimensions, yet it somehow works fine on Linux.

try just replacing

      int32_t ne[2] = {1, 1};

with

      int32_t ne[4] = {1, 1, 1, 1};
baileyheading commented 6 days ago

Wait, lol, yeah. Looking at the patch @dur-randir applied, it does look kind of inexplicable that it was successfully loading tensors with more than 2 dimensions, yet it somehow works fine on Linux.

try just replacing

      int32_t ne[2] = {1, 1};

with

      int32_t ne[4] = {1, 1, 1, 1};

Same issue as earlier, still.

Edit: there is a third occurrence I didn't change, separate from the ones @dur-randir patched earlier. I'll make sure to change them all.

baileyheading commented 6 days ago

I'm past the error! It's still running, but I'm posting before anyone responds to the previous message.

balisujohn commented 6 days ago

Yep, the last one was in the diffusion model loader, which looks like the one relevant to this issue. Thanks for helping me debug this; it looks like you found a memory bug the compiler hadn't fussed about sufficiently on Linux. I expect this is also what was breaking my Windows build attempts.

baileyheading commented 6 days ago

I made the WAV audio successfully. Not super speedy, as I'm currently running it on CPU. I'm keen to run this on Metal ASAP.

balisujohn commented 6 days ago

Metal backend support will take effort; some ggml ops used by tortoise.cpp don't have Metal support yet.

balisujohn commented 6 days ago

though it's a great opportunity to make a name for yourself as a GGML contributor

baileyheading commented 6 days ago

Metal backend support will take effort; some ggml ops used by tortoise.cpp don't have Metal support yet.

I know almost nothing about ggml; I can just compile and (sometimes) debug C++. Can Metal support be partial, so we still get some improvement, or do all the ggml ops have to support it before it can be used at all?

balisujohn commented 6 days ago

You should be able to compile for Metal without Metal support for all ops. When you run it, it will fail at the first unsupported op and tell you which one it is; you can then add that op to the ggml Metal backend, retry, and repeat until done. You can also enable the tests that are commented out in main to make sure the computation is correct for each module. On that topic, do those tests pass on macOS on CPU?

balisujohn commented 6 days ago

You can look at this PR for example to see what adding a metal OP looks like https://github.com/ggerganov/ggml/pull/806/files#diff-34ddade6285ebb1664f65a43f79947e14eeb718c0cb7f80e957bc97b6e9d6e49

The metal relevant additions are in ggml-metal.m and ggml-metal.metal

balisujohn commented 6 days ago

And I am not that sure about using multiple backends at runtime.

baileyheading commented 6 days ago

What exactly do you mean by "compile for metal"? Do I need to add something to CMakeLists?

Edit: I see there is a parameter in main.cpp

balisujohn commented 6 days ago

I think the flag is GGML_USE_METAL instead of GGML_USE_CUBLAS

baileyheading commented 6 days ago

I think the flag is GGML_USE_METAL instead of GGML_USE_CUBLAS

cmake .. -DGGML_USE_METAL=ON

I think.

It's still not working, though, so I'm just going to comment out stuff in the imports for now.

baileyheading commented 6 days ago

I see, @balisujohn, it's because of this... I need to add one for Metal, I guess:

if (GGML_CUBLAS)
    add_definitions(-DGGML_USE_CUBLAS)
endif()
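Mirroring that pattern for Metal would look something like the fragment below (a sketch; `GGML_METAL` is the option name ggml's own CMakeLists uses, and the exact wiring into tortoise.cpp's CMakeLists is an assumption):

```cmake
if (GGML_METAL)
    add_definitions(-DGGML_USE_METAL)
endif()
```

which would then be enabled at configure time with `cmake .. -DGGML_METAL=ON`.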
baileyheading commented 6 days ago

OK, at least I have some sort of place to start:

ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Max
ggml_metal_init: picking default device: Apple M3 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: error: could not use bundle path to find ggml-metal.metal, falling back to trying cwd
ggml_metal_init: loading 'ggml-metal.metal'
ggml_metal_init: error: Error Domain=NSCocoaErrorDomain Code=260 "The file “ggml-metal.metal” couldn’t be opened because there is no such file." UserInfo={NSFilePath=ggml-metal.metal, NSUnderlyingError=0x600001fdc360 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}}
autoregressive_model_load: ggml_backend_metal_init() failed
autoregressive_model_load: using CPU backend

This work should perhaps continue under a new issue

EDIT: I'm suspicious of this

CMake Warning (dev) at ggml/src/CMakeLists.txt:526 (install):
  Target ggml has RESOURCE files but no RESOURCE DESTINATION.
This warning is for project developers.  Use -Wno-dev to suppress it.
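Going by the log above, ggml looks for the shader source via `GGML_METAL_PATH_RESOURCES` before falling back to the current directory, so a likely workaround (the `../ggml/src` path is an assumption about where the submodule keeps ggml-metal.metal) is:

```shell
# Point ggml at the directory containing ggml-metal.metal...
export GGML_METAL_PATH_RESOURCES=../ggml/src
# ...or copy the shader source into the directory you run the binary from:
cp ../ggml/src/ggml-metal.metal .
```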
balisujohn commented 6 days ago

Yep, sounds good regarding a new issue, and feel free to submit a pull request with your version working on macOS CPU; I'll confirm it works on Linux before merging.