Open jamislike opened 2 weeks ago
I'm impressed you were able to compile on Mac, since this has never been tested with macOS. Is my understanding correct that the autoregressive model ran correctly, then it crashed when loading the diffusion model?
My initial shot in the dark: try adding this to CMakeLists.txt:
add_definitions(-DGGML_MAX_NAME=128)
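For anyone trying this, the placement matters. A minimal sketch, assuming the ggml submodule is pulled in via `add_subdirectory` (the surrounding lines are an assumption, not the repo's actual CMakeLists.txt); `GGML_MAX_NAME` caps tensor name length, and the default (64) can truncate long PyTorch-style names:

```cmake
# Hypothetical placement: the define must be visible when ggml itself is
# compiled, so it has to precede the ggml subdirectory.
add_definitions(-DGGML_MAX_NAME=128)

add_subdirectory(ggml)
```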
@balisujohn I've also built it on macOS ARM.
I also tried adding the line you suggested to my CMakeLists.txt, although I'm not sure I added it in the right place.
./tortoise --message "based... dr freeman?" --voice "../models/mouse.bin" --seed 0 --output "based?.wav"
gpt_vocab_init: loading vocab from '../models/tokenizer.json'
gpt_vocab_init: vocab size = 255
autoregressive_model_load: loading model from '../models/ggml-model.bin'
autoregressive_model_load: ggml tensor size = 432 bytes
autoregressive_model_load: backend buffer size = 1889.29 MB
autoregressive_model_load: using CPU backend
autoregressive_model_load: model size = 1510.54 MB
44032 diffusion_model_load: loading model from '../models/ggml-diffusion-model.bin'
diffusion_model_load: ggml tensor size = 432 bytes
diffusion_model_load: backend buffer size = 689.28 MB
diffusion_model_load: using CPU backend
diffusion_model_load: model size = 688.49 MB
zsh: abort ./tortoise --message "based... dr freeman?" --voice "../models/mouse.bin" 0
Is the same problem present on both the ARM and Intel macOS builds? Also, definitely try a fresh recursive clone of the repo if you haven't, just in case your ggml submodule is out of date. The more info you can give me about exactly where the error message is coming from, the more likely I can help, though I don't officially support macOS at this time, so no promises.
(I just realized two different people are reporting what appears to be the same error, so it is very likely a bug in either tortoise.cpp or the ggml fork it uses. If someone can solve this, it would be an extremely welcome contribution.)
I only tried it on ARM; I can try it on Intel (through Rosetta) later. I'm quite keen to get this working well on Mac in the near future. I'm not familiar with how GGML works yet, but I think there is a lot of support out there for getting this working on Mac.
So something strange is happening, because in both cases it looks like compilation succeeded, and the autoregressive load and inference succeeded, implying there is some difference between the models causing the difference in behavior. Redownloading the diffusion model is a good thing to try, though I think it's unlikely to fix anything, since more than one person has hit this error on macOS.
I can set up a macOS dev env to try to debug myself, but I can't guarantee a timeframe on that. If you figure out the bug before I can, you will gain glory as the first outside tortoise.cpp contributor.
I only set up the repo last night, and I used the same GGML as your repo. I'm not really sure where to start with debugging it. Are there more detailed outputs I could get that might help?
Assuming you are building with Xcode: is there a way to run in debug mode? On Linux I would use gdb, but I'm not sure if that's compatible with current Xcode versions.
I run it in the macOS (ARM) terminal, but I think it's always using Xcode on macOS for this, unless I've misunderstood how it works. Here's the error I got in debug mode using lldb:
gpt_vocab_init: vocab size = 255
autoregressive_model_load: loading model from '../models/ggml-model.bin'
autoregressive_model_load: ggml tensor size = 432 bytes
autoregressive_model_load: backend buffer size = 1889.29 MB
autoregressive_model_load: using CPU backend
autoregressive_model_load: model size = 1510.54 MB
41984 diffusion_model_load: loading model from '../models/ggml-diffusion-model.bin'
diffusion_model_load: ggml tensor size = 432 bytes
diffusion_model_load: backend buffer size = 689.28 MB
diffusion_model_load: using CPU backend
diffusion_model_load: model size = 688.49 MB
Process 89278 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x000000018e276a60 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`:
-> 0x18e276a60 <+8>: b.lo 0x18e276a80 ; <+40>
0x18e276a64 <+12>: pacibsp
0x18e276a68 <+16>: stp x29, x30, [sp, #-0x10]!
0x18e276a6c <+20>: mov x29, sp
Target 0: (tortoise) stopped.
And a backtrace
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x000000018e276a60 libsystem_kernel.dylib`__pthread_kill + 8
frame #1: 0x000000018e2aec20 libsystem_pthread.dylib`pthread_kill + 288
frame #2: 0x000000018e1bbac4 libsystem_c.dylib`__abort + 136
frame #3: 0x000000018e1ad25c libsystem_c.dylib`__stack_chk_fail + 96
frame #4: 0x000000010000a204 tortoise`diffusion_model_load(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, diffusion_model&) + 20616
frame #5: 0x000000010001c700 tortoise`diffusion(std::__1::vector<float, std::__1::allocator<float>>) + 356
frame #6: 0x0000000100022e30 tortoise`main + 2552
frame #7: 0x000000018df260e0 dyld`start + 2360
Edit: I'm working through the debugging process currently, learning as I go, as I mostly use Python.
It compiles readily without modifications. The macOS allocator is a bit different, so it catches some errors that go uncaught elsewhere; OpenBSD has one that's even more paranoid. Here's a patch I made two weeks ago but forgot to share:
diff --git a/main.cpp b/main.cpp
index 40f934b..33f3741 100644
--- a/main.cpp
+++ b/main.cpp
@@ -822,7 +822,9 @@ bool autoregressive_model_load(const std::string &fname,
}
int32_t nelements = 1;
- int32_t ne[2] = {1, 1};
+ int32_t ne[n_dims];
+ memset(ne, 1, n_dims * sizeof(int32_t));
+
for (int i = 0; i < n_dims; ++i) {
fin.read(reinterpret_cast<char *>(&ne[i]), sizeof(ne[i]));
nelements *= ne[i];
@@ -1946,7 +1948,9 @@ bool vocoder_model_load(const std::string &fname, vocoder_model &model) {
}
int32_t nelements = 1;
- int32_t ne[2] = {1, 1};
+ int32_t ne[n_dims];
+ memset(ne, 1, n_dims * sizeof(int32_t));
+
for (int i = 0; i < n_dims; ++i) {
fin.read(reinterpret_cast<char *>(&ne[i]), sizeof(ne[i]));
nelements *= ne[i];
PS: this also means it should give wrong results on any architecture, just uncaught.
I've tried those changes but now I get this error instead.
gpt_vocab_init: loading vocab from '../models/tokenizer.json'
gpt_vocab_init: vocab size = 255
autoregressive_model_load: loading model from '../models/ggml-model.bin'
autoregressive_model_load: ggml tensor size = 432 bytes
autoregressive_model_load: backend buffer size = 1889.29 MB
autoregressive_model_load: using CPU backend
autoregressive_model_load: tensor 'inference_model.transformer.h.0.ln_1.weight' has wrong shape in model file: got [1024, 1], expected [1024, 0]
autoregressive: failed to load model from '../models/ggml-model.bin'
Oh, so that's why I get #13. This used to work some time ago on a previous checkout.
At least this should give a direction of where to look.
> At least this should give a direction of where to look.

Do you still think it relates to that part of the code?
While we're at it, how do we update this? It's such a pain during the build haha
warning: 'ggml_type_sizef' is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations] ggml_type_sizef(GGML_TYPE_F32);
> While we're at it, how do we update this? It's such a pain during the build haha
> warning: 'ggml_type_sizef' is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations] ggml_type_sizef(GGML_TYPE_F32);
It seems we can replace ggml_type_sizef(GGML_TYPE_F32)
with ggml_row_size(GGML_TYPE_F32, 1),
but considering I can't actually fully run it, it's not ideal to test. The warning is gone, though.
I can fix those warnings, I've been meaning to anyway. Try redownloading the diffusion model if you haven't.
> I can fix those warnings, I've been meaning to anyway. Try redownloading the diffusion model if you haven't.
I got it from https://huggingface.co/balisujohn/tortoise-ggml/tree/main just last night
Could be a bit error in the download. I've actually seen it happen where I redownloaded and it suddenly worked, when it previously had a shape error like that.
> expected [1024, 0]

This is very suspicious and should never happen. Is the memset too short, perhaps? Or maybe n_dims is only set to 1 instead of 2?
> expected [1024, 0]
> This is very suspicious and should never happen. Is the memset too short, perhaps? Or maybe n_dims is only set to 1 instead of 2?
I did some rough debugging and simply printed out n_dims before that memset line:
....
Number of dimensions: 1
Number of dimensions: 1
Number of dimensions: 1
Number of dimensions: 2
Number of dimensions: 1
Number of dimensions: 2
Number of dimensions: 1
Number of dimensions: 1
Number of dimensions: 1
Number of dimensions: 2
Number of dimensions: 1
...
Wait, lol, yeah. Looking at the patch @dur-randir applied, it does look kind of inexplicable that it's successfully loading tensors with more than 2 dimensions but somehow works fine on Linux.
try just replacing
int32_t ne[2] = {1, 1};
with
int32_t ne[4] = {1, 1, 1, 1};
> Wait, lol, yeah. Looking at the patch @dur-randir applied, it does look kind of inexplicable that it's successfully loading tensors with more than 2 dimensions but somehow works fine on Linux.
> try just replacing
> int32_t ne[2] = {1, 1};
> with
> int32_t ne[4] = {1, 1, 1, 1};
Same issue as earlier, still.
Edit: there is a third one of these I didn't change, separate from the two @dur-randir patched earlier. I'll make sure to change them all.
I'm past the error! Still running, but messaging before anyone responds to the previous issue.
Yep, the last one was the diffusion model loader, which is the one relevant to this issue, it looks like. Thanks for helping me debug this; looks like you guys found a memory bug the compiler hadn't fussed sufficiently about on Linux. I expect this is what was messing up my Windows build attempts also.
I made the wav audio successfully. Not super speedy, as I'm currently running it on CPU. I'm keen to run this on Metal ASAP.
Metal backend support will take effort; some ggml ops used by tortoise.cpp don't have Metal support yet.
Though it's a great opportunity to make a name for yourself as a GGML contributor.
> Metal backend support will take effort; some ggml ops used by tortoise.cpp don't have Metal support yet.
I know almost nothing about ggml; I can just compile and (sometimes) debug C++. Can support for Metal be partial, to still get improvements, or do all the ggml ops have to support it to use it at all?
You should be able to compile for Metal without Metal support for all ops. When you run it, it will fail and tell you which op is unsupported as soon as it hits one; you can then add that op to the ggml Metal backend, retry, and repeat until done. You can also activate the tests that are commented out in main to make sure the computation is correct for each module. On that topic, do those tests succeed on macOS on CPU?
You can look at this PR, for example, to see what adding a Metal op looks like: https://github.com/ggerganov/ggml/pull/806/files#diff-34ddade6285ebb1664f65a43f79947e14eeb718c0cb7f80e957bc97b6e9d6e49
The Metal-relevant additions are in ggml-metal.m and ggml-metal.metal.
And I am not that sure about using multiple backends at runtime.
What exactly do you mean by "compile for Metal"? Do I need to add something to CMakeLists?
Edit: I see there is a parameter in main.cpp.
I think the flag is GGML_USE_METAL
instead of GGML_USE_CUBLAS
> I think the flag is
> GGML_USE_METAL
> instead of GGML_USE_CUBLAS
cmake .. -DGGML_USE_METAL=ON
I think.
It's still not working though, so I'm just going to comment out stuff in the imports for now.
I see @balisujohn, it's because of this. I need to add one for Metal, I guess:
if (GGML_CUBLAS)
add_definitions(-DGGML_USE_CUBLAS)
endif()
Ok, at least I have some sort of place to start
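If it helps, a Metal analogue of that block might look like the following sketch. The option name `GGML_METAL` is an assumption based on ggml's CMake conventions, so check what the ggml submodule actually defines before relying on it:

```cmake
# Hypothetical addition, mirroring the CUBLAS block above:
if (GGML_METAL)
    add_definitions(-DGGML_USE_METAL)
endif()
```

Configured with something like `cmake .. -DGGML_METAL=ON`.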
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Max
ggml_metal_init: picking default device: Apple M3 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: error: could not use bundle path to find ggml-metal.metal, falling back to trying cwd
ggml_metal_init: loading 'ggml-metal.metal'
ggml_metal_init: error: Error Domain=NSCocoaErrorDomain Code=260 "The file “ggml-metal.metal” couldn’t be opened because there is no such file." UserInfo={NSFilePath=ggml-metal.metal, NSUnderlyingError=0x600001fdc360 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}}
autoregressive_model_load: ggml_backend_metal_init() failed
autoregressive_model_load: using CPU backend
This work should perhaps continue under a new issue
EDIT: I'm suspicious of this
CMake Warning (dev) at ggml/src/CMakeLists.txt:526 (install):
Target ggml has RESOURCE files but no RESOURCE DESTINATION.
This warning is for project developers. Use -Wno-dev to suppress it.
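Regarding the "could not use bundle path to find ggml-metal.metal" error: the log above shows ggml consulting GGML_METAL_PATH_RESOURCES before falling back to the cwd, so one plausible workaround (the submodule path below is a guess about this repo's layout, not a verified location) is:

```shell
# Point ggml at the directory containing ggml-metal.metal before running.
# ../ggml/src is an assumed path for the submodule's shader source;
# adjust it to wherever ggml-metal.metal actually lives in your checkout.
export GGML_METAL_PATH_RESOURCES=../ggml/src
```

Copying ggml-metal.metal into the directory you run ./tortoise from should also satisfy the cwd fallback shown in the log.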
Yep sounds good wrt new issue and feel free to submit a pull request with your version working on MacOS cpu, and I'll confirm it works on Linux before merging.
Seems to crash on run:
43008 diffusion_model_load: loading model from '../models/ggml-diffusion-model.bin'
diffusion_model_load: ggml tensor size = 368 bytes
diffusion_model_load: backend buffer size = 689.28 MB
diffusion_model_load: using CPU backend
Process 61390 stopped