Is this a new issue? Did the same model load correctly previously?
Looking at your debug logs, I don't see Metal being initialized. Did you build with `LLAMA_METAL=1` when compiling KoboldCpp?
Yes, this is a new problem.
When I first saw your reply, I thought I might have made a mistake. But now I have confirmed that this really is an issue.

I pulled the concedo_experimental branch up to date and built it with `LLAMA_METAL=1`. It compiled without errors, then I tried to run it. It failed. So I tried to run a small GGUF model on the same disk volume. It failed again.
I also confirmed that tag v1.60.1 works and tag v1.61 fails, so something must have changed between v1.60.1 and v1.61. I also found that the last successful commit is 9229ea66, just before 6a32c14e, your "Merge branch 'master' into concedo_experimental".
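(As an aside, narrowing this down to a single commit can be done with a bisect; a hypothetical session between the two tags mentioned above might look like this:)

```sh
git bisect start
git bisect bad v1.61      # first known-bad tag
git bisect good v1.60.1   # last known-good tag
# at each step: make clean && make LLAMA_METAL=1, try loading a model,
# then mark the commit with `git bisect good` or `git bisect bad`
git bisect reset          # restore the original checkout when done
```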
For some reason, your program is not calling the `ggml_backend_metal_init` function; otherwise you would see `ggml_metal_init: allocating` displayed. I looked through the commits between these two tags and I could not find any reason why this would happen.
Let's try to troubleshoot this sequentially. First, start from the latest commit (9f102b9), where I added the extra headers that were not present in the file before. Do a full `make clean` followed by `make LLAMA_METAL=1` and see if Metal gets initialized correctly, as in the sketch below.
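That is, a minimal rebuild sketch from the koboldcpp source directory:

```sh
# full clean rebuild with the Metal backend enabled
make clean
make LLAMA_METAL=1
# on a successful run you should then see "ggml_metal_init: allocating" at load time
```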
If it's still not working, there are three commits that change the Metal-related files: be858f6, bb6d00b, and 8a3012a. Of these three, I think the most likely one to cause issues is 8a3012a.

Unfortunately, you won't be able to directly revert these commits due to merge conflicts. But perhaps you could examine the changes and see if you can figure out what causes the problems. If you're still stuck, let me know and I'll create a few separate checkpoints you can try - I can't debug this on my side as I don't have a Mac. It's weird, as it seems the init isn't even being called.
If you can stick some print statements in this function (https://github.com/LostRuins/koboldcpp/blob/concedo/ggml-metal.m#L2835) and within `ggml_metal_init` itself, it would be helpful to know whether it's called, and which part of the init fails.
```objc
ggml_backend_t ggml_backend_metal_init(void) {
    struct ggml_metal_context * ctx = ggml_metal_init(GGML_DEFAULT_N_THREADS);

    if (ctx == NULL) {
        return NULL;
    }

    ggml_backend_t metal_backend = malloc(sizeof(struct ggml_backend));

    *metal_backend = (struct ggml_backend) {
        /* .guid      = */ ggml_backend_metal_guid(),
        /* .interface = */ ggml_backend_metal_i,
        /* .context   = */ ctx,
    };

    return metal_backend;
}
```
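For example, a hypothetical instrumented version might look like this (the `fprintf` calls and their wording are just a suggestion, not part of the actual source):

```objc
ggml_backend_t ggml_backend_metal_init(void) {
    // hypothetical debug print: confirms the function is entered at all
    fprintf(stderr, "DEBUG: ggml_backend_metal_init: entered\n");

    struct ggml_metal_context * ctx = ggml_metal_init(GGML_DEFAULT_N_THREADS);

    if (ctx == NULL) {
        // hypothetical debug print: tells us the failure is inside ggml_metal_init
        fprintf(stderr, "DEBUG: ggml_backend_metal_init: ggml_metal_init returned NULL\n");
        return NULL;
    }

    ggml_backend_t metal_backend = malloc(sizeof(struct ggml_backend));

    *metal_backend = (struct ggml_backend) {
        /* .guid      = */ ggml_backend_metal_guid(),
        /* .interface = */ ggml_backend_metal_i,
        /* .context   = */ ctx,
    };

    return metal_backend;
}
```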
It would also be helpful to compare the terminal output against that of the successful v1.60.1 build.
I found that the first line of `ggml_backend_metal_init` failed:

```objc
struct ggml_metal_context * ctx = ggml_metal_init(GGML_DEFAULT_N_THREADS);
```

So I investigated `ggml_metal_init`.
```objc
static struct ggml_metal_context * ggml_metal_init(int n_cb) {
    GGML_METAL_LOG_INFO("%s: allocating\n", __func__);

#if TARGET_OS_OSX && !GGML_METAL_NDEBUG
    // Show all the Metal device instances in the system
    NSArray * devices = MTLCopyAllDevices();
    for (id<MTLDevice> device in devices) {
        GGML_METAL_LOG_INFO("%s: found device: %s\n", __func__, [[device name] UTF8String]);
    }
    [devices release]; // since it was created by a *Copy* C method
#endif

    // Pick and show default Metal device
    id<MTLDevice> device = MTLCreateSystemDefaultDevice();
    GGML_METAL_LOG_INFO("%s: picking default device: %s\n", __func__, [[device name] UTF8String]);

    // Configure context
    struct ggml_metal_context * ctx = malloc(sizeof(struct ggml_metal_context));
    ctx->device = device;
    ctx->n_cb   = MIN(n_cb, GGML_METAL_MAX_BUFFERS);
    ctx->queue  = [ctx->device newCommandQueue];
    ctx->d_queue = dispatch_queue_create("ggml-metal", DISPATCH_QUEUE_CONCURRENT);

    id<MTLLibrary> metal_library;

    // load library
    {
        NSBundle * bundle = nil;
#ifdef SWIFT_PACKAGE
        bundle = SWIFTPM_MODULE_BUNDLE;
#else
        bundle = [NSBundle bundleForClass:[GGMLMetalClass class]];
#endif
        NSError * error = nil;
        NSString * libPath = [bundle pathForResource:@"default" ofType:@"metallib"];
        if (libPath != nil) {
            // pre-compiled library found
            NSURL * libURL = [NSURL fileURLWithPath:libPath];
            GGML_METAL_LOG_INFO("%s: loading '%s'\n", __func__, [libPath UTF8String]);
            metal_library = [ctx->device newLibraryWithURL:libURL error:&error];
            if (error) {
                GGML_METAL_LOG_ERROR("%s: error: %s\n", __func__, [[error description] UTF8String]);
                return NULL;
            }
        } else {
#if GGML_METAL_EMBED_LIBRARY
            GGML_METAL_LOG_INFO("%s: using embedded metal library\n", __func__);

            extern const char ggml_metallib_start[];
            extern const char ggml_metallib_end[];

            NSString * src = [[NSString alloc] initWithBytes:ggml_metallib_start length:(ggml_metallib_end-ggml_metallib_start) encoding:NSUTF8StringEncoding];
#else
            GGML_METAL_LOG_INFO("%s: default.metallib not found, loading from source\n", __func__);

            NSString * sourcePath;
            NSString * ggmlMetalPathResources = [[NSProcessInfo processInfo].environment objectForKey:@"GGML_METAL_PATH_RESOURCES"];

            GGML_METAL_LOG_INFO("%s: GGML_METAL_PATH_RESOURCES = %s\n", __func__, ggmlMetalPathResources ? [ggmlMetalPathResources UTF8String] : "nil");

            if (ggmlMetalPathResources) {
                sourcePath = [ggmlMetalPathResources stringByAppendingPathComponent:@"ggml-metal.metal"];
            } else {
                sourcePath = [bundle pathForResource:@"ggml-metal" ofType:@"metal"];
            }
            if (sourcePath == nil) {
                GGML_METAL_LOG_WARN("%s: error: could not use bundle path to find ggml-metal.metal, falling back to trying cwd\n", __func__);
                sourcePath = @"ggml-metal.metal";
            }

            GGML_METAL_LOG_INFO("%s: loading '%s'\n", __func__, [sourcePath UTF8String]);

            NSString * src = [NSString stringWithContentsOfFile:sourcePath encoding:NSUTF8StringEncoding error:&error];
            if (error) {
                GGML_METAL_LOG_ERROR("%s: error: %s\n", __func__, [[error description] UTF8String]);
                return NULL;
            }
#endif

            @autoreleasepool {
                // dictionary of preprocessor macros
                NSMutableDictionary * prep = [NSMutableDictionary dictionary];

#ifdef GGML_QKK_64
                prep[@"GGML_QKK_64"] = @(1);
#endif

                MTLCompileOptions* options = [MTLCompileOptions new];
                options.preprocessorMacros = prep;

                //[options setFastMathEnabled:false];

                metal_library = [ctx->device newLibraryWithSource:src options:options error:&error];
                if (error) {
                    GGML_METAL_LOG_ERROR("%s: error: %s\n", __func__, [[error description] UTF8String]);
                    return NULL;
                }
            }
        }
    }
```
The error occurs at L347, `metal_library = [ctx->device newLibraryWithSource:src options:options error:&error];`, and the problem is that I have no knowledge of Metal shaders...
And even more confusingly, `ggml-metal.m` from koboldcpp and from the most recent llama.cpp commit d8fd0ccf6ac8b07791ffd1575eed436930854ae3 are exactly the same. :cry:
I changed the `Makefile`, removing `-DGGML_METAL_NDEBUG`, and saw the following error messages:
```
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/******/test/koboldcpp_dev/ggml-metal.metal'
ggml_metal_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:3:10: fatal error: 'ggml-common.h' file not found
#include "ggml-common.h"
         ^~~~~~~~~~~~~~~
" UserInfo={NSLocalizedDescription=program_source:3:10: fatal error: 'ggml-common.h' file not found
#include "ggml-common.h"
         ^~~~~~~~~~~~~~~
}
llama_new_context_with_model: failed to initialize Metal backend
gpttype_load_model: error: failed to load model '/Users/******/test/llama.cpp/models/llama-7b-v2/ggml-model-q8_0.gguf'
Load Text Model OK: False
Could not load text model: /Users/******/test/llama.cpp/models/llama-7b-v2/ggml-model-q8_0.gguf
(kdev_env) ******@Mac-Studio-2022-01 koboldcpp_dev %
```
`ggml-common.h` definitely exists. It feels very strange.
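(A likely explanation: the failing `else` branch compiles the shader from an in-memory string via `newLibraryWithSource:` - hence the `program_source` prefix in the error - so the Metal compiler has no file path to resolve the relative `#include` against, regardless of whether `ggml-common.h` exists on disk.)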
Related: ggerganov#5977
As a workaround, this works:

```sh
xcrun -sdk macosx metal -O3 -c ggml-metal.metal -o ggml-metal.air
xcrun -sdk macosx metallib ggml-metal.air -o default.metallib
```
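(Precompiling presumably works because `ggml_metal_init` then takes the `libPath` branch shown earlier, loading `default.metallib` directly and never reaching the source-compilation path where the include fails; the compiled `default.metallib` just needs to end up somewhere `pathForResource:` can find it, e.g. next to the binary.)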
Yeah, but it's not ideal. I might go with

```sh
sed -e '/#include "ggml-common.h"/r ggml-common.h' -e '/#include "ggml-common.h"/d' < ggml-metal.metal > ggml-metal-embed.metal
```

which basically sticks the contents of `ggml-common.h` directly into the Metal shader. I don't really want to precompile the metal lib.

Might need your help to test it again after I tweak it. It's annoying because I won't be able to test anything myself, as I don't have a Mac.
After doing the tweak using `sed`, I renamed `ggml-metal-embed.metal` to `ggml-metal.metal` and ran koboldcpp. The result:
```
Load Text Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold API on port 5001 at http://localhost:5001/api/
Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
======
Please connect to custom endpoint at http://localhost:5001
```
Tada!
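(Collected in one place, the workaround steps above amount to the following; the backup step is an optional precaution, not something from the thread.)

```sh
# inline ggml-common.h into the Metal shader source
sed -e '/#include "ggml-common.h"/r ggml-common.h' \
    -e '/#include "ggml-common.h"/d' \
    < ggml-metal.metal > ggml-metal-embed.metal

# optional: keep a copy of the original shader
cp ggml-metal.metal ggml-metal.metal.orig

# replace the shader with the embedded version
mv ggml-metal-embed.metal ggml-metal.metal
```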
I also noticed that `ggml-metal.metal` has two `#include "ggml-common.h"` lines:
```c
#define GGML_COMMON_DECL_METAL
#define GGML_COMMON_IMPL_METAL
#include "ggml-common.h"

#include <metal_stdlib>

#define GGML_COMMON_IMPL_METAL
#include "ggml-common.h"
```
So in `ggml-metal-embed.metal`, those two lines were each replaced with the exact same content of `ggml-common.h`. 😕
Ah yeah, that is fixed in https://github.com/ggerganov/llama.cpp/pull/6015, which I will merge in when fixing the makefile tomorrow. Thanks for helping test.
Hi @beebopkim, if you don't mind, can you check whether the latest experimental branch runs fine with `LLAMA_METAL=1` for you?
@LostRuins I wish I could do it right now, but I'm afraid I can only do it in about 9 hours... Sorry to keep you waiting.
No problem, just let me know.
@LostRuins With f3b7651102c3ce3e4f331b93137dc32d752eada0, there is no problem. Now I can run bakllava-mistral-v1 with `--gpulayers 99`! Thanks a lot! 😃
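(For reference, a typical invocation with full GPU offload might look like the line below; the model path is hypothetical.)

```sh
python koboldcpp.py --model /path/to/bakllava-mistral-v1.gguf --gpulayers 99
```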
Thanks for testing!
I also confirmed that ec5dea14d74df20059f43301a5dec17023bc03c8 works too. You're welcome!
Commit hash: edb05e761f55e9960d5bf211387138f4c14d1063
Branch: concedo_experimental

With `--gpulayers 80`: [screenshot]

Without `--gpulayers 80`: [screenshot]

For comparison, `server` with `-ngl 999` from llama.cpp commit hash 306d34be7ad19e768975409fc80791a274ea0230: [screenshot]
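(The llama.cpp comparison would correspond to an invocation along these lines; the model path is hypothetical.)

```sh
./server -m /path/to/model.gguf -ngl 999
```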