Closed — cmp-nct closed this issue 1 year ago
Good idea
Would love to take this.
Lmk if it looks good @cmp-nct https://github.com/ggerganov/llama.cpp/pull/1692
Apologies - I was wrong
The better approach to support this type of functionality is through the new extra member of ggml_tensor:
https://github.com/ggerganov/llama.cpp/blob/98ed16557432d7a5179c57eddcc3a08a7ae6d54d/ggml.h#L390
The reason is to keep the ggml interface general-purpose.
The "layer" concept does not always make sense - i.e. it is specific to particular architectures. Therefore it cannot be part of the interface.
You recall my "meta" recommendation a month or two ago? That's basically what "extra" appears to be. Though as a void pointer it would not be supported by IDE autocompletion, right?
Wouldn't a simple struct be a better solution than a void pointer? It's not as "flexible", but unused elements can just stay undefined/default.
Regarding the layer: I originally recommended using layer -1 or 0 for "no layer", so tensors that are not layer-specific have a way to declare that.
I personally use something like this; it's not 100% up to date with the latest ggml commits (since falcon I shifted my focus), but basically it should work (just extra instead of meta):
typedef struct {
    uint8_t layer;                 // 0 = not set, 1+ = layer num
    char    short_name[32];        // parameter weight name without layer
    int8_t  flag_high_performance; // if this tensor is high performance (relevant for hybrid core systems like Intel 12/13 gen)
    int8_t  flag_use_blas_cuda;    // dst or src0 tensor, -1 = default, 0 = off, 1 = on (if contiguous)
    int8_t  caching_priority;      // -1 = no caching, 0 = default, 1+ = priority | caching is by layer number and layer name (0 = global layer)
} tensor_meta;

// default tensor meta:
static const tensor_meta GGML_DEFAULT_TENSOR_META = {
    /*.layer                 =*/ 0,
    /*.short_name            =*/ "",
    /*.flag_high_performance =*/ 1,
    /*.flag_use_blas_cuda    =*/ -1, // -1 = use default
    /*.caching_priority      =*/ -1, // no caching
};
1) ggml tensors need a layer number parameter. I'd use layer 0 for global and 1+ for the layers (could also be -1 and 0+ of course).
2) When a ggml tensor is created, the latest configured layer is used; the default is 0:
ggml_set_current_layer(il+1);
This way it's only a single line of code in the eval loop to set the layer of each node.
3) The model currently contains the name intermixed with the layer, like "layers.58.attention.wo.weight". This should be changed to "attention.wo.weight" with the layer number set to 59.
Benefits:
1) Debug output will be cleaner and more informative; the layer of each calculation becomes part of the graph print (without hardcoding it into the generic name).
2) Optimizations can be applied by layer name or weight name in a clean way.