
How to use ggml_conv_1d? #883

Closed · chavinlo closed this 4 months ago

chavinlo commented 4 months ago

Hello, I am trying to implement a model that makes use of nn.Conv1d in PyTorch. I don't have much experience with C++, but I've read the MNIST examples and part of stable-diffusion.cpp. However, I can't seem to find many examples of ggml_conv_1d.

I have tried this (the following is just simplified):

// ...
ggml_tensor * input  = ggml_get_tensor(ggml_loader_ctx, "input");  // Has an "ne" (that's the shape, right?) of {1, 1, 297328, 1}
ggml_tensor * weight = ggml_get_tensor(ggml_loader_ctx, "weight"); // Has an "ne" of {10, 1, 512, 1} - {kernel_size, input_dim, output_dim, ???}

int stride   = 5;
int padding  = 0;
int dilation = 1;

ggml_tensor * output = ggml_conv_1d(ctx, weight, input, stride, padding, dilation);

And this would be the pytorch equivalent:

import torch
import torch.nn as nn

conv0 = nn.Conv1d(1, 512, 10, 5, bias=False) # This is just the declaration but the weights are imported from an already trained model
input = torch.load("inp.pt") # Shape: 1, 1, 297328
out = conv0(input) # Shape: 1, 512, 59464

I got the weight shape order from this code: https://github.com/PABannier/encodec.cpp/blob/main/encodec.cpp#L755. However, the output is just a bunch of NaNs (I think, because it's all "Ï" in the Visual Studio memory debugger). I tried digging deeper, and I think the issue is in ggml_im2col, where const int64_t OW = ggml_calc_conv_output_size(b->ne[0], a->ne[0], s0, p0, d0); calculates a value of 0 (which is wrong?). While debugging, this is what I believe it was evaluating: (1 + 2×1 − 1×(10 − 1) − 1) / 5 + 1, which would be −0.4 in real arithmetic and comes out to 0 with C's integer division.
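For concreteness, a standalone sketch of that arithmetic (mirroring the formula in ggml_calc_conv_output_size, with the concrete numbers from the shapes above):

#include <cstdint>
#include <cstdio>

// same formula as ggml_calc_conv_output_size: (ins + 2*p - d*(ks - 1) - 1) / s + 1
static int64_t conv_out_size(int64_t ins, int64_t ks, int s, int p, int d) {
    return (ins + 2 * p - d * (ks - 1) - 1) / s + 1;
}

int main() {
    // with the input length in ne[0], as ggml expects: prints 59464 (matches torch)
    printf("%lld\n", (long long) conv_out_size(297328, 10, 5, 0, 1));
    // with the mis-ordered shape, ne[0] == 1: truncates to 0
    printf("%lld\n", (long long) conv_out_size(1, 10, 5, 0, 1));
    return 0;
}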

I've tried finding a solution, but there's barely any documentation; I have already stared at the source code for hours and don't know what to do.

This is the code I use to save the weights into GGUF format if necessary:

import torch
import gguf

gguf_writer = gguf.GGUFWriter("test.gguf", "test0")

x = torch.load("inp.pt")
z = x.numpy()
z = z.transpose(2, 1, 0)

model = model.requires_grad_(False)  # model is loaded elsewhere
conv0_weight = model.conv0.weight.numpy()

gguf_writer.add_tensor("model0.test0.weight", conv0_weight, conv0_weight.shape)
gguf_writer.add_tensor("input", z, z.shape)
gguf_writer.write_header_to_file()
gguf_writer.write_kv_data_to_file()
gguf_writer.write_tensors_to_file()
gguf_writer.close()
balisujohn commented 4 months ago

At first glance, {1, 1, 297328, 1} should probably be {297328, 1, 1, 1}.

There are some examples of ggml_conv_1d in tortoise.cpp https://github.com/balisujohn/tortoise.cpp/blob/b9bb8771c3e3fa8ccb615d24b59b42edf15ca2fa/main.cpp#L3624
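For reference, a quick sketch of the ordering convention (ggml's ne[] lists dimensions fastest-varying first, i.e. the reverse of a PyTorch shape; variable names follow your first snippet):

// torch kernel (out_ch=512, in_ch=1, k=10)     -> ggml ne = {10, 1, 512, 1}
// torch input  (batch=1,    in_ch=1, L=297328) -> ggml ne = {297328, 1, 1, 1}
GGML_ASSERT(weight->ne[0] == 10     && weight->ne[2] == 512);
GGML_ASSERT(input->ne[0]  == 297328 && input->ne[1]  == 1);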

chavinlo commented 4 months ago

At first glance, {1, 1, 297328, 1} should probably be {297328, 1, 1, 1}.

There are some examples of ggml_conv_1d in tortoise.cpp https://github.com/balisujohn/tortoise.cpp/blob/b9bb8771c3e3fa8ccb615d24b59b42edf15ca2fa/main.cpp#L3624

Thanks, now the im2col tensor has a shape of {10, 59464, 1, 1}, which is closer to PyTorch's output. But it then returns an empty tensor (full of -431602080.0) on the ggml_mul_mat call (L6472). What exactly is the first tensor of mul_mat supposed to be? im2col initializes a tensor with the shape mentioned above but no values inside it. And the a and b tensors are only stored in result->src[0] = a; and result->src[1] = b; respectively, right? (L6607)

tl;dr: tensor a of mul_mat's arguments inside ggml_conv_1d (what's returned from im2col) is empty, hence the output of conv1d fails.
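For reference, this is roughly all ggml_mul_mat does at the point it's called (a condensed sketch from my read of ggml.c; ne0/ne1 stand in for the computed output shape):

struct ggml_tensor * result = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, ne0, ne1);
result->op     = GGML_OP_MUL_MAT;  // only the op and its inputs are recorded here;
result->src[0] = a;                // no arithmetic has happened yet when this
result->src[1] = b;                // function returns
return result;

(The -431602080.0 values look like MSVC's 0xCD uninitialized-memory fill pattern read as a float.)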

balisujohn commented 4 months ago

What's the output shape of ggml_conv_1d with these arguments?

chavinlo commented 4 months ago

What's the output shape of ggml_conv_1d with these arguments?

The output shape of ggml_conv_1d is {59464, 512, 1, 1}. However, the data of the tensor is empty.

balisujohn commented 4 months ago

It would be helpful if you could produce a reproducible example of the error; you should be able to fork ggml and modify https://github.com/ggerganov/ggml/blob/master/examples/simple/simple-backend.cpp so that it runs ggml_conv_1d on your inputs instead of the current operation.

chavinlo commented 4 months ago

It would be helpful if you could produce a reproducible example of the error; you should be able to fork ggml and modify https://github.com/ggerganov/ggml/blob/master/examples/simple/simple-backend.cpp so that it runs ggml_conv_1d on your inputs instead of the current operation.

Sure, here's the tensors.pt file needed for the scripts below (it only contains the weight of a conv1d and a test input): https://huggingface.co/chavinlo/ggmltest/resolve/main/tensors.pt?download=true

PyTorch and save to GGUF:

import torch
import torch.nn as nn
import gguf

tensors = torch.load("tensors.pt")

conv = nn.Conv1d(1, 512, 10, 5, bias=False)

input_tensor = tensors["input"]
conv.weight = tensors["weight"]

x = conv(input_tensor)
print(x)
print(x.shape)
print(input_tensor.shape)
print(conv.weight.shape)

"""
Output should be:
tensor([[[-3.6160e-02, -2.8281e-02,  1.1107e-02,  ...,  3.0684e-02,
          -2.2186e-02, -6.3259e-03],
         [ 7.9663e-02,  3.0687e-02, -5.0905e-02,  ..., -1.9161e-02,
           3.1029e-02,  1.5162e-02],
         [ 2.3205e-01,  1.8175e-01, -1.0812e-01,  ..., -2.3548e-02,
           3.9255e-02,  1.1151e-01],
         ...,
         [ 8.6802e-04,  8.3316e-04,  3.2947e-04,  ..., -2.9786e-03,
           6.5938e-03,  1.1510e-02],
         [ 1.6648e-02,  2.3425e-02, -7.5188e-03,  ...,  8.7883e-03,
           4.2063e-03,  1.8971e-02],
         [-2.4058e-01,  3.3975e-01,  2.8910e-01,  ..., -1.3100e-01,
          -1.3514e-01,  1.4614e-01]]])
torch.Size([1, 512, 59464])
torch.Size([1, 1, 297328])
torch.Size([512, 1, 10])
"""

gguf_writer = gguf.GGUFWriter("tensors.gguf", "test0")
gguf_writer.add_tensor("weight", conv.weight.numpy(), conv.weight.numpy().shape)
gguf_writer.add_tensor("input", input_tensor.numpy(), input_tensor.numpy().shape)
gguf_writer.write_header_to_file()
gguf_writer.write_kv_data_to_file()
gguf_writer.write_tensors_to_file()
gguf_writer.close()

GGML C++:

#include <cstdlib>
#include <iostream>
#include <string>

#include "ggml.h"

int main()
{
    std::string fname = "tensors.gguf";

    // ##### GGML Context #####
    static size_t buf_size = 1024 * 1024 * 128;
    static void* buf = malloc(buf_size);

    struct ggml_init_params ggml_params = {
        /*.mem_size   =*/ buf_size,
        /*.mem_buffer =*/ buf,
        /*.no_alloc   =*/ false,
    };

    struct ggml_context* ggml_ctx = ggml_init(ggml_params);
    // %%%%%%%%%%

    // ##### GGUF Model Loading Context #####
    struct ggml_context* ggml_loader_ctx;

    struct gguf_init_params gguf_params = {
        /*.no_alloc   =*/ false,
        /*.ctx        =*/ & ggml_loader_ctx,
    };

    gguf_context* gguf_ctx = gguf_init_from_file(fname.c_str(), gguf_params);
    // %%%%%%%%%%

    ggml_tensor* weight_tensor = ggml_get_tensor(ggml_loader_ctx, "weight");
    ggml_tensor* input_tensor = ggml_get_tensor(ggml_loader_ctx, "input");

    ggml_tensor* output = ggml_conv_1d(ggml_ctx, weight_tensor, input_tensor, 5, 0, 1);

    std::cout << &output;
}

I had to create two contexts because when calling gguf_init_from_file the buf_size would shrink to 1 MB. The C++ code returns the same results as the script I was using before: the same output shape, {59464, 512, 1, 1}, yet no data.

Also, about forking ggml... do you mean implementing ggml_conv_1d myself, or...?

balisujohn commented 4 months ago

Here's an example of what I'm talking about: https://github.com/balisujohn/ggml-get-rows-error/commit/acc02592ac953a41ea0f7e979e06b87f725f231d, a reproducible example I created for an error with ggml_get_rows. You can also upload the .pt and .gguf files to the repository to make reproducing the error easier.

balisujohn commented 4 months ago

I didn't see that you had included the tensors.pt (I'll see if I can reproduce the error).

balisujohn commented 4 months ago

I am not an expert in ggml runtime variants, but I find it extremely suspicious that you are not calling either ggml_build_forward_expand or any variant of ggml_graph_compute anywhere. It looks like you are just declaring the computational graph but not actually running the computation.
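As a minimal sketch of the missing step (using the variable names from your snippet):

struct ggml_cgraph * gf = ggml_new_graph(ggml_ctx);
ggml_build_forward_expand(gf, output);                      // record the graph up to `output`
ggml_graph_compute_with_ctx(ggml_ctx, gf, /*n_threads=*/1); // actually run the computation
// only after this does output->data hold the convolution result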

chavinlo commented 4 months ago

I am not an expert in ggml runtime variants, but I find it extremely suspicious that you are not calling either ggml_build_forward_expand or any variant of ggml_graph_compute anywhere. It looks like you are just declaring the computational graph but not actually running the computation.

Added ggml_build_forward_expand and ggml_graph_compute_with_ctx, and got an error in the ggml_compute_forward_im2col_f16 function (L14383) at the GGML_ASSERT(src0->type == GGML_TYPE_F16); check (L14390). So I changed the weight type from float32 to float16 before saving to GGUF, reloaded it, and it finally returned a non-empty tensor. Does this mean that the weights always have to be in float16 precision? Even the ggml_compute_forward_im2col_f32 function (L14306) makes the same check that the weights be float16.

Not that big of an error, but the accuracy of the output tensor is slightly off: GGML's first output value is -0.0361655578, while torch gives -0.0361604653. The shape is correct, though: {59464, 512, 1, 1}.
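For what it's worth, that discrepancy is consistent with float16 weights: the relative difference is |−0.0361655578 − (−0.0361604653)| / 0.0361604653 ≈ 1.4e-4, well within float16's ~4.9e-4 rounding precision (10 mantissa bits).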

Here's the updated C++/PyTorch code if necessary:

#include <cstdlib>
#include <iostream>
#include <string>

#include "ggml.h"

int main()
{
    std::string fname = "tensors.gguf";

    // ##### GGML Context #####
    static size_t buf_size = 1024 * 1024 * 128;
    static void* buf = malloc(buf_size);

    struct ggml_init_params ggml_params = {
        /*.mem_size   =*/ buf_size,
        /*.mem_buffer =*/ buf,
        /*.no_alloc   =*/ false,
    };

    struct ggml_context* ggml_ctx = ggml_init(ggml_params);
    // %%%%% End of Inference GGML Context %%%%%

    // ##### GGUF Model Loading Context #####
    struct ggml_context* ggml_loader_ctx;

    struct gguf_init_params gguf_params = {
        /*.no_alloc   =*/ false,
        /*.ctx        =*/ & ggml_loader_ctx,
    };

    gguf_context* gguf_ctx = gguf_init_from_file(fname.c_str(), gguf_params);
    // %%%%% End of GGUF Model Loading Context %%%%%

    struct ggml_cgraph* gf = ggml_new_graph(ggml_ctx);

    ggml_tensor* weight_tensor = ggml_get_tensor(ggml_loader_ctx, "weight");
    ggml_tensor* input_tensor = ggml_get_tensor(ggml_loader_ctx, "input");

    ggml_tensor* output = ggml_conv_1d(ggml_ctx, weight_tensor, input_tensor, 5, 0, 1);

    ggml_build_forward_expand(gf, output);
    ggml_graph_compute_with_ctx(ggml_ctx, gf, 1);

    // print the first output value (the conv result is F32)
    std::cout << ((float *) output->data)[0] << std::endl;
}

PyTorch:

import torch
import torch.nn as nn
import gguf

tensors = torch.load("tensors.pt")
conv = nn.Conv1d(1, 512, 10, 5, bias=False)

input_tensor = tensors["input"]
conv.weight = tensors["weight"]

x = conv(input_tensor)
# to save
conv = conv.to(torch.float16)
print(x)
print(input_tensor)
print(conv.weight)
print("output shape:", x.shape)
print("input shape:", input_tensor.shape)
print("weight shape:", conv.weight.shape)
print("input dtype:", input_tensor.dtype)
print("weight dtype:", conv.weight.dtype)

"""
Output should be:

tensor([[[-3.6160e-02, -2.8281e-02,  1.1107e-02,  ...,  3.0684e-02,
          -2.2186e-02, -6.3259e-03],
         [ 7.9663e-02,  3.0687e-02, -5.0905e-02,  ..., -1.9161e-02,
           3.1029e-02,  1.5162e-02],
         [ 2.3205e-01,  1.8175e-01, -1.0812e-01,  ..., -2.3548e-02,
           3.9255e-02,  1.1151e-01],
         ...,
         [ 8.6802e-04,  8.3316e-04,  3.2947e-04,  ..., -2.9786e-03,
           6.5938e-03,  1.1510e-02],
         [ 1.6648e-02,  2.3425e-02, -7.5188e-03,  ...,  8.7883e-03,
           4.2063e-03,  1.8971e-02],
         [-2.4058e-01,  3.3975e-01,  2.8910e-01,  ..., -1.3100e-01,
          -1.3514e-01,  1.4614e-01]]])
tensor([[[ 0.3720,  0.3385,  0.2953,  ..., -0.0872, -0.1079, -0.1487]]])
Parameter containing:
tensor([[[-0.0186,  0.2178, -0.1289,  ..., -0.0457,  0.1654,  0.1256]],
        [[-0.1410,  0.2072,  0.1740,  ..., -0.0887, -0.0700, -0.0103]],
        [[ 0.2450,  0.2239,  0.1588,  ..., -0.1426, -0.1188,  0.1205]],
        ...,
        [[ 0.0122, -0.0559,  0.1382,  ..., -0.2484,  0.1337, -0.0367]],
        [[ 0.1155,  0.0986, -0.0650,  ..., -0.1870, -0.0693,  0.0563]],
        [[-0.1095, -0.1289, -0.2644,  ..., -0.2622, -0.0640, -0.0243]]],
       dtype=torch.float16)
output shape: torch.Size([1, 512, 59464])
input shape: torch.Size([1, 1, 297328])
weight shape: torch.Size([512, 1, 10])
input dtype: torch.float32
weight dtype: torch.float16
"""

gguf_writer = gguf.GGUFWriter("tensors.gguf", "test0")
gguf_writer.add_tensor("weight", conv.weight.numpy(), conv.weight.numpy().shape)
gguf_writer.add_tensor("input", input_tensor.numpy(), input_tensor.numpy().shape)
gguf_writer.write_header_to_file()
gguf_writer.write_kv_data_to_file()
gguf_writer.write_tensors_to_file()
gguf_writer.close()

Also, thanks for your help, I really appreciate it.

balisujohn commented 4 months ago

The weights do always have to be float16; the slight numerical difference from torch is unsurprising. And no problem, happy to help.
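If you ever need to keep the weights as float32 in the GGUF file, a minimal sketch of one workaround (assuming your ggml revision provides ggml_cast) is to convert the kernel to F16 at graph-build time:

// hypothetical variant of the snippet above; ggml_cast availability depends on the ggml revision
ggml_tensor * w_f16  = ggml_cast(ggml_ctx, weight_tensor, GGML_TYPE_F16);
ggml_tensor * output = ggml_conv_1d(ggml_ctx, w_f16, input_tensor, 5, 0, 1);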