ggerganov / ggml

Tensor library for machine learning
MIT License

Adding custom kernel #105

Open LucasFischer123 opened 1 year ago

LucasFischer123 commented 1 year ago

Hi,

I am new to ggml, but what you have built is really good! Thanks a lot for that. I was wondering if you could give me some pointers on how to add a custom kernel for the GEMM/matmul ops used in the different LLMs.

Thanks a lot.

Lucas

LucasFischer123 commented 1 year ago

To be more specific, I would like to know where to find the accumulation step of the matmul inside the library; I am struggling to locate it!

Thanks a lot.

LucasFischer123 commented 1 year ago

Is it here? (I have the AVX2 instruction set.) Thanks a lot.

https://github.com/ggerganov/ggml/blob/master/src/ggml.c#L2952

    // Main loop over the quantized blocks
    for (int i = 0; i < nb; i++) {
        // Per-block scale factors of two x blocks, broadcast across SIMD lanes
        const __m128 d0 = _mm_set1_ps(GGML_FP16_TO_FP32(x[2*i + 0].d));
        const __m128 d1 = _mm_set1_ps(GGML_FP16_TO_FP32(x[2*i + 1].d));
        const __m256 dx = _mm256_set_m128(d1, d0);

        // Accumulate the offset (min) terms against the precomputed
        // scaled sums of the y quants
        summs += GGML_FP16_TO_FP32(x[2*i + 0].m) * y[i].s0
               + GGML_FP16_TO_FP32(x[2*i + 1].m) * y[i].s1;

        // Unpack the 4-bit quants of both x blocks into bytes
        const __m128i bx0 = bytes_from_nibbles_16(x[2*i + 0].qs);
        const __m128i bx1 = bytes_from_nibbles_16(x[2*i + 1].qs);
        const __m256i bx = _mm256_set_m128i(bx1, bx0);

        // Scale factor and 8-bit quants of the y block
        const __m256 dy = _mm256_broadcast_ss(&y[i].d);
        const __m256i by = _mm256_loadu_si256((const __m256i *)y[i].qs);

        // Integer dot product of the quants, converted to float
        const __m256 q = mul_sum_i8_pairs_float(bx, by);

        // Accumulation step: acc += q * (dx * dy)
        acc = _mm256_fmadd_ps(q, _mm256_mul_ps(dx, dy), acc);
    }

    // Horizontal sum of the SIMD accumulator, plus the offset terms
    *s = hsum_float_8(acc) + summs;