BlueBrain / nmodl

Code Generation Framework For NEURON MODeling Language
https://bluebrain.github.io/nmodl/
Apache License 2.0

Strategy for unit testing compute kernels generated from LLVM backend #540

Open pramodk opened 3 years ago

pramodk commented 3 years ago

Assume a sample mod file like this:

$ cat hh.mod
TITLE hh.mod   squid sodium, potassium, and leak channels

UNITS {
    (mA) = (milliamp)
    (mV) = (millivolt)
    (S) = (siemens)
}

NEURON {
    SUFFIX hh
    USEION na READ ena WRITE ina
    USEION k READ ek WRITE ik
    NONSPECIFIC_CURRENT il
    RANGE gnabar, gkbar, gl, el, gna, gk
    RANGE minf, hinf, ninf, mtau, htau, ntau
    THREADSAFE : assigned GLOBALs will be per thread
}

PARAMETER {
    gnabar = .12 (S/cm2)    <0,1e9>
    gkbar = .036 (S/cm2)    <0,1e9>
    gl = .0003 (S/cm2)    <0,1e9>
    el = -54.3 (mV)
}

STATE {
    m h n
}

ASSIGNED {
    v (mV)
    celsius (degC)
    ena (mV)
    ek (mV)
    gna (S/cm2)
    gk (S/cm2)
    ina (mA/cm2)
    ik (mA/cm2)
    il (mA/cm2)
    minf hinf ninf
    mtau (ms) htau (ms) ntau (ms)
}

BREAKPOINT {
    SOLVE states METHOD cnexp
    gna = gnabar*m*m*m*h
    ina = gna*(v - ena)
    gk = gkbar*n*n*n*n
    ik = gk*(v - ek)
    il = gl*(v - el)
}

DERIVATIVE states {
     m' =  (minf-m)/mtau
     h' = (hinf-h)/htau
     n' = (ninf-n)/ntau
}

The generated struct holding all the data looks like this:

INSTANCE_STRUCT {
    DOUBLE *gnabar
    DOUBLE *gkbar
    DOUBLE *gl
    DOUBLE *el
    DOUBLE *gna
    DOUBLE *gk
    DOUBLE *il
    DOUBLE *minf
    DOUBLE *hinf
    DOUBLE *ninf
    DOUBLE *mtau
    DOUBLE *htau
    DOUBLE *ntau
    DOUBLE *m
    DOUBLE *h
    DOUBLE *n
    DOUBLE *Dm
    DOUBLE *Dh
    DOUBLE *Dn
    DOUBLE *ena
    DOUBLE *ek
    DOUBLE *ina
    DOUBLE *ik
    DOUBLE *v_unused
    DOUBLE *g_unused
    DOUBLE *ion_ena
    DOUBLE *ion_ina
    DOUBLE *ion_dinadv
    DOUBLE *ion_ek
    DOUBLE *ion_ik
    DOUBLE *ion_dikdv
    INTEGER *ion_ena_index
    INTEGER *ion_ina_index
    INTEGER *ion_dinadv_index
    INTEGER *ion_ek_index
    INTEGER *ion_ik_index
    INTEGER *ion_dikdv_index
    DOUBLE *voltage
    INTEGER *node_index
    DOUBLE t
    DOUBLE dt
    DOUBLE celsius
    INTEGER secondorder
    INTEGER node_count
}

And the generated compute function looks like:

VOID nrn_state_hh(INSTANCE_STRUCT *mech){
    INTEGER id
    for(id = 0; id<mech->node_count; id = id+1) {
        INTEGER node_id, ena_id, ek_id
        DOUBLE v
        node_id = mech->node_index[id]
        ena_id = mech->ion_ena_index[id]
        ek_id = mech->ion_ek_index[id]
        v = mech->voltage[node_id]
        mech->ena[id] = mech->ion_ena[ena_id]
        mech->ek[id] = mech->ion_ek[ek_id]
        mech->m[id] = mech->m[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->mtau[id])))*(-(((mech->minf[id]))/mech->mtau[id])/((((-1.0)))/mech->mtau[id])-mech->m[id])
        mech->h[id] = mech->h[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->htau[id])))*(-(((mech->hinf[id]))/mech->htau[id])/((((-1.0)))/mech->htau[id])-mech->h[id])
        mech->n[id] = mech->n[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->ntau[id])))*(-(((mech->ninf[id]))/mech->ntau[id])/((((-1.0)))/mech->ntau[id])-mech->n[id])
    }
}

This compute kernel is generated in memory and translated to LLVM IR. Our goal is to unit test such generated kernels.

What needs to happen?

Since the kernels and the INSTANCE_STRUCT are generated dynamically, how do we do such testing?

pramodk commented 3 years ago

Copying @georgemitenkov's comments from https://github.com/BlueBrain/nmodl/pull/533#issuecomment-791897403 :

---- start ----

@pramodk Regarding testing, I had one idea:

1) Generate llvm::Module using the pipeline.
2) Add a new file test_llvm_kernels.cpp or something like that. In that file, we create the Instance struct artificially and write cpp wrappers to print its contents before/after the kernel execution.
3) Link the wrapper llvm::Module with our llvm::Module (for my GSoC I was using a similar strategy actually, so I have an idea of how this is done with the LLVM API).
4) Simply feed this into llvm_nmodl_runner and see what the outputs are :)

This is not an actual IR check, but it suits integration-test purposes.

For example something like this:

#include <stdio.h>

// ================= LLVM kernel generated from the pipeline ======================== //

struct Bar {
  int* __restrict__ indices;
  double* __restrict__ voltage;
  int num_nodes;
};

void kernel(Bar* b) {
  double v = -1.0;
  b->voltage[b->indices[0]] = v * b->voltage[b->indices[0]];
  b->voltage[b->indices[1]] = v * b->voltage[b->indices[1]];
}

// ================= Helpers that would come from wrapper class ==================== //

void print_struct(Bar *b) {
  printf("num nodes: %d\n", b->num_nodes);
  printf("indices: ");
  for (int i = 0; i < b->num_nodes; ++i) {
    printf("%d", b->indices[i]);
    if (i < b->num_nodes - 1) printf(", "); else printf("\n");
  }
  printf("voltage: ");
  for (int i = 0; i < b->num_nodes; ++i) {
    printf("%.2f", b->voltage[i]);
    if (i < b->num_nodes - 1) printf(", "); else printf("\n");
  }
}

int main() {
  Bar b;
  b.num_nodes = 2;
  int indices[] = {0, 1};
  double voltage[] = {5.0, 10.0};
  b.indices = indices;
  b.voltage = voltage;
  printf(" == Before == ");
  print_struct(&b);
  kernel(&b);
  printf(" == After == ");
  print_struct(&b);
  return 0;
}

I am currently using this to verify the vectorised code.

---- end ----

pramodk commented 3 years ago

Add a new file test_llvm_kernels.cpp or something like that. In that file, we create Instance struct artificially, and write cpp wrappers to print contents before/after the kernel execution.

I was thinking in a similar direction! Before writing up the details of what I was thinking, let me clarify a few questions regarding your proposal:

Considering the above questions, I was thinking of the following:

As shown above, one just has to take care of the alignment/padding aspect, i.e. we have to pin pointers and non-pointer variables at particular offsets.

Does this make sense?

The reason I am thinking of the above approach is that 1) we don't know the type of INSTANCE_STRUCT_FOO at compile time and 2) this approach could be used for non-LLVM backends as well.

Implementing the above wouldn't be complicated: allocate a memory block and set up pointers at particular offsets, taking alignment into account. But if you think it would be even easier with the LLVM API, then feel free to propose that!
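
For illustration, the allocation just described could be sketched like this; note this is only a sketch under assumed LP64 sizes, and the Kind enum and allocate_instance name are hypothetical, not NMODL API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical member kinds that can appear in the generated instance struct.
enum class Kind { DoublePtr, IntPtr, DoubleVal, IntVal };

// Round `off` up to the next multiple of alignment `a` (a power of two).
static std::size_t align_up(std::size_t off, std::size_t a) {
    return (off + a - 1) & ~(a - 1);
}

// Sketch of an allocate_and_initialize_instruct_struct-style helper: lay the
// members out at naturally aligned offsets, allocate a zeroed base block, and
// point each pointer member at its own zero-initialized data array.
void* allocate_instance(const std::vector<Kind>& layout, std::size_t node_count) {
    std::vector<std::size_t> offsets;
    std::size_t off = 0;
    for (Kind k : layout) {
        // Assumed LP64 sizes: pointers and doubles are 8 bytes, int is 4.
        std::size_t sz = (k == Kind::IntVal) ? sizeof(int) : 8;
        off = align_up(off, sz);
        offsets.push_back(off);
        off += sz;
    }
    char* base = static_cast<char*>(std::calloc(1, align_up(off, 8)));
    for (std::size_t i = 0; i < layout.size(); ++i) {
        if (layout[i] == Kind::DoublePtr)
            *reinterpret_cast<double**>(base + offsets[i]) =
                static_cast<double*>(std::calloc(node_count, sizeof(double)));
        else if (layout[i] == Kind::IntPtr)
            *reinterpret_cast<int**>(base + offsets[i]) =
                static_cast<int*>(std::calloc(node_count, sizeof(int)));
        // scalar members stay zero; the caller fills them in by offset
    }
    return base;
}
```

The same offset table is what the kernel's typed view of the struct must agree with.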

cc: @iomaganaris

Edit: maybe I can provide pseudo-code later today; that might help to explain my text.

pramodk commented 3 years ago

Here is very abstract code for the above logic:


 SCENARIO("compute kernel test", "[llvm][runner]") {
     GIVEN("mod file ") {

         std::string nmodl_text = R"(

                 NEURON {
                     SUFFIX hh
                     USEION na READ ena WRITE ina
                     USEION k READ ek WRITE ik
                     NONSPECIFIC_CURRENT il
                     RANGE gnabar, gkbar, gl, el, gna, gk
                     RANGE minf, hinf, ninf, mtau, htau, ntau
                     THREADSAFE : assigned GLOBALs will be per thread
                 }
                 ...
                 DERIVATIVE states {
                      m' =  (minf-m)/mtau
                      h' = (hinf-h)/htau
                      n' = (ninf-n)/ntau
                 }

         )";

         NmodlDriver driver;
         const auto& ast = driver.parse_string(nmodl_text);
         ...

         codegen::CodegenLLVMHelperVisitor v(.....);
         v.visit_program(*ast);
         ...

         // we now retrieve information about how many double*, int*, double and int members are in the structure
         auto& some_instance_struct_info = v.get_some_useful_instance_struct_info();
         ..

         // here we allocate instance struct objects with the same seed, hence data1 and data2 hold identical data
         // `allocate_and_initialize_instruct_struct` will allocate the base struct and set up pointers to the actual data
         // note the data is just `void*`, which can be cast to the actual type inside the JIT runner
         void* data1 = allocate_and_initialize_instruct_struct(some_instance_struct_info, SEED1, NUM_NODE_COUNT);
         void* data2 = allocate_and_initialize_instruct_struct(some_instance_struct_info, SEED1, NUM_NODE_COUNT);
         ...

         // based on the backends, we now can run kernels with different backends / vector width
         Runner your_runner1(m, data1, vector_width=1);
         Runner your_runner2(m, data2, vector_width=4);
         Runner your_runner3(m, data3, gpu=true);         
          ...

         // compare the results or print them if required
         compare_data_with_some_condition(data1, data2);
         compare_data_with_some_condition(data1, data3);
          ...

          // cleanup
         deallocate_instruct_struct(data1);
         deallocate_instruct_struct(data2);
     }
 }

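
For what it's worth, the hypothetical compare_data_with_some_condition could be a simple element-wise tolerance check, since scalar and vectorised runs need not be bit-identical; the signature and tolerance here are illustrative:

```cpp
#include <cassert>
#include <cmath>

// Illustrative comparison helper: two equally sized double arrays are
// considered equal if every pair of elements is within `tol`.
bool compare_data_with_some_condition(const double* a, const double* b,
                                      int n, double tol = 1e-12) {
    for (int i = 0; i < n; ++i)
        if (std::fabs(a[i] - b[i]) > tol)
            return false;
    return true;
}
```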
georgemitenkov commented 3 years ago

Right, I see! We indeed do not know what the InstanceStruct will be until we parse the AST. This seems like a logical approach to me, and I think it is good because of its uniformity across all backends. I will look more into this as well.

Also:

georgemitenkov commented 3 years ago

After some more thinking, I had another idea that I think could help (not really specific to anything :) ). I also feel I understand this approach better now, so we can have a sync later on the weekend/Monday to discuss more.

We define a C++ struct for all inputs:

struct InstanceInfo {
  basePtr;
  offsetsPtr;
  sizesPtr;
  num_elems;
};

Then InstanceInfo can be transformed into an LLVM struct! We can have something like:

// We will call these functions in our LLVM wrapper file

extern "C" void _interface_init_struct(InstanceInfo *info) {
  // C code to set up the fields;
}

extern "C" void _interface_print_struct(InstanceInfo *info) {
  // C code to print struct;
}
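
A minimal sketch of these helpers, assuming hypothetical concrete field types for InstanceInfo (the real types would be fixed when the interface is implemented):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Assumed concrete layout for the InstanceInfo sketch; illustrative only.
struct InstanceInfo {
    char*        basePtr;     // start of the raw instance block
    std::size_t* offsetsPtr;  // byte offset of each member within the block
    std::size_t* sizesPtr;    // byte size of each member
    int          num_elems;   // number of members
};

// Fill in the byte offsets from the member sizes, keeping each member
// naturally aligned (a member of size N starts on an N-byte boundary).
extern "C" void _interface_init_struct(InstanceInfo* info) {
    std::size_t off = 0;
    for (int i = 0; i < info->num_elems; ++i) {
        std::size_t a = info->sizesPtr[i];
        off = (off + a - 1) & ~(a - 1);
        info->offsetsPtr[i] = off;
        off += a;
    }
}

// Dump the layout so runs before/after a kernel can be compared by eye.
extern "C" void _interface_print_struct(InstanceInfo* info) {
    for (int i = 0; i < info->num_elems; ++i)
        std::printf("member %d: offset=%zu size=%zu\n", i,
                    info->offsetsPtr[i], info->sizesPtr[i]);
}
```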

The only thing that is left is to define a conversion to our struct. We can define:

// here info contains the data; the type is taken from the AST or the LLVM-generated kernel code
llvm::Value* infoToStruct(InstanceInfo *info, llvm::Type *instanceType) {
  // code that produces instructions to transform the info into the struct we need:
  // basically iterate over `num_elems` and get the members with size/offset calculations
}

Overall, we generate the LLVM module with the following steps:

  1. Create LLVM main function
  2. Fill InstanceInfo in some defined way (command line, predefined functions using extern C, etc.)
  3. Emit a code to convert to our struct and fill the data using infoToStruct()
  4. Create call void @kernel(%our_struct_type *s)
  5. Create call _interface_print_struct
pramodk commented 3 years ago

When using Runner your_runner1(m, data1, vector_width=1); do you mean that the backend is the backend of the LLVM pipeline?

Yes, I was thinking of LLVM backends with AVX2, AVX-512 or Arm NEON. (The same data structure could be used for testing non-LLVM backends as well, but that would need additional work on the runners.)

What would be the logic of allocate_and_initialize_instruct_struct? As I understand it, we want num_pointers * 8 + sizeof(whatever is left) allocated.

That's correct. Just a note: one needs to be a bit careful about the size of the struct due to padding/alignment. Our struct only has double*, int*, double and int members, so it's not that complicated, but it is something to keep in mind.
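
To make the padding point concrete, here is what a struct mixing those member kinds looks like on a typical LP64 target (the sizes assumed below are 8-byte pointers/doubles and 4-byte ints):

```cpp
#include <cstddef>

// Illustrative stand-in mixing the four member kinds of the instance struct.
struct Example {
    double* p;  // offset 0
    int*    q;  // offset 8
    double  d;  // offset 16
    int     i;  // offset 24, followed by 4 bytes of tail padding
};

// The 8-byte members fix the struct's alignment at 8, so its size is
// rounded up from 28 to 32: a raw memory block mimicking this struct
// must reproduce both the member offsets and the total padded size.
static_assert(alignof(Example) == 8, "alignment driven by pointer/double members");
static_assert(sizeof(Example) == 32, "4 bytes of tail padding on LP64");
```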

But how the actual test data would be provided?

char *instance = allocate(whatever size is required for pointers + extra data + padding/alignment);

// 1st member data
*(double**)(instance + 0) = allocate(sizeof(double) * node_count);

// 2nd member data (offset 8, i.e. the pointer size)
*(double**)(instance + 8) = allocate(sizeof(double) * node_count);

... similar offset calculations for the rest of the data members and their data allocations

// double and int variables are stored directly as values
*(double*)(instance + X) = 0.025 // dt
pramodk commented 3 years ago

After some more thinking, I had another idea that I think could help (not really specific to anything :) ). I also feel I understand this approach better now, so we can have a sync later on the weekend/Monday to discuss more.

The only thing that is left is to define a conversion to our struct. We can define: // code that produces instructions to transform the info into the struct we need: basically iterate over num_elems and get the members with the size/offset calculation.

Yeah, I think a discussion would be helpful. I was thinking about the padding/alignment aspects precisely to avoid this transformation, i.e. if you create a memory block with the right pointers at the right offsets, then you can directly typecast that pointer to instanceType. Maybe a discussion would clarify things!
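
That direct-typecast idea can be demonstrated with a tiny stand-in struct; the names, layout, and build_instance helper are illustrative, not the generated INSTANCE_STRUCT:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Tiny stand-in for a generated instance struct: two pointers and a scalar.
struct Instance {
    double* m;   // offset 0
    double* h;   // offset 8
    double  dt;  // offset 16
};

// Build the block generically, knowing only member offsets; because the
// offsets match the struct layout, a plain cast yields a usable typed view
// and no infoToStruct-style conversion code is needed.
Instance* build_instance(std::size_t node_count) {
    char* raw = static_cast<char*>(std::malloc(sizeof(Instance)));
    double* m = static_cast<double*>(std::calloc(node_count, sizeof(double)));
    double* h = static_cast<double*>(std::calloc(node_count, sizeof(double)));
    double dt = 0.025;
    std::memcpy(raw + 0, &m, sizeof(double*));
    std::memcpy(raw + 8, &h, sizeof(double*));
    std::memcpy(raw + 16, &dt, sizeof(double));
    return reinterpret_cast<Instance*>(raw);
}
```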

georgemitenkov commented 3 years ago

char *instance = allocate(whatever size required for pointers + extra data + padding/alignment);
// 1st member data
*(instance + 0) = allocate (sizeof(double) * node_count);
// 2nd member data (8 considering pointer size)
*(instance + 8) = allocate (sizeof(double) * node_count);
... similar offset calculation for rest of the data members and their data allocation
// double and int variables are directly stored as values
*(instance + X) = 0.025 // dt

I see, thank you for the example!