attractivechaos / kann

A lightweight C library for artificial neural networks

How to extract outputs from a hidden layer of autoencoder? #37

Closed lixiangchun closed 3 years ago

7PintsOfCherryGarcia commented 3 years ago

Still studying the code base, but I can think of two ways of doing this. One is more "hacky" than the other, and I'm not sure it is the best way. I will leave a code example of the "hacky" way because I am still not sure how to implement the second one.

KANN models consist of:

typedef struct {
    int n;            /* number of nodes in the computational graph */
    kad_node_t **v;   /* list of nodes */
    float *x, *g, *c; /* collated variable values, gradients and constant values */
    void *mt;         /* auxiliary data for multi-threading; NULL if multi-threading disabled */
} kann_t;

Unlike the "layer" structure of other APIs like Keras, KANN is based on the computational graph of a NN. The nodes of this graph are stored in "kad_node_t **v". To extract the outputs of a hidden layer in an AE (or of any layer in any model), you need to compute the forward pass over the graph up to the node that outputs the layer you are looking for. The way I understand the code, you can do this in two ways.

  1. Train your AE, then build a new model up to the node you are interested in, using the same weights as the trained model. Then you can simply apply this new model using:

    const float *kann_apply1(kann_t *a, float *x)
  2. Directly use the trained model to compute the forward pass up to the node that outputs the results of the hidden layer of interest. To do this you have to find that node and use the lower-level:

    const float *kad_eval_at(int n, kad_node_t **a, int from)

to compute the forward pass.

Fortunately nodes can be identified by their:

    uint32_t    ext_flag; // From typedef struct kad_node_t { ... }

These flags identify input (KANN_F_IN 0x1), output (KANN_F_OUT 0x2), truth (KANN_F_TRUTH 0x4) and cost (KANN_F_COST 0x8) nodes by setting the lower 4 bits of ext_flag. This leaves 28 bits for identifying other nodes, so you can define your own flag, say:

KANN_F_CODE 0x10  //Bit 5

Then, when building your AE model, you can flag the node that computes the CODE:

//Code from the denoising AE in the examples
kann_t *dae_model(int n_in, int n_hidden, float i_dropout) {
    kad_node_t *x, *t, *w, *r;
    w = kann_new_weight(n_hidden, n_in);
    r = kann_new_scalar(KAD_VAR, sqrtf((float)n_in / n_hidden));

    x = kad_feed(2, 1, n_in), x->ext_flag |= KANN_F_IN | KANN_F_TRUTH;
    t = kann_layer_dropout(x, i_dropout);

    t = kad_tanh(kad_add(kad_cmul(t, w), kann_new_bias(n_hidden)));
    t = kad_mul(t, r);

    //******************************
    t->ext_flag |= KANN_F_CODE; //This should be the node that computes the CODE layer of size n_hidden
    //******************************

    t = kad_add(kad_matmul(t, w), kann_new_bias(n_in));
    t = kad_sigm(t), t->ext_flag = KANN_F_OUT;
    t = kad_ce_bin(t, x), t->ext_flag = KANN_F_COST;
    return kann_new(t, 0);
}

Once you have a trained model, you can extract the values of the code layer with:

//Get the index of the node we flagged with KANN_F_CODE
int code_out;
code_out = kann_find(ann, KANN_F_CODE, 0);
//Set the model to evaluate a single input
kann_set_batch_size(ann, 1);
//Bind the nth data point to the input node(s) flagged KANN_F_IN
kann_feed_bind(ann, KANN_F_IN, 0, &in->x[n]);
//Evaluate the forward pass up to the code_out node
kad_eval_at(ann->n, ann->v, code_out);
//The output of the computation up to the desired node is stored in its x member
float *res;
res = ann->v[code_out]->x;

The coded input will be in res, which has a length of ann->v[code_out]->d[ ann->v[code_out]->n_d - 1 ], where:

ann->v[code_out]->n_d //Number of dimensions of node v[code_out]
ann->v[code_out]->d[n]  //Length of dimension n of node v[code_out]
attractivechaos commented 3 years ago

Thanks, @7PintsOfCherryGarcia. Yes, your solution works. You may also use kad_node_t::ext_label. You can set a non-zero label (0 means unlabeled) at a node of interest:

t->ext_label = 1;

and then retrieve the node with

int i = kann_find(ann, 0, 1);

If the label is unique, ann->v[i] will point to the node in the computation graph. I should probably provide a more general version of kann_apply1():

const float *kann_apply1_to(kann_t *a, float *x, int ext_flag, int ext_label)
{
  int i_out;
  i_out = kann_find(a, ext_flag, ext_label);
  if (i_out < 0) return 0; /* not found, or multiple matching nodes */
  kann_set_batch_size(a, 1);
  kann_feed_bind(a, KANN_F_IN, 0, &x);
  kad_eval_at(a->n, a->v, i_out);
  return a->v[i_out]->x;
}
const float *kann_apply1(kann_t *a, float *x) /* then kann_apply1() is a special case of kann_apply1_to() */
{
  return kann_apply1_to(a, x, KANN_F_OUT, 0);
}
attractivechaos commented 3 years ago

Added the kann_apply1_to() API.

attractivechaos commented 3 years ago

One more comment on the structure of kann:

Different from the "layer" structure other API's like KERAS use, KANN is based on the computational graph of a NN.

kautodiff.c is equivalent to TensorFlow, although much less powerful of course: it implements a simpler version of a computation graph. kann.c is equivalent to Keras: it provides layers. The autoencoder example doesn't use the layer APIs because it ties the weight matrices (i.e. the second weight matrix is the transpose of the first), which can't be done with the layer APIs, at least not in kann. The MLP and CNN examples use layers and are simpler because weights are not exposed to the user code.