VNNikolaidis / nnlib2Rcpp

An R package for Neural Nets created using nnlib2

Guidance on implementing a softmax layer #14

Closed tpq closed 1 year ago

tpq commented 1 year ago

Hi again, and thanks once more!

I've become more familiar with the package, and have started to use it to solve a real problem! I am training a multi-layer perceptron to predict multiple class labels (e.g., akin to a multinomial regression).

I am now stuck wanting to create an 'additional part', but I cannot see how to modify any of the existing templates from additional_parts.h (whose examples focus on pes and connections).

The part I desire is something that will perform a softmax transform across all pes within a layer. Reading over the documentation and codebase, I think this could be implemented as a custom layer, such as one modelled after the BP layer. If implemented, I imagine I could use the "BP-softmax" layer like:

a <- new( "NN" ) # create a NN object in variable a
a$add_layer( "generic", 4 ) # 1. a layer of 4 generic nodes
a$add_connection_set( "BP" ) # 2. a set of BP connections
a$add_layer( "BP-hidden", 3 ) # 3. a layer of 3 BP pes
a$add_connection_set( "BP" ) # 4. another set of BP connections
a$add_layer( "BP-softmax", 3 )
a$create_connections_in_sets ( 0, 1 ) # Populate sets with actual connections

where the final layer is a softmax transform of the previous output. I think this would have to be a distinct layer, and not a PE activation function, since the softmax calculation would need to be computed over a set of pes, not for a single pe.
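For reference, the transform over the layer's $n$ pes would be the standard softmax,

$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}},$$

so each pe's output depends on the (summated, biased) inputs of all pes in the layer.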

I am raising an issue because I am a bit stuck on how best to approach creating a custom layer like this. Looking over the nn_bp.cpp file, I thought I could create a new layer class like bp_comput_layer that applies the softmax transform during recall. I suppose the operation might look something like this (adapted from bp_comput_layer):

void bp_softmax_layer::recall()
{
  if(no_error())
  {
    DATA denom=0; // sum of exp over all pes in the layer
    for(int i=0;i<size();i++)
    {
      pe REF p = pes[i];
      DATA x = p.input; // input is already summated;
      x = x + p.bias; // add bias,
      denom=denom+exp(x);
    }

    for(int i=0;i<size();i++)
    {
      pe REF p = pes[i];
      DATA x = p.input; // input is already summated;
      x = x + p.bias; // add bias,
      p.output=(DATA)1*exp(x)/denom; // softmax output,
      p.input=0; // reset input.
    }
  }
}

I am writing to ask whether you have any ideas for a template I could follow, as the easiest path toward implementing my own layers with custom encode and recall operations (akin to the pes and connections in additional_parts.h). Sorry if this is obvious; I am a very weak C++ programmer, especially when it comes to object-oriented programming (but I am learning :D)!

VNNikolaidis commented 1 year ago

Thanks for the comments. There are several ways to do this, varying in code style (and runtime performance). After this reply I will post another with further alternatives. The reply (a) that follows is based on your code.

(a) So one way to compute softmax is by closely following your approach, i.e. to use a customized layer. BTW your code is fine; it is 99.9% of the answer below. And I understand that it is hard to modify code that already exists.

I should mention here that it is easier to customize a ‘generic’ layer. In nnlib2 there is a generic layer class called ‘pe_layer’. It is what is created when add_layer is called from R with ‘generic’ as the layer type, and it is described as “a layer of generic "dumb" pes where most processing will be done in layer code”. I suspect this is what you want to use as the base for your new layer.

But if BP functionality needs to be maintained then you have to modify a bp_comput_layer, i.e. a “BP-hidden” layer. This (in its current implementation) also consists of “dumb” PEs, i.e. most processing is done in layer code and not in PE (node) code. The following code does this:

class bp_comput_softmax_layer: public bp::bp_comput_layer
{
public:

    void recall()
    {
        if(no_error())
        {
            DATA denom=0;
            for(int i=0;i<size();i++)
            {
                pe REF p = pes[i];
                DATA x = p.input;                                           // input is already summated;
                x = x + p.bias;                                             // add bias,
                denom=denom+exp(x);
            }

            for(int i=0;i<size();i++)
            {
                pe REF p = pes[i];
                DATA x = p.input;                                           // input is already summated;
                x = x + p.bias;                                             // add bias,
                p.output=(DATA)1*exp(x)/denom;                              // softmax output,
                p.input=0;                                                  // reset input.
            }
        }
    }
};

All that was changed from your code is the class definition details. Then a name was added in the generate_custom_layer function in file "additional_parts.h" (the name “BP-hidden-softmax” was chosen for the above). This was done as described in the documentation (as well as Step 3 here).

As this code has now been added to the additional_parts.h file found on GitHub, you can see the code there, or use it if you install the package from there. Again, the name for creating such a layer with the NN R module's add_layer method is “BP-hidden-softmax”.
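If it helps, the development version containing these additions can be installed directly from GitHub, e.g. using the remotes package:

# install the development version of the package from GitHub:
remotes::install_github("VNNikolaidis/nnlib2Rcpp")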

Some tests using this:

a <- new( "NN" ) # create a NN object in variable a
a$add_layer( "BP-hidden-softmax", 4 )
a$set_biases_at(1,c(0,0,0,0))

a$input_at(1,c(1,2,0.5,-2))
a$recall_at(1)
a$get_output_from(1)

which returns:

> a$get_output_from(1)
[1] 0.22859235 0.62137844 0.13864827 0.01138094

or

a$input_at(1,c(7,1,2,1))
a$recall_at(1)
a$get_output_from(1)

which returns:

> a$get_output_from(1)
[1] 0.988439751 0.002450097 0.006660055 0.002450097
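As a quick cross-check (biases are zero here), both results match a plain softmax computed directly in R on the same input vectors:

softmax <- function(x) exp(x)/sum(exp(x)) # plain-R softmax
softmax(c(1,2,0.5,-2)) # [1] 0.22859235 0.62137844 0.13864827 0.01138094
softmax(c(7,1,2,1))    # [1] 0.988439751 0.002450097 0.006660055 0.002450097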

Note that since this is based on bp_comput_layer, it is meant to be a BP hidden layer. BP output layers are the same, except that they add their own encoding method. So a “bp-output-softmax” layer was also created (based on your code as well). This was also added to “additional_parts.h”, but it has not been tested much. Please let me know if you use it and have issues with it.

As you mentioned that you are working on real data: if you plan to make the results public, a reference to the package would be appreciated. (For information on citing this package, use the following R command: citation("nnlib2Rcpp").)

Another comment will follow this, with alternative suggestions you (or others) may want to consider.

VNNikolaidis commented 1 year ago

Continuing the previous reply, here are other suggestions on how to calculate softmax (or perform such tasks in general):

(b) Use R code. Since you have full control of the NN via R, the simplest (but least ‘elegant’, in my opinion) way would be to use some R code that performs the desired calculations: get the output of a component, perform softmax in R code, pass the result as input to the next NN layer (if that is what you want to do) and resume NN processing. For example, while recalling from a BP-like NN (similar to that of the example starting on page 17 of the manual (vignette)):

a$input_at( 1, iris_s[ 1 , ] ) # present data at 1st layer
a$recall_at(1)                 # trigger recall at NN’s component 1
a$recall_at(2)                 # trigger recall at NN’s component 2
a$recall_at(3)                 # trigger recall at NN’s component 3

# take 3's output and compute softmax...
x<-a$get_output_at(3)           
denom <- sum(exp(x))
s <- exp(x)/denom

# s is now a softmax vector. Use it, or directly feed it as input to next layer and continue. 
a$input_at(5,s)         # assuming layer @5 has same size = length(x). 
a$recall_at(5)                 # trigger recall at NN’s component 5

etc. This approach allows you to do whatever processing your problem may require (softmax or anything else).
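If this gets repetitive, the steps above can be wrapped in a small R helper; here is a minimal sketch that uses only the NN methods already shown (the function name and the fixed component positions are illustrative):

recall_with_R_softmax <- function(nn, data_in)
{
  nn$input_at( 1, data_in )       # present data at 1st layer
  for(i in 1:3) nn$recall_at(i)   # trigger recall at components 1 to 3
  x <- nn$get_output_at(3)        # get output of component 3
  s <- exp(x)/sum(exp(x))         # compute softmax in R
  nn$input_at( 5, s )             # feed it as input to layer @5
  nn$recall_at(5)                 # trigger recall at component 5
  nn$get_output_at(5)             # return the final output
}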

(c) Create an ’aux_control’ nnlib2 component. This is currently not a real option (I explain why below), but I mention it here for completeness. The idea would be to create (in C++) an ’aux_control’ nnlib2 component that calculates softmax (there already are such components for other processing functions similar to softmax). I am referring to what is mentioned in the “manual”, in the middle of page 3:

“[…] ’aux control’ components are classes of objects that can be used to provide auxiliary, user-defined functionality (e.g. control, display output, user-break, handle other components or sub-components [create, delete, communicate, call methods], perform data operations [filter, mutate, which max, feature expansion] etc.). Being themselves ’component’ objects [like layers and connection sets], they can be added to NN’s topology, thus be activated during the NN’s data processing sequence via their respective encode() and recall() methods.”

However, this does not currently answer your question, simply because the nnlib2Rcpp NN module does not have an ‘add_aux_control’ method (such as add_layer etc.). This was not included, for simplicity and because the interactive nature of R allows you to do any custom processing, as described in (b).

But based on the above, a new method will be added to NN that will allow any R function to be executed from within the NN during encoding or decoding (recall). This method is ready, and once tested it will be added to GitHub; if interested, check the repository again within the next few days.

VNNikolaidis commented 1 year ago

The aforementioned methods (add_R_...) have been added to the GitHub repository. Once tested, documented and refined, they will be released in v.0.2.0 (and on CRAN). Below is an example of using them to calculate softmax while "recalling" data.

First, let's create a function that computes softmax, and a NN (for demonstration, some layers and a connection set are created):

library(nnlib2Rcpp)
set.seed(222)

softmax <- function(x) exp(x)/sum(exp(x))

a<-new("NN")
a$add_layer("pass-through",4)
a$add_connection_set("wpass-through")
a$add_layer("pass-through",4)
a$add_R_forwarding("on recall","softmax")
a$add_layer("pass-through",4)
a$create_connections_in_sets(0.1,0.2)

Note the add_R_forwarding component. This calls the softmax R function defined above. BTW, the warning raised when creating the connections can be ignored; it just informs you that some layers (here those @3 and @5) are not connected by a connection set. But data will pass without issues from one to the other via the R component between them.

If we outline the NN with a$outline() we get:

Current NN topology:
@ 1 component (id=29) is Layer : pass-through of size 4
@ 2 component (id=30) is Connection Set : wpass-through (Fully Connected) 29-->31 of size 16
@ 3 component (id=31) is Layer : pass-through of size 4
@ 4 component (id=32) is Control Component : simple-R-component (softmax output of above) of size 0
@ 5 component (id=33) is Layer : pass-through of size 4

Data arriving @4 will be softmax-ed. For example:

a$set_input_at(1,c(10,4,6,8))
a$recall_all_fwd()
a$get_output_at(5)

returns something like the following (you should get the same values, since the random weights were fixed with set.seed for this example):

[1] 0.2327934 0.4192948 0.1678340 0.1800778

Let's do the same step by step, to follow the flow of data:

> a$set_input_at(1,c(10,4,6,8))
[1] TRUE
> #recalling data at 1st layer (data passes through)...
> a$get_input_at(1)
[1] 10  4  6  8
> a$recall_at(1)
[1] TRUE
> a$get_output_at(1)
[1] 10  4  6  8
> #recalling data at next connection set (data multiplied by weights)...
> a$get_input_at(2)
 [1] 10  4  6  8 10  4  6  8 10  4  6  8 10  4  6  8
> a$recall_at(2)
[1] TRUE
> #recalling data at next layer (data passes through)...
> a$get_input_at(3)
[1] 4.057892 4.646315 3.730716 3.801129
> a$recall_at(3)
[1] TRUE
> a$get_output_at(3)
[1] 4.057892 4.646315 3.730716 3.801129
> #recalling data at R component (invokes softmax on the above values that are the output @3)
> a$recall_at(4)
[1] TRUE
> a$get_output_at(4)
[1] 0.2327934 0.4192948 0.1678340 0.1800778
> #recalling data at last layer (data passes through)...
> a$get_input_at(5)
[1] 0.2327934 0.4192948 0.1678340 0.1800778
> a$recall_at(5)
[1] TRUE
> a$get_output_at(5)
[1] 0.2327934 0.4192948 0.1678340 0.1800778
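Note that the values at the R component (@4) are indeed the plain-R softmax of the output of component 3:

x <- c(4.057892, 4.646315, 3.730716, 3.801129) # the output of component 3 above
exp(x)/sum(exp(x)) # [1] 0.2327934 0.4192948 0.1678340 0.1800778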

Hope this helps.

tpq commented 1 year ago

Thanks again so much!

Sorry to revive a dead thread, but I am working on some front-end changes to make it easier for me to trial new ideas. I am wondering whether you would be open to a pull request that introduces a dependency on the R6 package? (While I tend to agree that "light weight is the right weight", I also loathe S3/S4 classes.) The main use of R6 would be to wrap the NN S4 class, to expose argument names explicitly during function calls and to allow users to pass optional arguments more easily.
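To give a rough idea, here is a minimal sketch of the kind of wrapper I have in mind (class and argument names are purely illustrative, not a finished design):

library(R6)
library(nnlib2Rcpp)

NNWrapper <- R6Class("NNWrapper",
  public = list(
    nn = NULL,
    initialize = function() { self$nn <- new("NN") },
    add_layer = function(name = "generic", size = 1) {
      self$nn$add_layer(name, size) # delegate to the wrapped S4 object
      invisible(self)               # return self so calls can be chained
    }
  )
)

w <- NNWrapper$new()
w$add_layer(name = "pass-through", size = 4)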

If so, I'm happy to share what I'm working on. If not, I'll still try to submit a PR (probably late next week) with only some additional parts.

Best,

VNNikolaidis commented 1 year ago

My first thought was: it will be... interesting. C++ classes exposed via Rcpp modules (S4/RC classes?) then wrapped with R6 classes. Or maybe you plan to do something else. In any case, contributions and fresh ideas are always welcome, so it can be arranged.

If you don’t mind, send me an email so we can discuss the details from there. As the “master” repository is referenced from CRAN and other package repositories, I try to be extra cautious when making changes to it, but I guess it can be done safely via a fork. I admit I have limited experience with collaboration via GitHub, so let’s see what the best way to do it is.

Now... having said that, let me add some (irrelevant?) thoughts here that may affect what you work on (although I cannot see how). If I were to touch the project (which I plan to, if/when I have more time), I would first go for some lower-level additions to the nnlib2 C++ library.

I have (for a really long time now) been thinking of adding a variant of the current “connection_set” C++ class. This variant would be purely matrix-based (the way connection sets are usually implemented). After all, layers are (almost) always fully interconnected with homogeneous connections. This new “connection_set” would have to consist of “dumb” connections, with all encode/recall functionality implemented in “connection_set”-level code (in fact, class ‘connection’ would not be used at all). It would be less readable but more optimizable, could easily be made compatible with current layers, and would definitely run faster than the current super-flexible-but-slow “connection_set” implementation.

Then maybe (I think this is not as crucial) add a similar “layer” class variation, implemented as plain data vectors (a layer is currently a vector -or matrix- of PEs). After all, layers are (almost) always homogeneous as well. The only (significant) thorn would be making this new “layer” compatible with the current “connection_set” class (not the “new” one described in the previous paragraph; that should be much easier).

Such changes would add the option of taking the “classic” approach when implementing a NN, i.e. via a bunch of vector/matrix operations (lots of ways to optimize that!).
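For example (in R notation, purely to illustrate the idea), recalling data through a fully connected, matrix-based connection set would reduce to a single matrix-vector product:

W <- matrix(runif(3*4, 0.1, 0.2), nrow=3, ncol=4) # weights from a 4-pe layer to a 3-pe layer
x <- c(10, 4, 6, 8)                               # output of the source layer
y <- as.vector(W %*% x)                           # input to the destination layer, in one operation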

And as an added bonus, having these would really simplify adding the ability to have their encode/recall functions defined in R (which is happy playing with vectors and matrices). Doing so would mean lots of data transferred back and forth, and therefore would not be very optimized, but it may be a useful option for interactive experimentation with NNs via R.

I plan to do some of these. I admit I am very tempted to start with the connection_set, which seems pretty easy, but first I have to put some time into other pressing things.