david-thrower / cerebros-core-algorithm-alpha

The Cerebros package is an ultra-precise Neural Architecture Search (NAS) / AutoML framework intended to mimic biological neurons much more closely than conventional neural network architecture strategies.

Add option for custom activation (e.g. activation with hyperparams). #15

Open david-thrower opened 1 year ago

david-thrower commented 1 year ago

Genomic researchers often Z-scale target phenotypes when running regression-based GWAS experiments. This makes it easy to quantify RMSE / MAE in terms of standard deviations from the mean, and it gives a metric / loss that is relative to the distribution of the data as well.

This may also provide some protection against exploding gradients, since calculating derivatives on values in roughly the 0-1 range may be less likely to give astronomically high numbers.
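
As a toy illustration of that scaling (plain NumPy, not Cerebros code; the values are made up):

import numpy as np

# Toy illustration: z-scale a phenotype target and read MAE in standard-deviation units.
y = np.array([105.0, 98.0, 112.0, 91.0, 120.0])     # raw target phenotype values
y_mean, y_std = y.mean(), y.std()

y_z = (y - y_mean) / y_std                           # z-scaled target used for training / loss

y_pred_z = y_z + 0.1                                 # stand-in for model predictions in z-space
mae_in_sd = np.mean(np.abs(y_pred_z - y_z))          # MAE expressed in standard deviations (0.1 here)

y_pred_raw = y_pred_z * y_std + y_mean               # reverse z-scaling for reporting predictions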

If we can extend this (likely split off as a separate issue), Cerebros could train and predict regressors with the following strategy:

  1. Cerebros Z-scales the target.
  2. Cerebros trains and computes the loss on the z-scaled targets.
  3. Cerebros produces validation metrics by internally making z-scaled predictions, then returns reverse-z-scaled predictions as the actual predictions, and the final metrics are computed on those.
  4. This may be more robust. The same applies to 1-0 scaling. Given the volume of data Cerebros was designed for, the slight increase in accuracy from z-scaling may be worth the marginal computational expense over 1-0 scaling. Ideally we should make it try all 3 strategies: train on raw data, train on Z-scaled data [predict by reverse z-scaling], and train on 1-0 scaled data [predict by reverse 1-0 scaling], as sketched below.
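
A rough sketch of how the three strategies could be expressed as forward / inverse transform pairs for a search loop to iterate over (make_strategies is a hypothetical helper, not part of the current API):

import numpy as np

# Hypothetical helper: each strategy is a (forward, inverse) pair applied to the target.
def make_strategies(y_train: np.ndarray):
    mu, sd = y_train.mean(), y_train.std()
    lo, hi = y_train.min(), y_train.max()
    return {
        "raw":      (lambda y: y,                    lambda y: y),
        "z_scaled": (lambda y: (y - mu) / sd,        lambda y: y * sd + mu),
        "one_zero": (lambda y: (y - lo) / (hi - lo), lambda y: y * (hi - lo) + lo),
    }

# Sketch of the search loop: train on forward(y_train), then report inverse(predictions).
# for name, (forward, inverse) in make_strategies(y_train).items():
#     model.fit(x_train, forward(y_train), ...)
#     y_pred = inverse(model.predict(x_val))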

What is proposed is to add a parameter that allows a hyperparameter to be passed to the output Dense layer's activation, and that allows both string names and functions / lambdas to be passed for the final_activation argument. With this, we may need a way to parse the function from a string, since it is good to make the final activation tunable, and tuners like Katib, Optuna, etc. typically can't take a list of functions as the options for a parameter. Basically we would need a lookup that is conditionally executed when final_activation is not a string in the list of parameter-less Keras activations. The most secure solution may look like:


if final_activation == 'softsign':
    final_activation = tf.keras.activations.softsign  # pass the callable itself, not a call to it

... same code elsewhere ...

output = Dense(...)

# Handling of regression against Z-scaled and 1-0 scaled targets (isru or softsign activation * a tunable scaling factor)
# Handling of regression of 1-0 scaled targets: use sigmoid activation and a scaling factor of or near 1
if (final_activation in ['softsign', 'isru', ...] or final_activation == 'sigmoid') and problem == 'regression':
    ...  # scale the bounded activation's output by a tunable factor, as below

if final_activation not in [...]:  # list of parameter-less Keras activation functions ...
    output = tf.keras.layers.Lambda(lambda x: x * c)(output)  # c is a tunable scaling hyperparameter ...

model = tf.keras.Model(inputs=inp, outputs=output)

...
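
On the tuner point above (Katib / Optuna only passing strings and floats), the string-to-callable resolution could look roughly like this; the _FINAL_ACTIVATIONS table and resolve_final_activation helper are hypothetical names, not existing Cerebros code:

import tensorflow as tf

# Hypothetical lookup: tuner-friendly strings mapped to Keras activation callables.
_FINAL_ACTIVATIONS = {
    "linear": tf.keras.activations.linear,
    "sigmoid": tf.keras.activations.sigmoid,
    "softsign": tf.keras.activations.softsign,
    "tanh": tf.keras.activations.tanh,
}

def resolve_final_activation(name: str, scaling_factor: float = 1.0):
    base = _FINAL_ACTIVATIONS[name]               # unknown strings fail loudly with a KeyError
    if scaling_factor == 1.0:
        return base
    return lambda x: base(x) * scaling_factor     # bounded activation times a tunable scaling factor

# Usage sketch:
# output = tf.keras.layers.Dense(1, activation=resolve_final_activation("softsign", 3.0))(prior_layer)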

The hard part may be adding the ability to return predictions and metrics as reverse-scaled data while computing the loss on the z-scaled | 1-0 scaled data. This may require subclassing Keras optimizers, models, or losses ...
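
A minimal sketch of the model-subclassing route, written against the TF 2.x Keras API (the class name and the y_mean / y_std attributes are illustrative assumptions, not existing Cerebros code): the loss is computed on z-scaled targets, while predictions and validation metrics come out reverse-scaled.

import tensorflow as tf

class ZScaledRegressor(tf.keras.Model):
    """Illustrative only: trains on z-scaled targets, returns reverse-scaled predictions/metrics."""

    def __init__(self, *args, y_mean=0.0, y_std=1.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.y_mean = y_mean
        self.y_std = y_std

    def train_step(self, data):
        x, y = data
        y_scaled = (y - self.y_mean) / self.y_std         # loss is computed in z-space
        return super().train_step((x, y_scaled))

    def test_step(self, data):
        x, y = data                                        # y arrives in original (unscaled) units
        z_pred = self(x, training=False)                   # network output is in z-space
        y_scaled = (y - self.y_mean) / self.y_std
        self.compiled_loss(y_scaled, z_pred)               # validation loss still tracked in z-space
        y_pred = z_pred * self.y_std + self.y_mean         # reverse z-scale before computing metrics
        self.compiled_metrics.update_state(y, y_pred)      # validation metrics in original units
        return {m.name: m.result() for m in self.metrics}

    def predict_step(self, data):
        z_pred = super().predict_step(data)
        return z_pred * self.y_std + self.y_mean           # return reverse-scaled predictions

Construction could stay functional-style, e.g. ZScaledRegressor(inputs=inp, outputs=output, y_mean=float(y.mean()), y_std=float(y.std())), so the existing model-building code would only need the class swapped in.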