broadinstitute / CellBender

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
https://cellbender.rtfd.io
BSD 3-Clause "New" or "Revised" License

Should I keep decreasing the learning rate? #360

Open txemaheredia opened 1 month ago

txemaheredia commented 1 month ago

Hi,

I ran CellBender on my dataset using these params:

cellbender remove-background \
                 --cuda \
                 --input ${sample}/raw_feature_bc_matrix.h5 \
                 --output ${sample}/cellbender_output.h5 \
                 --expected-cells 10000 \
                 --total-droplets-included 30000 \
                 --exclude-feature-types "Antibody Capture" \
                 --fpr 0.01 \
                 --epochs 150

The HTML report showed this ELBO plot:

[ELBO plot from the first run: cb_elbo_1]

And it issued this warning:

Automated assessment --------

  • WARNING: The training ELBO deviates quite a bit from the max value during the second half of training.
  • We typically expect to see the training ELBO increase almost monotonically. This curve seems to have a concerted period of motion in the wrong direction near epoch 56. If this is early in training, this is probably okay.
  • We hope to see the test ELBO follow the training ELBO, increasing almost monotonically (though there will be deviations, and that is expected). There may be a large gap, and that is okay. However, this curve ends with a low test ELBO compared to the max test ELBO value during training. The output could be suboptimal.

Summary:

This is unusual behavior, and a reduced --learning-rate is indicated. Re-run with half the current learning rate and compare the results.

I followed the suggestion and halved the learning rate, re-running cellbender with --learning-rate 0.00005. The resulting ELBO plot looked like this:

[ELBO plot from the second run: cb_elbo_2]

Automated assessment --------

  • The training ELBO deviates quite a bit from the max value at the last epoch.
  • We typically expect to see the training ELBO increase almost monotonically. This curve seems to have a concerted period of motion in the wrong direction near epoch 76. If this is early in training, this is probably okay.
  • We hope to see the test ELBO follow the training ELBO, increasing almost monotonically (though there will be deviations, and that is expected). There may be a large gap, and that is okay. However, this curve ends with a low test ELBO compared to the max test ELBO value during training. The output could be suboptimal.

Summary:

This is slightly unusual behavior, and a reduced --learning-rate might be indicated. Consider re-running with half the current learning rate to compare the results.

Should I keep halving the learning rate? Or would I be better off keeping the default learning rate and just increasing --epochs?
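
For concreteness, here is a quick sketch of the values in play. This assumes 1e-4 is CellBender's default learning rate (which is what my first run would have used, since I didn't pass --learning-rate):

```shell
# Assumption: CellBender's default learning rate is 1e-4.
# Each suggested re-run halves the previous rate.
DEFAULT_LR=0.0001
HALF_LR=$(awk -v lr="$DEFAULT_LR" 'BEGIN { printf "%g", lr / 2 }')     # the 0.00005 run above
QUARTER_LR=$(awk -v lr="$DEFAULT_LR" 'BEGIN { printf "%g", lr / 4 }')  # a third halving would use this
echo "already tried: --learning-rate ${HALF_LR}"
echo "next halving:  --learning-rate ${QUARTER_LR}"
```

So the next re-run, if I keep following the automated suggestion, would be --learning-rate 0.000025.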