gwarmstrong commented 4 years ago

Songbird random seeds

Current behavior:

When I run:

$ songbird multinomial \
    --input-biom data/redsea/redsea.biom \
    --metadata-file data/redsea/redsea_metadata.txt \
    --formula "1" \
    --epochs 1000 \
    --differential-prior 0.5 \
    --summary-interval 1 \
    --summary-dir logdir/model1
$ songbird multinomial \
    --input-biom data/redsea/redsea.biom \
    --metadata-file data/redsea/redsea_metadata.txt \
    --formula "1" \
    --epochs 1000 \
    --differential-prior 0.5 \
    --summary-interval 1 \
    --summary-dir logdir/model2

I get something like the following, where the cv-error converges to different values and the training error is updated differently:

Further exploration:

So I provided an option to fix the seed for making train/test splits:

$ songbird multinomial \
    --input-biom data/redsea/redsea.biom \
    --metadata-file data/redsea/redsea_metadata.txt \
    --formula "1" \
    --epochs 1000 \
    --differential-prior 0.5 \
    --summary-interval 1 \
    --summary-dir logdir/model3 \
    --seed 42
$ songbird multinomial \
    --input-biom data/redsea/redsea.biom \
    --metadata-file data/redsea/redsea_metadata.txt \
    --formula "1" \
    --epochs 1000 \
    --differential-prior 0.5 \
    --summary-interval 1 \
    --summary-dir logdir/model4 \
    --seed 42

Screen Shot 2019-12-13 at 10 47 00 AM

And now the models converge to approximately the same value. However, as there a still variations in their training updates. So I provided a way to fix the seed that tensorflow uses as well.

$ songbird multinomial \
    --input-biom data/redsea/redsea.biom \
    --metadata-file data/redsea/redsea_metadata.txt \
    --formula "1" \
    --epochs 1000 \
    --differential-prior 0.5 \
    --summary-interval 1 \
    --summary-dir logdir/model5 \
    --split-seed 42 \
    --tf-seed 22
$ songbird multinomial \
    --input-biom data/redsea/redsea.biom \
    --metadata-file data/redsea/redsea_metadata.txt \
    --formula "1" \
    --epochs 1000 \
    --differential-prior 0.5 \
    --summary-interval 1 \
    --summary-dir logdir/model6 \
    --split-seed 42 \
    --tf-seed 22

And now it is identical between runs: Screen Shot 2019-12-13 at 11 12 50 AM

Proposed behavior:

When the above commands are run, I should at least get the same CV score, but furthermore, it should be possible to fix the entire training process.

In order to fix this I propose the following solution:

For QIIME2 plugin:
- Have a fixed random seed for both tensorflow and train/test split
- this ensures that all results that come out of Q2 for the same data are directly comparable, and the same optimization results are achieved every time
- additionally removes the need for a user to understand how to appropriately set their on seeds
For the standalone interface:
- Have a random seed option for the train/test split and tensorflow, that are independent of each other
- Set a default seed for train/test split
- again ensures that results are directly comparable by default
- Have the option to set a seed for tensorflow, but is not set by default
- This one is less of a big deal since the loss function is convex, but it is nice to be able to set if you want to recreate the exact same process

Summary of changes:

add --split-seed and --tf-seed option to the songbird CLI (and update parameter info accordingly)
add a unit test to scripts.test_songbird_cli that verifies that different combinations of these flags still run on the CLI
Set both random seeds to 0 in the Q2 plugin
add a test to the Q2 plugin that verifies subsequent calls on the same inputs have the exact same output

gwarmstrong commented 4 years ago

Hi @gwarmstrong , this is awesome.

What do you think about only having 1 seed passed in? I'm assuming that two seeds are defined due to numpy / TF differences. But you can imagine the following should work
def multinomial(..., seed):
    np.random.seed(seed)
    tf.set_random_seed(seed)
It'll be more ideal to only specify 1 seed to try to minimize the number of available options to the users.

Sure, I thought it might be nice to be able to adjust them independently. This could be helpful for, e.g., verifying that you get a somewhat consistent error minimum across different a random initialization, but with the same train/test split.

Given that the loss here is convex with respect to the parameters, my idea is probably less necessary and we can change to a single seed.

gwarmstrong commented 4 years ago

@mortonjt this should reflect the changes you suggested now

gwarmstrong commented 4 years ago

@mortonjt just a polite bump on this PR! Also noting that this issue has subsequently come up with multiple people within Knight Lab

gwarmstrong commented 4 years ago

If you run this command on a non-zero seed and post your tensorboard results on the example dataset (i.e. redsea) I'll can double check on my machine to make sure those results are reproducible.

Does the third tensorboard output in the original description work? Command and tensorboard screenshot is included. Or do you need additional information?

mortonjt commented 4 years ago

Yes, the output in the original tensorboard would work - but the interface changed right?

gwarmstrong commented 4 years ago

Ah right, that is when the seeds could be set separately. Sure, I will re-run and upload.

gwarmstrong commented 4 years ago

Try this on for size

$ songbird multinomial \
    --input-biom data/redsea/redsea.biom \
    --metadata-file data/redsea/redsea_metadata.txt \
    --formula "1" \
    --epochs 1000 \
    --differential-prior 0.5 \
    --summary-interval 1 \
    --summary-dir logdir/new_model \
    --random-seed 42

Screen Shot 2020-01-07 at 5 32 18 PM

mortonjt commented 4 years ago

I was able to run and reproduce the results

Once the last comment regarding the qiime2 seed is done and verified to work, I can merge this in.

lisa55asil commented 4 years ago

I think these changes are great and strongly agree that we should have a default seed set for the q2 plugin. As you mentioned if people want change or remove the seed they can use the standalone version, but it is critical that q2 produces reproducible results.

I'm going to update the readme to make it more clear: 1) how and why to use the null/baseline model 2) when to use the --p-training-column

Thanks for the awesome updates George!!

mortonjt commented 4 years ago

Thanks @lisa55asil . I'm going to merge this in. @gwarmstrong thanks!

biocore / songbird

ENH: Set random seeds #101

Songbird random seeds

Current behavior:

Further exploration:

Proposed behavior:

Summary of changes: