Fixes a serious bug in the model's TransformerEncoderLayer by passing batch_first=True (by default, PyTorch's TransformerEncoderLayer expects input shaped (seq, batch, feature), i.e. batch in the second dimension). Also switches the transformer's activation from "relu" to "gelu".
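For reference, a minimal sketch of the corrected layer construction; the dimensions here are illustrative, not the model's actual ones:

```python
import torch
import torch.nn as nn

# batch_first=True makes the layer accept (batch, seq, feature) input;
# without it, PyTorch defaults to (seq, batch, feature).
layer = nn.TransformerEncoderLayer(
    d_model=64,
    nhead=4,
    batch_first=True,
    activation="gelu",  # was "relu" (the PyTorch default)
)
encoder = nn.TransformerEncoder(layer, num_layers=2)

x = torch.randn(8, 16, 64)  # (batch=8, seq=16, d_model=64)
out = encoder(x)            # same shape as the input
```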
Changes the ENABLE_WANDB module variable (which turns on all W&B data uploading) so it is set from the ENABLE_WANDB environment variable. W&B integration is now off by default unless the env var is set, which should help in debugging scenarios where we don't want to create a bunch of extra wandb runs.
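The toggle might look something like the following sketch; the variable and env-var name ENABLE_WANDB come from this change, but the exact truthy-string handling is an assumption:

```python
import os

# Off by default; only enabled when the ENABLE_WANDB env var is set
# to a truthy value (exact accepted values are an assumption).
ENABLE_WANDB = os.environ.get("ENABLE_WANDB", "").lower() in ("1", "true", "yes")
```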