Saving and Loading Fine Tuned Model

I'm not sure whether I'm saving or loading the fine-tuned model incorrectly. After fine-tuning and running an evaluation script, the accuracy before and after loading the model is exactly the same. I've tried this with my own data, but as a check I attempted to replicate the example here: https://bio-transformers.readthedocs.io/en/latest/tutorial/finetuning.html

Training Script:

import biodatasets
import numpy as np
from biotransformers import BioTransformers
import ray

data = biodatasets.load_dataset("swissProt")
X, y = data.to_npy_arrays(input_names=["sequence"])
X = X[0]

# Train on short sequences (fewer than 200 amino acids)
length = np.array(list(map(len, X))) < 200
train_seq = X[length][:10000]
val_seq = X[length][10000:15000]

ray.init()
bio_trans = BioTransformers("esm1_t6_43M_UR50S", num_gpus=4)

bio_trans.finetune(
    train_seq,
    validation_sequences=val_seq,
    lr=1.0e-5,
    warmup_init_lr=1e-7,
    toks_per_batch=2000,
    epochs=20,
    acc_batch_size=256,
    warmup_updates=1024,
    accelerator="ddp",
    checkpoint=None,
    save_last_checkpoint=False,
    amp_level=None,
)

After running it, a logs directory is created containing hparams.yaml (which is empty), metrics.csv, and a checkpoints folder with the last checkpoint (epoch=19-step=39.ckpt).
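To rule out a truncated or missing checkpoint file, I can list what the run actually wrote and how large each file is. This is only a minimal sketch: `summarize_log_dir` is a helper name I made up, not part of bio-transformers, and the path in the usage note is assumed from the run above.

```python
from pathlib import Path


def summarize_log_dir(log_dir):
    """Return {relative_path: size_in_bytes} for every file under log_dir.

    Hypothetical helper for this issue; returns an empty dict if the
    directory does not exist.
    """
    root = Path(log_dir)
    if not root.is_dir():
        return {}
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Calling `summarize_log_dir("logs/finetune_masked/version_0")` should show hparams.yaml with size 0 and a non-zero size for the .ckpt file if the checkpoint was written at all.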

Then I run the evaluation script:

import biodatasets
import numpy as np
from biotransformers import BioTransformers
import ray

data = biodatasets.load_dataset("swissProt")
X, y = data.to_npy_arrays(input_names=["sequence"])
X = X[0]

# Keep sequences shorter than 200 amino acids.
# Evaluate on sequences that were not used for training.
length = np.array(list(map(len, X))) < 200
train_seq = X[length][15000:20000]

ray.init()
bio_trans = BioTransformers("esm1_t6_43M_UR50S", num_gpus=4)
acc_before = bio_trans.compute_accuracy(train_seq, batch_size=32)
print(f"Accuracy before finetuning : {acc_before}")

bio_trans.load_model("logs/finetune_masked/version_0/checkpoints/epoch=19-step=39.ckpt")
acc_after = bio_trans.compute_accuracy(train_seq, batch_size=32)
print(f"Accuracy after finetuning : {acc_after}")

Which outputs:

Accuracy before finetuning : 0.3469025194644928
Accuracy after finetuning : 0.3469025194644928
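Since the two numbers are identical to the last digit, one check that might narrow this down is to compare the parameter values before and after `load_model`, or against the checkpoint's stored weights. The sketch below takes the two snapshots as plain dictionaries mapping parameter names to arrays, because I'm not sure how to get at the underlying torch module from a `BioTransformers` object. The function names are my own, not library API.

```python
import numpy as np


def changed_parameters(before, after, atol=0.0):
    """Return sorted names of shared parameters whose values differ.

    `before` and `after` map parameter names to array-likes, for example
    tensors moved to CPU and converted to NumPy arrays (an assumption
    about how the snapshots are produced).
    """
    changed = []
    for name in sorted(set(before) & set(after)):
        a = np.asarray(before[name])
        b = np.asarray(after[name])
        if a.shape != b.shape or not np.allclose(a, b, rtol=0.0, atol=atol):
            changed.append(name)
    return changed


def unmatched_names(before, after):
    """Return sorted names that appear in only one of the two snapshots."""
    return sorted(set(before) ^ set(after))
```

If `changed_parameters` comes back empty, the weights in memory were never replaced. If `unmatched_names` returns most of the names, the keys in the checkpoint (PyTorch Lightning stores them under the `"state_dict"` entry of the .ckpt file) may carry a different prefix than the model expects, which could make a non-strict load silently skip them.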

Am I saving or loading incorrectly?

DeepChainBio / bio-transformers

Saving and Loading Fine Tuned Model #34