Closed walid0925 closed 2 years ago
I think I've figured this out: the private method esm.pretrained._load_model_and_alphabet_core_v2(model_data) doesn't update the model's state_dict, so this is doing inference through a randomly initialized model. I can make a quick PR to fix this.
thanks so much for your work in maintaining ESM!
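For anyone who wants to confirm this kind of problem on their own setup, here is a minimal sketch of a check that a model's tensors actually match the checkpoint they were supposed to be loaded from. It uses a toy nn.Linear as a stand-in rather than an ESM model, and the helper name unloaded_parameters is hypothetical (it is not part of esm or PyTorch); only standard PyTorch calls are used.

```python
import torch
from torch import nn


def unloaded_parameters(model: nn.Module, checkpoint_state: dict) -> list:
    """Return names of tensors in `model` that are missing from, or differ from, the checkpoint.

    Hypothetical helper for illustration; not part of esm or PyTorch.
    """
    mismatched = []
    for name, tensor in model.state_dict().items():
        expected = checkpoint_state.get(name)
        if (
            expected is None
            or tensor.shape != expected.shape
            or not torch.equal(tensor.cpu(), expected.cpu())
        ):
            mismatched.append(name)
    return mismatched


# Toy stand-in for a pretrained checkpoint (assumption: NOT a real ESM checkpoint).
torch.manual_seed(0)
checkpoint_state = nn.Linear(4, 2).state_dict()

# A model that is constructed but never receives the checkpoint weights,
# which mirrors the situation described above.
torch.manual_seed(1)
model = nn.Linear(4, 2)
before_load = unloaded_parameters(model, checkpoint_state)
print("mismatched before load:", before_load)

# Copying the checkpoint into the model makes the mismatch list empty.
model.load_state_dict(checkpoint_state)
after_load = unloaded_parameters(model, checkpoint_state)
print("mismatched after load:", after_load)
```

A non-empty list right after a loader returns would suggest the weights were never copied in. For a sharded or wrapped model the parameter names may carry prefixes, so the key lookup would likely need adjusting.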
Bug description
I recently started to play around with these models and have noticed non-determinism when using CPU offloading as shown in the examples/esm2_infer_fairscale_fsdp_cpu_offloading.py script; non-determinism in itself is perhaps expected, but the magnitude that I'm seeing is not. Please let me know if I've missed anything!
Reproduction steps
This code is almost directly copied from the examples script, with the modification of using only one example protein sequence and printing a mean representation. I've also included a smaller model here for speed, though I've noticed the same across other models such as esm2_t36_3B_UR50D.
Repeated calls result in very different outputs. Included here are the outputs from three consecutive runs
As you can see, the outputs are very different (different magnitudes, even different signs).
Expected behavior
I would expect consecutive runs to have approximately similar outputs, even if not exactly the same.
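One way to make "approximately similar" concrete is an element-wise tolerance check across runs. The sketch below is only an illustration: the function name runs_agree, the tolerances, and the sample numbers are all assumptions chosen for the example, not values taken from the runs reported in this issue.

```python
import math


def runs_agree(runs, rel_tol=1e-3, abs_tol=1e-5):
    """Return True if every run matches the first run element-wise within tolerance.

    `runs` is a list of equal-length lists of floats, e.g. the first few
    entries of a mean representation printed on each run.
    The tolerances are illustrative assumptions, not recommended values.
    """
    reference = runs[0]
    for other in runs[1:]:
        if len(other) != len(reference):
            return False
        for a, b in zip(reference, other):
            if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


# Made-up values: small numerical jitter, as one might expect from benign non-determinism.
jittered = [[0.1200, -0.3400], [0.12001, -0.34002]]
# Made-up values: different magnitudes and signs, the pattern described in this report.
diverged = [[0.1200, -0.3400], [-1.5, 2.0]]

print(runs_agree(jittered))
print(runs_agree(diverged))
```

The first call should report agreement and the second should not. A failure of this kind of check points at something other than floating-point noise, such as weights that differ between runs.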
Additional context
Device: AWS p3.2xlarge (Tesla V100)