Closed nzsimonc closed 2 weeks ago
I have also run `pytest test_megamolbart_triton.py` pointed at my fine-tuned .nemo file, and it gives me the same error:
```
=================================== short test summary info ===================================
ERROR test_megamolbart_triton.py::test_seq_to_embedding_triton - RuntimeError: Error(s) in loading state_dict for MegaMolBARTModel:
ERROR test_megamolbart_triton.py::test_seq_to_hidden_triton - RuntimeError: Error(s) in loading state_dict for MegaMolBARTModel:
ERROR test_megamolbart_triton.py::test_hidden_to_seqs_triton - RuntimeError: Error(s) in loading state_dict for MegaMolBARTModel:
ERROR test_megamolbart_triton.py::test_samplings_triton - RuntimeError: Error(s) in loading state_dict for MegaMolBARTModel:
ERROR test_megamolbart_triton.py::test_seq_to_embedding_direct - RuntimeError: Error(s) in loading state_dict for MegaMolBARTModel:
ERROR test_megamolbart_triton.py::test_seq_to_hidden_direct - RuntimeError: Error(s) in loading state_dict for MegaMolBARTModel:
ERROR test_megamolbart_triton.py::test_hidden_to_seqs_direct - RuntimeError: Error(s) in loading state_dict for MegaMolBARTModel:
ERROR test_megamolbart_triton.py::test_samplings_direct - RuntimeError: Error(s) in loading state_dict for MegaMolBARTModel:
======================================= 8 errors in 27.01s ====================================
```
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
**Describe the bug**
I can successfully run my inference code on the default `megamolbart.nemo`, but as soon as I run any kind of fine-tuning on it I get the error `RuntimeError: Error(s) in loading state_dict for MegaMolBARTModel`. I've attached code, config and output for both the finetune and infer stages.
My desired outcome is to use the fine-tuning process to add in some of my own data, then call the inference code to get the SMILES string and the prediction.
**Steps/Code to reproduce bug**

1. **Fine-tune** (code, config and output). My fine-tune code (`company1.6_finetune_donothing.py.txt`) just loads the default `megamolbart.nemo` using `finetune_config.yaml.txt` (it makes no difference whether `restore_from_path` is filled in or not). I actually get the same issue even if I take `trainer.fit(model)` out of the code. The result (`company1.6_finetune_donothing_RESULT.txt`) seems fine.
   Attachments: company1.6_finetune_donothing.py.txt, finetune_config.yaml.txt, company1.6_finetune_donothing_RESULT.txt
2. **Infer** (code, config and output). I then run my infer code (`company1.6_infer.py.txt`) with the default `infer.yaml` file (`infer.yaml.txt`), with just `restore_from_path` pointing to the file created in step 1. NOTE: as can be seen in the 'Expected behavior' section, if I use the default `megamolbart.nemo` file then it all works as expected.
   Attachments: company1.6_infer.py.txt, infer.yaml.txt, company1.6_infer_RESULT.txt
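As a quick first check before handing the fine-tuned file to `restore_from_path`, one can confirm what the new .nemo actually contains: a .nemo file is a tar archive, so a minimal stdlib sketch (the helper name `list_nemo_contents` is mine, not a BioNeMo or NeMo API) can list its members and let you compare the fine-tuned archive against the default one:

```python
import tarfile

def list_nemo_contents(nemo_path):
    """List member names of a .nemo archive (a .nemo file is a tar archive).

    Useful for verifying that a fine-tuned checkpoint actually contains
    model weights (e.g. model_weights.ckpt) and a model config YAML
    before trying to restore it for inference.
    """
    with tarfile.open(nemo_path) as tar:  # compression mode auto-detected
        return [member.name for member in tar.getmembers()]
```

Comparing the listing of the default `megamolbart.nemo` with the fine-tuned output is a cheap way to spot a checkpoint that was saved in a different layout.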
TL;DR, this is my error:

```
[NeMo I 2024-07-11 01:57:35 regex_tokenizer:254] Loading regex from file = /workspace/bionemo/tokenizers/molecule/megamolbart/vocab/megamolbart.model
[NeMo I 2024-07-11 01:57:35 megatron_base_model:315] Padded vocab_size: 640, original vocab_size: 523, dummy tokens: 117.
[NeMo W 2024-07-11 01:57:35 megatron_lm_encoder_decoder_model:240] Could not find encoder or decoder in config. This is probably because of restoring an old checkpoint. Copying shared model configs to encoder and decoder configs.
[NeMo W 2024-07-11 01:57:35 megatron_lm_encoder_decoder_model:206] bias_gelu_fusion is deprecated. Please use bias_activation_fusion instead.
[NeMo W 2024-07-11 01:57:35 megatron_lm_encoder_decoder_model:206] bias_gelu_fusion is deprecated. Please use bias_activation_fusion instead.
Traceback (most recent call last):
  File "/workspace/bionemo/examples/molecule/megamolbart/company1.6_infer.py", line 59, in <module>
    inferer = load_model_for_inference(cfg, interactive=True)
  File "/workspace/bionemo/bionemo/triton/utils.py", line 238, in load_model_for_inference
    model = infer_class(cfg, interactive=interactive, **kwargs)
  File "/workspace/bionemo/bionemo/model/molecule/infer.py", line 40, in __init__
    super().__init__(
  File "/workspace/bionemo/bionemo/model/core/infer.py", line 468, in __init__
    super().__init__(
  File "/workspace/bionemo/bionemo/model/core/infer.py", line 146, in __init__
    self.model = self.load_model(cfg, model=model, restore_path=restore_path, strict=strict_restore_from_path)
  File "/workspace/bionemo/bionemo/model/core/infer.py", line 206, in load_model
    model = restore_model(
  File "/workspace/bionemo/bionemo/model/utils.py", line 363, in restore_model
    model = model_cls.restore_from(
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/models/nlp_model.py", line 465, in restore_from
    return super().restore_from(
  File "/usr/local/lib/python3.10/dist-packages/nemo/core/classes/modelPT.py", line 442, in restore_from
    instance = cls._save_restore_connector.restore_from(
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/parts/nlp_overrides.py", line 751, in restore_from
    super().load_instance_with_state_dict(instance, state_dict, strict)
  File "/usr/local/lib/python3.10/dist-packages/nemo/core/connectors/save_restore_connector.py", line 203, in load_instance_with_state_dict
    instance.load_state_dict(state_dict, strict=strict)
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/models/nlp_model.py", line 447, in load_state_dict
    results = super(NLPModel, self).load_state_dict(state_dict, strict=strict)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for MegaMolBARTModel:
	Missing key(s) in state_dict: "enc_dec_model.encoder_embedding.word_embeddings.weight", "enc_dec_model.encoder_embedding.position_embeddings.weight", "enc_dec_model.decoder_embedding.word_embeddings.weight", "enc_dec_model.decoder_embedding.position_embeddings.weight", "enc_dec_model.enc_dec_model.encoder.model.layers.0.input_layernorm.weight", "enc_dec_model.enc_dec_model.encoder.model.layers.0.input_layernorm.bias", "enc_dec_model.enc_dec_model.encoder.model.layers.0.self_attention.query_key_value.weight", "enc_dec_model.enc_dec_model.encoder.model.layers.0.self_attention.query_key_value.bias", "enc_dec_model.enc_dec_model.encoder.model.layers.0.self_attention.dense.weight",
```
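For context on what this error means: the failure comes from the strict key check in `torch.nn.Module.load_state_dict`, which raises when the model expects parameter names the checkpoint does not contain. A small pure-Python sketch of that check makes the mismatch easier to inspect; note the `model.` prefix on the checkpoint side below is a hypothetical illustration of how a fine-tuning wrapper can shift keys, not something confirmed from the attached files:

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Mirror the strict key check torch.nn.Module.load_state_dict performs:
    return keys the model expects but the checkpoint lacks ("missing") and
    keys the checkpoint carries that the model does not know ("unexpected")."""
    missing = sorted(set(model_keys) - set(checkpoint_keys))
    unexpected = sorted(set(checkpoint_keys) - set(model_keys))
    return missing, unexpected

# One key taken from the traceback above; the "model."-prefixed variant is
# an assumed example of a checkpoint saved under an extra wrapper module.
model_keys = ["enc_dec_model.encoder_embedding.word_embeddings.weight"]
checkpoint_keys = ["model.enc_dec_model.encoder_embedding.word_embeddings.weight"]
missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print(missing)      # the key the model expects but the checkpoint lacks
print(unexpected)   # the shifted key the checkpoint carries instead
```

Dumping and diffing the key sets of the default and fine-tuned checkpoints this way usually shows whether the keys vanished outright or merely gained/lost a prefix during fine-tuning.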
**Expected behavior**
Running the inference code with the default downloadable `megamolbart.nemo` works fine, as can be seen here: company1.6_infer_RESULT_DEFAULT_NEMO.txt. That is, we can get simple things like `Reconstructed SMILES:` out of the system.
**Environment overview (please complete the following information)**
docker pull & docker run commands used:

```
docker run -it --rm --gpus all -v /home/azureuser/company_info:/workspace/bionemo/company_info nvcr.io/nvidia/clara/bionemo-framework:1.3 bash
```

also tried:

```
docker run -it --rm --gpus all -v /home/azureuser/company_info:/workspace/bionemo/company_info nvcr.io/nvidia/clara/bionemo-framework:1.6 bash
```

**Environment details**
NVIDIA docker image is used
**Additional context**
Azure T4 GPU