rahul-sarvam opened 1 week ago
I have compared a number of things between the two models, and it looks like there is a large difference between their logits. Here is my comparison script:
```python
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel

# Load NeMo model
nemo_model = MegatronGPTModel.restore_from(
    nemo_path,
    trainer=dummy_trainer,
    override_config_path=model_config,
    map_location=map_location,
)

# Load HuggingFace model
hf_model = AutoModelForCausalLM.from_pretrained(
    hf_path,
    local_files_only=True,
    torch_dtype=torch.bfloat16,  # nemo_model.dtype
)

# Load tokenizer
tokenizer = LlamaTokenizer.from_pretrained(tokenizer_path, legacy=False)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id

# Move models to device
nemo_model = nemo_model.to(device)
hf_model = hf_model.to(device)

# Set both models to eval mode
nemo_model.eval()
hf_model.eval()

# Create random input ids
input_ids = torch.randint(
    100, 1000,
    (test_batch_size, test_seq_length),
    device=device,
)
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    # NeMo forward pass
    nemo_output = nemo_model(
        tokens=input_ids,
        text_position_ids=torch.arange(test_seq_length, device=device),
        attention_mask=attention_mask,
        labels=None,
    )
    # HF forward pass
    hf_output = hf_model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        output_hidden_states=True,
        return_dict=True,
    ).logits

# Compare logits
logits_match = torch.allclose(
    nemo_output,
    hf_output,
    rtol=rtol,
    atol=atol,
)
metrics['logits_max_diff'] = float(
    torch.max(torch.abs(nemo_output - hf_output)).cpu()
)
```
Output:

```
Conversion test results:
Logits match: False (max diff: 4.91e+00)
Parameters match: True (max diff: 0.00e+00)
Generation match: 0.0

Sample generation comparison:
Input text: '<s>[INST] Hello [/INST]\n'
NeMo output: "<s>[INST] Hello [/INST]\n Hello. It's nice to meet you. Is there something I can help you with or"
HF output: '<s> [INST] Hello [/INST]\n Hello. ನಿಮ್ಮನ್ನ ಭೇಟಿ ಮಾಡಿ ಸಂತೋಷ ಆಯ್ತು. ನಿಮಗೆ ಏನ'

Number of parameters match: 1.0 (Nemo: 2525087744, HF: 2525087744)
❌ Conversion test failed!
```

(The HF generation unexpectedly switches to Kannada; it reads roughly "It was nice meeting you. What [can I do] for you".)
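One more data point from the sample above: the detokenized prompts are not byte-identical (`'<s>[INST]'` from NeMo vs. `'<s> [INST]'` from HF), so I also want to rule out a tokenization mismatch. A quick check along these lines (a sketch; `nemo_tokenizer` here stands for whatever tokenizer the NeMo inference script constructs, and `text_to_ids` is NeMo's `TokenizerSpec` method):

```python
prompt = "[INST] Hello [/INST]\n"

# HF side: ids produced by the LlamaTokenizer loaded above
hf_ids = tokenizer(prompt, add_special_tokens=True).input_ids

# NeMo side: nemo_tokenizer is hypothetical; use whatever tokenizer
# the NeMo inference script builds (TokenizerSpec exposes text_to_ids)
nemo_ids = nemo_tokenizer.text_to_ids(prompt)

print("token ids match:", hf_ids == nemo_ids)
print("HF:  ", hf_ids)
print("NeMo:", nemo_ids)
```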
I am not able to pinpoint why this is happening. Any pointers will be helpful.
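For anyone trying to reproduce this, one way to narrow it down is to check whether the divergence is uniform across positions or grows along the sequence (a minimal sketch, assuming both outputs are `[batch, seq, vocab]` logit tensors as above; `first_divergence` is a hypothetical helper, not part of the script):

```python
import torch

def first_divergence(a: torch.Tensor, b: torch.Tensor, atol: float = 1e-2):
    """Find the first (batch, position) where the two logit tensors disagree."""
    # Max absolute difference over the vocab dimension -> [batch, seq]
    diff = (a.float() - b.float()).abs().amax(dim=-1)
    bad = (diff > atol).nonzero()
    idx = tuple(bad[0].tolist()) if bad.numel() else None
    return idx, diff

idx, diff = first_divergence(nemo_output, hf_output)
print("first diverging (batch, pos):", idx)
print("per-position max diff:", diff)
```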
Describe the bug
I have trained a Llama-like model with NeMo using the below model config:
The model works well when I run inference using the NeMo checkpoint (script), but performance drops drastically with the converted checkpoint (script). Any ideas why this might be happening? My only hunch is that `apply_query_key_layer_scaling=True` in NeMo, which might not be the case in HF.
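On that hunch: as far as I understand, `apply_query_key_layer_scaling` is intended to be mathematically neutral. Megatron divides the attention scores by an extra `layer_number` factor and then compensates with a matching scale inside the fp32 softmax, so in exact arithmetic the probabilities are unchanged and only the numerics differ. A toy check of that identity (a sketch of my understanding, not NeMo's actual kernel):

```python
import torch

torch.manual_seed(0)
d_k, layer_number = 64, 7
q = torch.randn(2, 8, 16, d_k)  # [batch, heads, seq, d_k]
k = torch.randn(2, 8, 16, d_k)

# Plain scaled dot-product attention probabilities
scores_plain = (q @ k.transpose(-1, -2)) / (d_k ** 0.5)
probs_plain = scores_plain.softmax(dim=-1)

# With query-key layer scaling: divide by an extra layer_number,
# then multiply it back inside the (fp32) softmax
scores_scaled = (q @ k.transpose(-1, -2)) / (d_k ** 0.5 * layer_number)
probs_scaled = (scores_scaled.float() * layer_number).softmax(dim=-1)

print(torch.allclose(probs_plain.float(), probs_scaled, atol=1e-6))  # True
```

If that identity holds, this flag alone should affect precision rather than the math, so it may not explain a max logit diff of ~4.9 on its own.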
Environment details
https://docs.nvidia.com/nemo-framework/user-guide/latest/softwarecomponentversions.html#nemo-framework-24-05