facebookresearch / rlfh-gen-div

This is the code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity.

AttributeError: 'LlamaParallelModel' object has no attribute '_prepare_decoder_attention_mask' #2


GussailRaat commented 6 months ago

In "/rlvsil/models/llama_parallel.py"

    # embed positions
    if attention_mask is None:
        attention_mask = torch.ones(
            (batch_size, seq_length_with_past), dtype=torch.bool, device=inputs_embeds.device
        )
    attention_mask = self._prepare_decoder_attention_mask(
        attention_mask, (batch_size, seq_length), inputs_embeds, past_key_values_length
    )

There is no definition of "self._prepare_decoder_attention_mask" anywhere, which is why I get the following error: "AttributeError: 'LlamaParallelModel' object has no attribute '_prepare_decoder_attention_mask'".

Please provide this function as soon as possible; it would be very helpful.

RobertKirk commented 5 months ago

Hi, thanks for your interest in the paper. We used transformers==4.31.0 for this project, and that is the version in which that function is defined (it is a private method of the Hugging Face LlamaModel that later releases removed). The llama_parallel model is essentially a copy-paste of the llama model implementation from HF transformers with model parallelism added on top.

Model parallelism is now implemented for llama by default in the latest transformers versions, so I expect you could adjust the code to use the stock HF llama model instead, i.e. change this line to use LlamaModel from transformers. Other parts of the model parallelism API may have changed as well, though, so if you want to reproduce our results exactly without any code changes, you will likely need to pin transformers==4.31.0.
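
For reference, a minimal sketch of what restoring the method could look like, based on the 4.31.0 implementation of LlamaModel (untested against this repo). Note that the module-level helpers _make_causal_mask and _expand_mask exist in transformers.models.llama.modeling_llama at 4.31.0 but were removed in later releases, so the import below only works on versions that still ship them:

    # Helpers from transformers 4.31.0's modeling_llama; removed in later
    # releases, so this import only resolves on versions that still ship them.
    from transformers.models.llama.modeling_llama import _make_causal_mask, _expand_mask

    # Paste into LlamaParallelModel: the method as defined on LlamaModel in 4.31.0.
    def _prepare_decoder_attention_mask(self, attention_mask, input_shape, inputs_embeds, past_key_values_length):
        # Build the causal (lower-triangular) mask for the current sequence,
        # offset by any cached past key/values.
        combined_attention_mask = None
        if input_shape[-1] > 1:
            combined_attention_mask = _make_causal_mask(
                input_shape,
                inputs_embeds.dtype,
                device=inputs_embeds.device,
                past_key_values_length=past_key_values_length,
            )

        if attention_mask is not None:
            # Expand the [bsz, src_len] padding mask to [bsz, 1, tgt_len, src_len]
            # and merge it with the causal mask.
            expanded_attn_mask = _expand_mask(
                attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]
            ).to(inputs_embeds.device)
            combined_attention_mask = (
                expanded_attn_mask
                if combined_attention_mask is None
                else expanded_attn_mask + combined_attention_mask
            )

        return combined_attention_mask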
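
If you go the stock-model route instead, a rough sketch of loading the HF llama model with its built-in model parallelism (the checkpoint path here is a placeholder, and device_map="auto" requires accelerate to be installed):

    from transformers import LlamaModel

    # "path/to/llama-checkpoint" is a placeholder; point this at your weights.
    # device_map="auto" (provided via accelerate) shards the layers across all
    # visible GPUs, replacing the hand-rolled parallelism in LlamaParallelModel.
    model = LlamaModel.from_pretrained(
        "path/to/llama-checkpoint",
        device_map="auto",
    )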