AoiDragon / HADES

[ECCV'24 Oral] The official GitHub page for "Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models"
MIT License

environmental question #7

Open matengxiaotiancai opened 2 months ago

matengxiaotiancai commented 2 months ago

Hello, I am very interested in your outstanding work and would like to reproduce your project. Therefore, I would like to inquire about the specific environment for executing the steps of STEP 2 Amplifying Image Harmfulness with LLMs, and STEP 3 Amplifying Image Harmfulness with Grade Update, in order to ensure that the reproduced results are consistent with yours.

AoiDragon commented 1 month ago

Hello @matengxiaotiancai,

We mainly utilized models PixArt XL 2 and LLaVA-1.5 for these two steps. You can refer to their respective repositories for instructions on setting up their environments.

matengxiaotiancai commented 1 month ago

Thank you for your reply! I have another question. What does `flagged` mean when it is True or False in the file `generate_init_image.py`? I didn't understand the following code:

    processed_data = {}
    for entry in raw_data:
        entry_id = entry['id']
        print(f"Processing entry {entry_id}")

    if entry_id in processed_data and entry['flagged']:
        if int(entry['step']) < int(processed_data[entry_id]['step']):
            processed_data[entry_id] = entry
    elif entry_id not in processed_data:
        processed_data[entry_id] = entry
    elif not entry['flagged'] and processed_data[entry_id]['flagged']:
        continue
    elif entry['step'] == "3":
        processed_data[entry_id] = entry

pyogher commented 1 month ago

Hi @matengxiaotiancai,

This part of the code selects, for each sample, the result that successfully jailbreaks the target MLLM during the black-box optimization process. In our experiments, we run up to five optimization steps to jailbreak the target MLLM. If a successful attack appears at any of these five steps, we save that sample and stop early, i.e., if the jailbreak succeeds at the second step, we retain the result from the second step. If none of the five steps produces a successful jailbreak, we retain the result from the fifth step.

processed_data = {}
for entry in raw_data:
    entry_id = entry['id']

    # If the id is in processed_data and the current entry is flagged, check the step
    if entry_id in processed_data and entry['flagged']:
        # If the current entry's step is smaller, replace it
        if int(entry['step']) < int(processed_data[entry_id]['step']):
            processed_data[entry_id] = entry
    elif entry_id not in processed_data:
        # If the id is not in processed_data, then add it
        processed_data[entry_id] = entry
    # If the current entry is not flagged, but the same id in processed_data is flagged, skip it
    elif not entry['flagged'] and processed_data[entry_id]['flagged']:
        continue
    # If the current entry's step is 5, replace it (only executes if there are no flagged entries)
    elif entry['step'] == "5":
        processed_data[entry_id] = entry

# The retained samples are those flagged True (at the earliest step), or the step-5 result when no flagged sample exists
output_list = list(processed_data.values())
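
As a quick illustration, here is the same selection logic run on a few made-up toy entries (the ids and steps below are hypothetical, not from the actual dataset): the earliest flagged step per id is kept, and the step-5 result is kept when no step was flagged.

```python
# Toy raw_data (hypothetical values) to illustrate the selection logic above.
raw_data = [
    {"id": "a", "step": "4", "flagged": True},   # later success
    {"id": "a", "step": "2", "flagged": True},   # earlier success -> retained
    {"id": "b", "step": "5", "flagged": False},  # never jailbroken -> keep step 5
]

processed_data = {}
for entry in raw_data:
    entry_id = entry['id']
    if entry_id in processed_data and entry['flagged']:
        # A flagged entry with a smaller step replaces the stored one
        if int(entry['step']) < int(processed_data[entry_id]['step']):
            processed_data[entry_id] = entry
    elif entry_id not in processed_data:
        processed_data[entry_id] = entry
    elif not entry['flagged'] and processed_data[entry_id]['flagged']:
        continue
    elif entry['step'] == "5":
        processed_data[entry_id] = entry

output_list = list(processed_data.values())
print(output_list)
# "a" is kept with its step "2" entry (earliest flagged), "b" with its step "5" entry
```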