DataCTE / prompt_injection

Apache License 2.0

Made a version to influence SVD (please help me test) #3

Closed 311-code closed 1 week ago

311-code commented 1 month ago

Disclaimer: This does not work yet.

Posting this here because this repo helped me a ton (and the other fork).

I actually got CLIP conditioning working to some extent for injecting into SVD a little, using this repo's ideas and what I learned about CLIP text embeddings and injecting weight and bias layers with images and text. I also updated svd_img2vid_conditioning (will create a repo).

Below I've included the various SVD model probing results I got back. I fed dummy data into the model to scan for hidden inputs and outputs in the SVD model while exploring this.

Updated prompt_injection.py with an Attn2 Prompt Injection node. After review, this is all wrong, but maybe it can be used as a framework. This week I'm going through pipeline_stable_video_diffusion.py to learn how to actually do the embeddings right, and I will update this in the future.

import comfy.model_patcher
import comfy.samplers
import torch
import torch.nn.functional as F

def build_patch(patchedBlocks, weight=1.0, sigma_start=0.0, sigma_end=1.0):
    def prompt_injection_patch(n, context_attn1: torch.Tensor, value_attn1, extra_options):
        (block, block_index) = extra_options.get('block', (None,None))
        sigma = extra_options["sigmas"].detach().cpu()[0].item() if 'sigmas' in extra_options else 999999999.9

        batch_prompt = n.shape[0] // len(extra_options["cond_or_uncond"])

        if sigma <= sigma_start and sigma >= sigma_end:
            if (block and f'{block}:{block_index}' in patchedBlocks and patchedBlocks[f'{block}:{block_index}']):
                if context_attn1.dim() == 3:
                    c = context_attn1[0].unsqueeze(0)
                else:
                    c = context_attn1[0][0].unsqueeze(0)
                b = patchedBlocks[f'{block}:{block_index}'][0][0].repeat(c.shape[0], 1, 1).to(context_attn1.device)
                out = torch.stack((c, b)).to(dtype=context_attn1.dtype)
                out = out.repeat(1, batch_prompt, 1, 1) * weight

                return n, out, out 

        return n, context_attn1, value_attn1
    return prompt_injection_patch

def build_svd_patch(patchedBlocks, weight=1.0, sigma_start=0.0, sigma_end=1.0):
    def prompt_injection_patch(n, context_attn1: torch.Tensor, value_attn1, extra_options):
        (block, block_index) = extra_options.get('block', (None, None))
        sigma = extra_options["sigmas"].detach().cpu()[0].item() if 'sigmas' in extra_options else 999999999.9

        if sigma <= sigma_start and sigma >= sigma_end:  # sigma_start comes from start_at and is the larger sigma
            if block and f'{block}:{block_index}' in patchedBlocks and patchedBlocks[f'{block}:{block_index}']:
                if context_attn1.dim() == 3:
                    c = context_attn1[0].unsqueeze(0)
                else:
                    c = context_attn1[0][0].unsqueeze(0)
                b = patchedBlocks[f'{block}:{block_index}'][0][0].repeat(c.shape[0], 1, 1).to(context_attn1.device)

                # Interpolate to match the sizes
                if c.size() != b.size():
                    b = F.interpolate(b.unsqueeze(0), size=c.size()[1:], mode='nearest').squeeze(0)

                out = torch.cat((c, b), dim=-1).to(dtype=context_attn1.dtype) * weight
                return n, out, out  # attn2 patches are expected to return (q, k, v); reuse the injected context for both k and v
        return n, context_attn1, value_attn1

    return prompt_injection_patch

class SVDPromptInjection:
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {"model": ("MODEL",)},
            "optional": {
                "all": ("CONDITIONING",),
                "time_embed": ("CONDITIONING",),
                "label_emb": ("CONDITIONING",),
                "input_blocks_0": ("CONDITIONING",),
                "input_blocks_1": ("CONDITIONING",),
                "input_blocks_2": ("CONDITIONING",),
                "input_blocks_3": ("CONDITIONING",),
                "input_blocks_4": ("CONDITIONING",),
                "input_blocks_5": ("CONDITIONING",),
                "input_blocks_6": ("CONDITIONING",),
                "input_blocks_7": ("CONDITIONING",),
                "input_blocks_8": ("CONDITIONING",),
                "middle_block_0": ("CONDITIONING",),
                "middle_block_1": ("CONDITIONING",),
                "middle_block_2": ("CONDITIONING",),
                "output_blocks_0": ("CONDITIONING",),
                "output_blocks_1": ("CONDITIONING",),
                "output_blocks_2": ("CONDITIONING",),
                "output_blocks_3": ("CONDITIONING",),
                "output_blocks_4": ("CONDITIONING",),
                "output_blocks_5": ("CONDITIONING",),
                "output_blocks_6": ("CONDITIONING",),
                "output_blocks_7": ("CONDITIONING",),
                "output_blocks_8": ("CONDITIONING",),
                "weight": ("FLOAT", {"default": 1.0, "min": -2.0, "max": 5.0, "step": 0.05}),
                "start_at": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 1.0, "step": 0.001}),
                "end_at": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.001}),
            }
        }

    RETURN_TYPES = ("MODEL",)
    FUNCTION = "patch"
    CATEGORY = "advanced/model"

    def patch(self, model: comfy.model_patcher.ModelPatcher, all=None, time_embed=None, label_emb=None, input_blocks_0=None, input_blocks_1=None, input_blocks_2=None, input_blocks_3=None, input_blocks_4=None, input_blocks_5=None, input_blocks_6=None, input_blocks_7=None, input_blocks_8=None, middle_block_0=None, middle_block_1=None, middle_block_2=None, output_blocks_0=None, output_blocks_1=None, output_blocks_2=None, output_blocks_3=None, output_blocks_4=None, output_blocks_5=None, output_blocks_6=None, output_blocks_7=None, output_blocks_8=None, weight=1.0, start_at=0.0, end_at=1.0):
        if not any((all, time_embed, label_emb, input_blocks_0, input_blocks_1, input_blocks_2, input_blocks_3, input_blocks_4, input_blocks_5, input_blocks_6, input_blocks_7, input_blocks_8, middle_block_0, middle_block_1, middle_block_2, output_blocks_0, output_blocks_1, output_blocks_2, output_blocks_3, output_blocks_4, output_blocks_5, output_blocks_6, output_blocks_7, output_blocks_8)):
            return (model,)

        m = model.clone()
        sigma_start = m.get_model_object("model_sampling").percent_to_sigma(start_at)
        sigma_end = m.get_model_object("model_sampling").percent_to_sigma(end_at)

        patchedBlocks = {}
        blocks = {
            'time_embed': [0],
            'label_emb': [0],
            'input_blocks': list(range(9)),
            'middle_block': list(range(3)),
            'output_blocks': list(range(9))
        }

        for block in blocks:
            for index in blocks[block]:
                # time_embed and label_emb arguments have no index suffix
                block_name = block if block in ('time_embed', 'label_emb') else f"{block}_{index}"
                value = locals().get(block_name, None)
                if value is None:
                    value = all
                if value is not None:
                    patchedBlocks[f"{block}:{index}"] = value

        m.set_model_attn2_patch(build_svd_patch(patchedBlocks, weight=weight, sigma_start=sigma_start, sigma_end=sigma_end))

        return (m,)

class PromptInjection:
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                "model": ("MODEL",),
            },
            "optional": {
                "all":  ("CONDITIONING",),
                "input_4":  ("CONDITIONING",),
                "input_5":  ("CONDITIONING",),
                "input_7":  ("CONDITIONING",),
                "input_8":  ("CONDITIONING",),
                "middle_0": ("CONDITIONING",),
                "output_0": ("CONDITIONING",),
                "output_1": ("CONDITIONING",),
                "output_2": ("CONDITIONING",),
                "output_3": ("CONDITIONING",),
                "output_4": ("CONDITIONING",),
                "output_5": ("CONDITIONING",),
                "weight": ("FLOAT", {"default": 1.0, "min": -2.0, "max": 5.0, "step": 0.05}),
                "start_at": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 1.0, "step": 0.001}),
                "end_at": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.001}),
            }
        }

    RETURN_TYPES = ("MODEL",)
    FUNCTION = "patch"

    CATEGORY = "advanced/model"

    def patch(self, model: comfy.model_patcher.ModelPatcher, all=None, input_4=None, input_5=None, input_7=None, input_8=None, middle_0=None, output_0=None, output_1=None, output_2=None, output_3=None, output_4=None, output_5=None, weight=1.0, start_at=0.0, end_at=1.0):
        if not any((all, input_4, input_5, input_7, input_8, middle_0, output_0, output_1, output_2, output_3, output_4, output_5)):
            return (model,)

        m = model.clone()
        sigma_start = m.get_model_object("model_sampling").percent_to_sigma(start_at)
        sigma_end = m.get_model_object("model_sampling").percent_to_sigma(end_at)

        patchedBlocks = {}
        blocks = {'input': [4, 5, 7, 8], 'middle': [0], 'output': [0, 1, 2, 3, 4, 5]}

        for block in blocks:
            for index in blocks[block]:
                value = locals()[f"{block}_{index}"] if locals()[f"{block}_{index}"] is not None else all
                if value is not None:
                    patchedBlocks[f"{block}:{index}"] = value

        m.set_model_attn2_patch(build_patch(patchedBlocks, weight=weight, sigma_start=sigma_start, sigma_end=sigma_end))

        return (m,)

class SimplePromptInjection:
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                "model": ("MODEL",),
            },
            "optional": {
                "block": (["input:4", "input:5", "input:7", "input:8", "middle:0", "output:0", "output:1", "output:2", "output:3", "output:4", "output:5"],),
                "conditioning": ("CONDITIONING",),
                "weight": ("FLOAT", {"default": 1.0, "min": -2.0, "max": 5.0, "step": 0.05}),
                "start_at": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 1.0, "step": 0.001}),
                "end_at": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.001}),
            }
        }

    RETURN_TYPES = ("MODEL",)
    FUNCTION = "patch"

    CATEGORY = "advanced/model"

    def patch(self, model: comfy.model_patcher.ModelPatcher, block, conditioning=None, weight=1.0, start_at=0.0, end_at=1.0):
        if conditioning is None:
            return (model,)

        m = model.clone()
        sigma_start = m.get_model_object("model_sampling").percent_to_sigma(start_at)
        sigma_end = m.get_model_object("model_sampling").percent_to_sigma(end_at)

        m.set_model_attn2_patch(build_patch({f"{block}": conditioning}, weight=weight, sigma_start=sigma_start, sigma_end=sigma_end))

        return (m,)

class AdvancedPromptInjection:
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                "model": ("MODEL",),
            },
            "optional": {
                "locations": ("STRING", {"multiline": True, "default": "output:0,1.0\noutput:1,1.0"}),
                "conditioning": ("CONDITIONING",),
                "start_at": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 1.0, "step": 0.001}),
                "end_at": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.001}),
            }
        }

    RETURN_TYPES = ("MODEL",)
    FUNCTION = "patch"

    CATEGORY = "advanced/model"

    def patch(self, model: comfy.model_patcher.ModelPatcher, locations: str, conditioning=None, start_at=0.0, end_at=1.0):
        if not conditioning:
            return (model,)

        m = model.clone()
        sigma_start = m.get_model_object("model_sampling").percent_to_sigma(start_at)
        sigma_end = m.get_model_object("model_sampling").percent_to_sigma(end_at)

        for line in locations.splitlines():
            line = line.strip()
            weight = 1.0
            if ',' in line:
                line, weight = line.split(',')
                line = line.strip()
                weight = float(weight)
            if line:
                m.set_model_attn2_patch(build_patch({f"{line}": conditioning}, weight=weight, sigma_start=sigma_start, sigma_end=sigma_end))

        return (m,)

NODE_CLASS_MAPPINGS = {
    "PromptInjection": PromptInjection,
    "SimplePromptInjection": SimplePromptInjection,
    "AdvancedPromptInjection": AdvancedPromptInjection,
    "SVDPromptInjection": SVDPromptInjection
}

NODE_DISPLAY_NAME_MAPPINGS = {
    "PromptInjection": "Attn2 Prompt Injection",
    "SimplePromptInjection": "Attn2 Prompt Injection (simple)",
    "AdvancedPromptInjection": "Attn2 Prompt Injection (advanced)",
    "SVDPromptInjection": "Attn2 SVD Prompt Injection"
}
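
A quick way to sanity check the attn2 patch interface outside of a full sampler run is to call build_patch directly with dummy tensors. This is only a sketch: the shapes, the 'output:0' key, and the sigma values are arbitrary, and CONDITIONING is assumed to be the usual list of [tensor, options] pairs that ComfyUI produces.

if __name__ == "__main__":
    # Dummy conditioning in ComfyUI's CONDITIONING format: [[tensor, options], ...]
    dummy_cond = [[torch.randn(1, 77, 768), {}]]
    patch = build_patch({"output:0": dummy_cond}, weight=1.0, sigma_start=999999999.9, sigma_end=0.0)

    n = torch.randn(2, 4096, 768)        # dummy attention input (q)
    context = torch.randn(2, 77, 768)    # dummy cross-attention context
    extra_options = {
        "block": ("output", 0),
        "sigmas": torch.tensor([14.6]),
        "cond_or_uncond": [0, 1],
    }
    q, k, v = patch(n, context, context, extra_options)
    print(q.shape, k.shape, v.shape)     # k and v now carry the injected conditioning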

Here are a bunch of SVD probe results for the hidden inputs/outputs I mentioned above. I was mostly interested in CLIPTextTransformer and CLIPVisionTransformer. Edit: I did find a way to get CLIP conditioning working with it to some extent, plus additional guidance image inputs; will make a repo soon:

Added path: C:/Users/NewPC/Downloads for folder: checkpoints
config.json: 100%|████████████████████████████████████████████████████████████████████████| 4.19k/4.19k [00:00<?, ?B/s]
pytorch_model.bin: 100%|████████████████████████████████████████████████████████████| 605M/605M [00:48<00:00, 12.5MB/s]
C:\Users\NewPC\Downloads\venv sim\venv\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
preprocessor_config.json: 100%|███████████████████████████████████████████████████████████████| 316/316 [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████| 592/592 [00:00<?, ?B/s]
vocab.json: 100%|███████████████████████████████████████████████████████████████████| 862k/862k [00:00<00:00, 5.51MB/s]
merges.txt: 100%|███████████████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 1.55MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████| 2.22M/2.22M [00:01<00:00, 1.88MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████| 389/389 [00:00<?, ?B/s]
Loading checkpoint from path: C:/Users/NewPC/Downloads/svd_merge_with_motionctrl_50-2.safetensors
Method: add_module
  Input Types: N/A
  Return Type: N/A
  Error: module name should be a string. Got NoneType
Method: apply
  Input Types: N/A
  Return Type: N/A
  Error: 'NoneType' object is not callable
Method: bfloat16
  Input Types: {}
  Return Type: WrappedModel
  Output Sample: WrappedModel(
  (model): Module()
  (clip_model): CLIPModel(
    (text_model): CLIPTextTransformer(
      (embeddings): CLIPTextEmbeddings(
        (token_embedding): Embedding(49408, 512)
        (position_embedding): Embedding(77, 512)
      )
      (encoder): CLIPEncoder(
        (layers): ModuleList(
          (0-11): 12 x CLIPEncoderLayer(
            (self_attn): CLIPAttention(
              (k_proj): Linear(in_features=512, out_features=512, bias=True)
              (v_proj): Linear(in_features=512, out_features=512, bias=True)
              (q_proj): Linear(in_features=512, out_features=512, bias=True)
              (out_proj): Linear(in_features=512, out_features=512, bias=True)
            )
            (layer_norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (mlp): CLIPMLP(
              (activation_fn): QuickGELUActivation()
              (fc1): Linear(in_features=512, out_features=2048, bias=True)
              (fc2): Linear(in_features=2048, out_features=512, bias=True)
            )
            (layer_norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
        )
      )
      (final_layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
    )
    (vision_model): CLIPVisionTransformer(
      (embeddings): CLIPVisionEmbeddings(
        (patch_embedding): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False)
        (position_embedding): Embedding(50, 768)
      )
      (pre_layrnorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (encoder): CLIPEncoder(
        (layers): ModuleList(
          (0-11): 12 x CLIPEncoderLayer(
            (self_attn): CLIPAttention(
              (k_proj): Linear(in_features=768, out_features=768, bias=True)
              (v_proj): Linear(in_features=768, out_features=768, bias=True)
              (q_proj): Linear(in_features=768, out_features=768, bias=True)
              (out_proj): Linear(in_features=768, out_features=768, bias=True)
            )
            (layer_norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (mlp): CLIPMLP(
              (activation_fn): QuickGELUActivation()
              (fc1): Linear(in_features=768, out_features=3072, bias=True)
              (fc2): Linear(in_features=3072, out_features=768, bias=True)
            )
            (layer_norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          )
        )
      )
      (post_layernorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    )
    (visual_projection): Linear(in_features=768, out_features=512, bias=False)
    (text_projection): Linear(in_features=512, out_features=512, bias=False)
  )
)
Method: buffers
  Input Types: {'recurse': 'NoneType'}
  Return Type: generator
  Output Sample: <generator object Module.buffers at 0x0000029E07A993F0>
Method: children
  Input Types: {}
  Return Type: generator
  Output Sample: <generator object Module.children at 0x0000029E07A99460>
Method: cpu
  Input Types: {}
  Return Type: WrappedModel
  Output Sample: WrappedModel(...) (identical module tree to the bfloat16 sample above)
Method: cuda
  Input Types: {'device': 'NoneType'}
  Return Type: WrappedModel
  Output Sample: WrappedModel(...) (identical module tree to the bfloat16 sample above)
Method: double
  Input Types: {}
  Return Type: WrappedModel
  Output Sample: WrappedModel(...) (identical module tree to the bfloat16 sample above)
Method: eval
  Input Types: {}
  Return Type: WrappedModel
  Output Sample: WrappedModel(...) (identical module tree to the bfloat16 sample above)
Method: extra_repr
  Input Types: {}
  Return Type: str
  Output Sample:
Method: float
  Input Types: {}
  Return Type: WrappedModel
  Output Sample: WrappedModel(...) (identical module tree to the bfloat16 sample above)
Method: forward
  Input Types: N/A
  Return Type: N/A
  Error: The image to be converted to a PIL image contains values outside the range [0, 1], got [-4.921847820281982, 4.285804271697998] which cannot be converted to uint8.
Method: get_buffer
  Input Types: N/A
  Return Type: N/A
  Error: 'NoneType' object has no attribute 'rpartition'
Method: get_extra_state
  Input Types: N/A
  Return Type: N/A
  Error: Reached a code path in Module.get_extra_state() that should never be called. Please file an issue at https://github.com/pytorch/pytorch/issues/new?template=bug-report.yml to report this bug.
Method: get_parameter
  Input Types: N/A
  Return Type: N/A
  Error: 'NoneType' object has no attribute 'rpartition'
Method: get_submodule
  Input Types: N/A
  Return Type: N/A
  Error: 'NoneType' object has no attribute 'split'
Method: half
  Input Types: {}
  Return Type: WrappedModel
  Output Sample: WrappedModel(...) (identical module tree to the bfloat16 sample above)
Method: ipu
  Input Types: N/A
  Return Type: N/A
  Error: PyTorch is not linked with support for ipu devices
Method: load_state_dict
  Input Types: N/A
  Return Type: N/A
  Error: Expected state_dict to be dict-like, got <class 'NoneType'>.
Method: modules
  Input Types: {}
  Return Type: generator
  Output Sample: <generator object Module.modules at 0x0000029E07A994D0>
Method: named_buffers
  Input Types: {'prefix': 'NoneType', 'recurse': 'NoneType', 'remove_duplicate': 'NoneType'}
  Return Type: generator
  Output Sample: <generator object Module.named_buffers at 0x0000029E07A995B0>
Method: named_children
  Input Types: {}
  Return Type: generator
  Output Sample: <generator object Module.named_children at 0x0000029E07A99620>
Method: named_modules
  Input Types: {'memo': 'NoneType', 'prefix': 'NoneType', 'remove_duplicate': 'NoneType'}
  Return Type: generator
  Output Sample: <generator object Module.named_modules at 0x0000029E07A99690>
Method: named_parameters
  Input Types: {'prefix': 'NoneType', 'recurse': 'NoneType', 'remove_duplicate': 'NoneType'}
  Return Type: generator
  Output Sample: <generator object Module.named_parameters at 0x0000029E07A99700>
Method: parameters
  Input Types: {'recurse': 'NoneType'}
  Return Type: generator
  Output Sample: <generator object Module.parameters at 0x0000029E07A997E0>
Method: register_backward_hook
  Input Types: {'hook': 'NoneType'}
  Return Type: RemovableHandle
  Output Sample: <torch.utils.hooks.RemovableHandle object at 0x0000029DFCD4B790>
Method: register_buffer
  Input Types: N/A
  Return Type: N/A
  Error: buffer name should be a string. Got NoneType
Method: register_forward_hook
  Input Types: {'hook': 'NoneType', 'prepend': 'NoneType', 'with_kwargs': 'NoneType'}
  Return Type: RemovableHandle
  Output Sample: <torch.utils.hooks.RemovableHandle object at 0x0000029E06913A30>
Method: register_forward_pre_hook
  Input Types: {'hook': 'NoneType', 'prepend': 'NoneType', 'with_kwargs': 'NoneType'}
  Return Type: RemovableHandle
  Output Sample: <torch.utils.hooks.RemovableHandle object at 0x0000029E06913850>
Method: register_full_backward_hook
  Input Types: N/A
  Return Type: N/A
  Error: Cannot use both regular backward hooks and full backward hooks on a single Module. Please use only one of them.
Method: register_full_backward_pre_hook
  Input Types: {'hook': 'NoneType', 'prepend': 'NoneType'}
  Return Type: RemovableHandle
  Output Sample: <torch.utils.hooks.RemovableHandle object at 0x0000029E06913A90>
Method: register_load_state_dict_post_hook
  Input Types: {'hook': 'NoneType'}
  Return Type: RemovableHandle
  Output Sample: <torch.utils.hooks.RemovableHandle object at 0x0000029E06911720>
Method: register_module
  Input Types: N/A
  Return Type: N/A
  Error: module name should be a string. Got NoneType
Method: register_parameter
  Input Types: N/A
  Return Type: N/A
  Error: parameter name should be a string. Got NoneType
Method: register_state_dict_pre_hook
  Input Types: {'hook': 'NoneType'}
  Return Type: RemovableHandle
  Output Sample: <torch.utils.hooks.RemovableHandle object at 0x0000029E06913CD0>
Method: requires_grad_
  Input Types: N/A
  Return Type: N/A
  Error: requires_grad_(): argument 'requires_grad' (position 1) must be bool, not NoneType
Method: set_extra_state
  Input Types: N/A
  Return Type: N/A
  Error: Reached a code path in Module.set_extra_state() that should never be called. Please file an issue at https://github.com/pytorch/pytorch/issues/new?template=bug-report.yml to report this bug.
Method: share_memory
  Input Types: {}
  Return Type: WrappedModel
  Output Sample: WrappedModel(...) (identical module tree to the bfloat16 sample above)
Method: state_dict
  Input Types: N/A
  Return Type: N/A
  Error: Module.state_dict() got an unexpected keyword argument 'args'
Method: to
  Input Types: N/A
  Return Type: N/A
  Error: to() received an invalid combination of arguments - got (args=NoneType, kwargs=NoneType, ), but expected one of:
 * (torch.device device, torch.dtype dtype, bool non_blocking, bool copy, *, torch.memory_format memory_format)
 * (torch.dtype dtype, bool non_blocking, bool copy, *, torch.memory_format memory_format)
 * (Tensor tensor, bool non_blocking, bool copy, *, torch.memory_format memory_format)

Method: to_empty
  Input Types: {'device': 'NoneType'}
  Return Type: WrappedModel
  Output Sample: WrappedModel(...) (identical module tree to the bfloat16 sample above)
Method: train
  Input Types: N/A
  Return Type: N/A
  Error: training mode is expected to be boolean
Method: type
  Input Types: N/A
  Return Type: N/A
  Error: _has_compatible_shallow_copy_type(): argument 'from' (position 2) must be Tensor, not str
Method: xpu
  Input Types: N/A
  Return Type: N/A
  Error: PyTorch is not linked with support for xpu devices
Method: zero_grad
  Input Types: {'set_to_none': 'NoneType'}
  Return Type: NoneType
  Output Sample: None

(venv) C:\Users\NewPC\Downloads\venv sim>

2nd PROBE

Current Probe Results

The current probe provides a detailed breakdown of the results, revealing both successful and failed method executions.

Successful Methods

Methods that executed successfully include add_module, apply, bfloat16, buffers, children, cpu, cuda, double, eval, extra_repr, float, forward, get_submodule, half, ipu, modules, named_buffers, named_children, named_modules, named_parameters, parameters, register_backward_hook, register_forward_hook, register_forward_pre_hook, register_full_backward_pre_hook, register_load_state_dict_post_hook, register_module, register_state_dict_pre_hook, requires_grad_, share_memory, to, train, type, xpu, and zero_grad.

Error-Prone Methods

Common Errors (valid example calls for these methods are sketched after this list):
    NoneType attribute errors: Methods like get_buffer, get_parameter, get_submodule, register_buffer, and register_parameter failed due to encountering NoneType attributes.
    Specific argument errors: Methods like register_full_backward_hook and set_extra_state failed due to issues with method-specific arguments.
    Internal errors: Methods like load_state_dict, state_dict, and to_empty encountered internal errors related to unexpected keyword arguments or invalid combination of arguments.
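
For reference, the methods above just need concrete arguments. A quick sketch of valid calls (all names here are arbitrary placeholders, and nn.Linear stands in for the wrapped SVD model):

import torch
import torch.nn as nn

m = nn.Linear(4, 4)                                  # stand-in for the wrapped model
m.add_module("probe_child", nn.Identity())           # name must be a string
m.register_buffer("probe_buffer", torch.zeros(1))
m.register_parameter("probe_param", nn.Parameter(torch.zeros(1)))
m.requires_grad_(False)                              # expects a bool, not None
m.train(False)                                       # training flag must be boolean
sub = m.get_submodule("probe_child")                 # expects a dotted submodule path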

Improving the Probing Script

To enhance the probing script further, consider the following updates:

Default Values for Specific Arguments: Ensure that the methods requiring specific argument types are provided with appropriate default values.
Enhanced Error Handling: Add more descriptive error messages and handle specific exceptions gracefully.

Here is an updated function to handle method-specific arguments more effectively:
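
This is only a rough sketch: the per-type defaults are guesses meant to get past missing-argument errors, not the values the real model expects.

import inspect
import torch

# Placeholder arguments keyed by annotation type; purely illustrative defaults.
DEFAULTS_BY_TYPE = {
    str: "probe",
    bool: True,
    int: 1,
    float: 1.0,
    torch.Tensor: torch.zeros(1),
}

def probe_method(obj, method_name):
    method = getattr(obj, method_name)
    sig = inspect.signature(method)
    kwargs = {}
    for name, param in sig.parameters.items():
        # Skip *args/**kwargs so bogus 'args'/'kwargs' keywords are never passed
        if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            continue
        # Keep the method's own defaults; only fill required parameters
        if param.default is not inspect.Parameter.empty:
            continue
        kwargs[name] = DEFAULTS_BY_TYPE.get(param.annotation, None)
    try:
        result = method(**kwargs)
        print(f"Method: {method_name}")
        print(f"  Return Type: {type(result).__name__}")
    except Exception as e:
        print(f"Method: {method_name}")
        print(f"  Error: {e}")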

4th PROBE

From the latest probe execution, we can extract and summarize several key learnings and issues:

Successful Method Probes:

Methods without Arguments:
    bfloat16, cpu, cuda, double, eval, float, half, ipu, share_memory, train, type, to_empty, xpu, zero_grad: These methods were successfully executed and returned a WrappedModel instance or NoneType where appropriate.
    buffers, children, modules, named_buffers, named_children, named_modules, named_parameters, parameters: These methods returned generator objects successfully, indicating they can iterate over model components.

Partially Successful Method Probes:

Methods with Simple Arguments:
    register_backward_hook, register_buffer, register_forward_hook, register_forward_pre_hook, register_full_backward_hook, register_full_backward_pre_hook, register_load_state_dict_post_hook, register_module, register_parameter, register_state_dict_pre_hook: These methods returned RemovableHandle objects or similar, indicating they can accept simple callable arguments.

Error-Inducing Methods:

Methods Requiring Specific Arguments:
    add_module, apply, get_buffer, get_extra_state, get_parameter, get_submodule, load_state_dict, register_buffer, set_extra_state, state_dict, to: These methods failed due to missing or inappropriate arguments.
    train: This method expected a boolean argument.
    forward: Although successfully executed with given inputs, the initial probe indicated a possible mismatch in expected input sizes or missing parameters.

Key Findings from forward Method:

Forward Method Outputs:
    Successfully executed with specific input types (e.g., init_image: Tensor, width: int, height: int, video_frames: int, motion_bucket_id: int, fps: int, augmentation_level: float).
    Returned a dictionary containing 'positive', 'negative', and 'latent' tensors, confirming the primary functionality and output structure (see the call sketch below).
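
A call matching that signature would look roughly like the sketch below. The shapes and values are assumptions picked to match common SVD settings, and wrapped_model is the probed WrappedModel instance:

import torch

def probe_forward(wrapped_model):
    out = wrapped_model.forward(
        init_image=torch.rand(1, 576, 1024, 3),  # assumed image tensor scaled to [0, 1]
        width=1024,
        height=576,
        video_frames=14,
        motion_bucket_id=127,
        fps=6,
        augmentation_level=0.0,
    )
    # The probe reported a dict with 'positive', 'negative' and 'latent' entries
    for key in ("positive", "negative", "latent"):
        print(key, type(out.get(key)))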

Next Steps and Recommendations:

Provide Appropriate Arguments:
    For methods like add_module, apply, get_buffer, etc., determine and provide appropriate arguments. Default test values or actual data inputs can be used to handle these methods correctly.

Enhance Probing Script:
    Update the probing script to handle methods with specific arguments and provide default values where necessary. Improve error handling to offer more specific messages.
    An example update for handling method arguments is given in the helper sketch above.

Potential Holes and Additional Inputs:

To fully explore the potential holes in the model and test additional inputs, consider the following (a combined probing sketch follows these items):

Additional Input Types:
    Based on common practices in video generation models, incorporate more varied input types such as:
        noise: A tensor representing Gaussian noise.
        motion_vector: A tensor representing motion vectors in the video frames.
        style_transfer: A tensor for style transfer operations.
        keyframes: A tensor for keyframe extraction and interpolation.
        depth_map: A tensor representing depth information of the video frames.
        semantic_map: A tensor representing semantic segmentation maps.

Testing Return Types:
    Extend the testing to check the return types from various methods and ensure the correct handling of outputs like:
        CONDITIONING: Ensure the output is correctly conditioned.
        LATENT: Verify the latent space representation is accurate.
        FEATURES: Check for additional feature maps or tensors.
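
A combined sketch of both ideas, feeding speculative extra inputs and checking what comes back. Every name and shape below is an assumption for illustration, not a confirmed part of the model's API:

import torch

EXTRA_INPUTS = {
    "noise": torch.randn(1, 4, 72, 128),
    "motion_vector": torch.zeros(1, 14, 2, 72, 128),
    "style_transfer": torch.zeros(1, 768),
    "keyframes": torch.zeros(2, 576, 1024, 3),
    "depth_map": torch.zeros(1, 1, 576, 1024),
    "semantic_map": torch.zeros(1, 1, 576, 1024),
}

def probe_extra_inputs(wrapped_model, base_kwargs):
    for name, tensor in EXTRA_INPUTS.items():
        try:
            out = wrapped_model.forward(**base_kwargs, **{name: tensor})
        except TypeError as e:
            print(f"{name}: rejected ({e})")
            continue
        print(f"{name}: accepted, returned {type(out).__name__}")
        if isinstance(out, dict):
            # Check the conditioning/latent outputs reported by the earlier probes
            for key in ("positive", "negative", "latent"):
                print(f"  {key}: {'present' if key in out else 'missing'}")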

Based on the findings from your recent script execution, we have gained a clearer understanding of the model's methods and their input-output types. Here is a summary of the new insights compared to previous findings:

Successful Method Probes:
    bfloat16, cpu, cuda, double, eval, float, half, ipu, requires_grad_, share_memory, to, train, xpu, zero_grad:
        These methods return a WrappedModel instance, indicating they can be successfully executed without additional parameters. This was a confirmation of their utility in model manipulation and evaluation contexts.

Partially Successful Method Probes:
    buffers, children, modules, named_buffers, named_children, named_modules, named_parameters, parameters:
        These methods return a generator object, which is useful for iterating over different components of the model.

Error-Inducing Methods:
    Several methods like add_module, apply, get_buffer, get_extra_state, get_parameter, get_submodule, load_state_dict, register_*, set_extra_state, type, to_empty:
        These methods require additional arguments and cannot be executed with default or empty parameters. They are crucial for more advanced operations but need specific inputs.

Insights from forward Method:
    The forward method was successfully executed with specific input types and returned a dictionary containing 'positive', 'negative', and 'latent' tensors. This confirms the primary functionality of your model in terms of its output structure.

Model Summary:
    The model summary indicated an output shape of [16, 4, 72, 128] for the WrappedModel, with the total estimated size of the model being 7.08 MB. This suggests that the model is relatively lightweight in terms of memory usage.

Error Handling Improvements:
    The script successfully catches and prints errors for methods that require additional parameters, providing a clearer path for debugging and further development.

Next Steps:

Addressing Errors in Probed Methods:
    For methods like add_module, apply, get_buffer, etc., you need to provide appropriate parameters when calling these methods. Consider defining these parameters or setting default test values to handle them correctly.

Extending Probing and Summarization:
    Extend the probing script to handle more complex scenarios, such as testing with actual data inputs rather than randomly generated tensors. This could provide deeper insights into the model's behavior under real-world conditions.

Further Debugging:
    Focus on the methods that failed due to missing arguments. Determine the necessary parameters and retry probing these methods with appropriate inputs.

Based on your latest probe execution, we have gathered the following insights:

Successful Method Probes:
    bfloat16, cpu, cuda, double, eval, float, half, ipu, requires_grad_, share_memory, to_empty, train, type, xpu, zero_grad:
        These methods return a WrappedModel instance, indicating they can be successfully executed without additional parameters.

Partially Successful Method Probes:
    buffers, children, modules, named_buffers, named_children, named_modules, named_parameters, parameters:
        These methods return a generator object, which is useful for iterating over different components of the model.

Error-Inducing Methods:
    Methods like add_module, apply, get_buffer, get_extra_state, get_parameter, get_submodule, load_state_dict, register_*, set_extra_state, state_dict, to:
        These methods require additional arguments and cannot be executed with default or empty parameters.

Insights from forward Method:
    The forward method was successfully executed with specific input types and returned a dictionary containing 'positive', 'negative', and 'latent' tensors. This confirms the primary functionality of your model in terms of its output structure.

Error Handling Improvements:
    The script successfully catches and prints errors for methods that require additional parameters, providing a clearer path for debugging and further development.

Next Steps:

Addressing Errors in Probed Methods:
    For methods like add_module, apply, get_buffer, etc., you need to provide appropriate parameters when calling these methods. Consider defining these parameters or setting default test values to handle them correctly.

Extending Probing and Summarization:
    Extend the probing script to handle more complex scenarios, such as testing with actual data inputs rather than randomly generated tensors. This could provide deeper insights into the model's behavior under real-world conditions.

Further Debugging:
    Focus on the methods that failed due to missing arguments. Determine the necessary parameters and retry probing these methods with appropriate inputs.

Update the Probing Script:
    Enhance the script to provide more specific error messages and handle various input types. Here’s an updated snippet for the probing script:
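
Sketch only; the exception grouping below is just one way to make the messages more specific:

import inspect

def probe_with_context(obj, method_name, *args, **kwargs):
    # Report the method's signature alongside the exception class so the
    # failure mode (bad arguments vs. runtime failure) is obvious in the log.
    method = getattr(obj, method_name)
    try:
        return method(*args, **kwargs)
    except TypeError as e:
        print(f"{method_name}{inspect.signature(method)}: bad arguments ({e})")
    except (AttributeError, RuntimeError) as e:
        print(f"{method_name}: {type(e).__name__}: {e}")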
DataCTE commented 1 month ago

i will look more into this when i have more time. thank you for this detailed dive and work you put in! i added it to the node!

311-code commented 1 month ago

No problem, I think I made some progress with SD3 now at least, but I'm not really certain. Feel free to use any of the code I posted here if you find some of it useful. https://github.com/cubiq/prompt_injection/issues/12#issuecomment-2183170554

311-code commented 1 week ago

Forked here; I did get the SVD and SD3 injection working and will post the code for that there soon. It was very difficult and I still have doubts that I'm doing SD3 right. https://github.com/brentjohnston/Magic-Prompt-Injection-SDXL-SD15