Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License
4.74k stars 1.15k forks

Code modifications in "pytorch_object_detector.py" #1943

Open ShengYun-Peng opened 1 year ago

ShengYun-Peng commented 1 year ago

Is your feature request related to a problem? Please describe. In eval6, Armory uses "pytorch_object_detector.py" for the multi-object tracking scenario. Currently, each input image is passed into the preprocessor individually inside a for loop. However, our tracking defense relies on preprocessing multiple frames at once, so that chunk of code has to move out of the for loop in "pytorch_object_detector.py" to let the preprocessor accept full video frames.

Describe the solution you'd like Replace https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/c226388533f0029d03f2c40bdd9a4e7b853e5f30/art/estimators/object_detection/pytorch_object_detector.py#L209-L228

with

# If x arrives as a NumPy array, normalize it (when clip values are set),
# convert it to a tensor on the target device, and enable gradients.
if isinstance(x, np.ndarray):
    if self.clip_values is not None:
        x_grad = transform(x / self.clip_values[1]).to(self.device)
    else:
        x_grad = transform(x).to(self.device)
    x_grad.requires_grad = True
else:
    x_grad = x.to(self.device)
    # Heuristic: if the last dimension is smaller than both H and W,
    # assume channels-last (NHWC) input and permute to NCHW.
    if x_grad.shape[3] < x_grad.shape[1] and x_grad.shape[3] < x_grad.shape[2]:
        x_grad = torch.permute(x_grad, (0, 3, 1, 2))

image_tensor_list_grad.append(x_grad)
x_grad_1 = x_grad

# Preprocess all frames in a single call, outside the per-frame loop.
x_preprocessed_i, y_preprocessed_i = self._apply_preprocessing(
    x_grad_1, y=y_tensor, fit=False, no_grad=False
)
for i_preprocessed in range(x_preprocessed_i.shape[0]):
    inputs_t.append(x_preprocessed_i[i_preprocessed])
    y_preprocessed.append(y_preprocessed_i[i_preprocessed])
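As a standalone illustration of the channels-last heuristic in the snippet above: the shapes below are hypothetical (a 15-frame video matching the dimensions discussed later in this thread), and the check simply compares the last dimension against H and W before permuting to NCHW.

```python
import torch

# Hypothetical 15-frame video in channels-last (N, H, W, C) layout,
# as it would arrive when converted from a NumPy array.
x = torch.zeros((15, 860, 1280, 3))

# Same heuristic as the proposed code: if the last dimension is smaller
# than both spatial dimensions, treat it as the channel dimension and
# move it into NCHW order.
if x.shape[3] < x.shape[1] and x.shape[3] < x.shape[2]:
    x = torch.permute(x, (0, 3, 1, 2))

print(tuple(x.shape))  # -> (15, 3, 860, 1280)
```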

Describe alternatives you've considered I tried to bypass this problem by not creating a PreprocessorPytorch, but since the input image x is a torch.Tensor, I cannot force the code into the if branch that would feed all frames into the preprocessor at once.

Additional context N/A

beat-buesser commented 1 year ago

Hi @ShengYun-Peng Thank you very much for your suggestion! I'll take a closer look to get a better understanding and maybe come back with questions.

beat-buesser commented 1 year ago

Hi @ShengYun-Peng Apologies for my delayed response. Does this issue still persist? I tried to reproduce it but noticed that the mentioned for loop runs over the batch dimension, not the frame dimension, so you should be able to access all frames in the preprocessor.

ShengYun-Peng commented 1 year ago

> Hi @ShengYun-Peng Apologies for my delayed response. Does this issue still persist? I tried to reproduce it but noticed that the mentioned for loop runs over the batch dimension, not the frame dimension, so you should be able to access all frames in the preprocessor.

The problem still exists. self._apply_preprocessing() is called at the frame level, whereas we need to access all frames within a single call. Thus, that chunk of code has to be shifted outside the for loop.

beat-buesser commented 1 year ago

@ShengYun-Peng In your experiment, what is the shape of the variables x and y just before line 209 in pytorch_object_detector.py (your code example above)?

ShengYun-Peng commented 1 year ago

I ran the code with the Armory eval6 MOT scenario configs and printed the shape of x in the preprocessor. In the benign case, x.shape -> [15, 860, 1280, 3], which is desired since the video has a total of 15 frames. However, x.shape -> [1, 3, 960, 1280] in the adversarial case, which is exactly the issue discussed above. Our preprocessor requires more than a single frame to perform its defense.

beat-buesser commented 1 year ago

@ShengYun-Peng Interesting. The shape of x should be [1, 15, 860, 1280, 3], where the first dimension is the batch dimension holding the number of samples. The loop should iterate over the video samples, but because the batch dimension is missing it iterates over the frame dimension instead.

Can you check where Armory calls the method generate of ART's attack and check if the input samples have a batch dimension?
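The batch-dimension issue described above can be sketched in isolation (shapes are hypothetical, mirroring the 15-frame video from this thread): without a leading batch axis, iterating over x yields individual frames; with it, iteration yields whole videos.

```python
import numpy as np

# A 15-frame video in (frames, H, W, C) layout, with no batch dimension.
video = np.zeros((15, 860, 1280, 3), dtype=np.float32)

# Iterating over this array yields single frames, so a per-sample loop
# would accidentally run per frame:
first = video[0]
print(first.shape)  # -> (860, 1280, 3)

# Adding the batch dimension restores sample-level iteration:
batched = np.expand_dims(video, axis=0)  # shape (1, 15, 860, 1280, 3)
for sample in batched:
    # each sample is now the full 15-frame video
    print(sample.shape)  # -> (15, 860, 1280, 3)
```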