aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

torch.neuron.trace issue - aten::repeat not compilable #411

Closed. saraRaris closed this issue 2 years ago.

saraRaris commented 2 years ago

I'm having issues compiling a model that uses torch.repeat(). Although I'm able to trace the model, only 95.9% of the operators are fused to Neuron. I'm constrained by FPS, so I would like to compile the model completely.

The following code shows the tracing:

image = torch.zeros([1, 3, 384, 640], dtype=torch.float32)
model_neuron = torch.neuron.trace(model, example_inputs=[image])

Relevant logs:

INFO:Neuron:There are 8 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::repeat, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 488, fused = 468, percent fused = 95.9%

I have tried to remove the first torch.repeat() call from the working script and write the logic myself. However, this makes the trace incorrect, and I assume some values are being hard coded, since the output of the modified code is always the same. I'm unsure why this is happening. I have also tried to trace both versions using torch.jit.trace(), and both result in working models.
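
A minimal sketch of one way to check for this kind of hard coding is to run the traced model on two clearly different inputs and compare the outputs; identical outputs suggest values were baked in as constants during tracing (this sketch assumes the first element of the model output is a tensor):

    img_a = torch.zeros([1, 3, 384, 640], dtype=torch.float32)
    img_b = torch.rand([1, 3, 384, 640], dtype=torch.float32)

    out_a = model_neuron(img_a)
    out_b = model_neuron(img_b)

    # Identical outputs for clearly different inputs indicate that constants
    # were captured during tracing.
    print(torch.allclose(out_a[0], out_b[0]))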

Working script (compiles with 95.9% of operators fused):

                else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
                    xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                    wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2)  # wh
                    y = torch.cat((xy, wh, y[..., 4:]), -1)

                    if hasattr(self, 'num_coords') and self.num_coords:
                        ycoords = y[..., -self.num_coords:] * 4. - 2.
                        ycoords = ycoords * self.anchor_grid[i].repeat((1, 1, 1, 1, self.num_coords // 2))
                        ycoords = ycoords + (self.grid[i] * self.stride[i]).repeat((1, 1, 1, 1, self.num_coords // 2))
                        y = torch.cat((y[...,0:-34], ycoords), -1)

                z.append(y.view(bs, -1, self.no))

Modified script without the first torch.repeat() function:

                else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
                    xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                    wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2)  # wh
                    y = torch.cat((xy, wh, y[..., 4:]), -1)

                    if hasattr(self, 'num_coords') and self.num_coords:
                        ycoords = y[..., -self.num_coords:] * 4. - 2.
                        torch_0 = torch.cat(self.num_coords // 2*[self.anchor_grid[i][0][0][0][0]])
                        torch_1 = torch.cat(self.num_coords // 2*[self.anchor_grid[i][0][1][0][0]])
                        torch_2 = torch.cat(self.num_coords // 2*[self.anchor_grid[i][0][2][0][0]])
                        result_anchor = torch.zeros(1,3,1,1,34)
                        result_anchor[:,0,...] = torch_0
                        result_anchor[:,1,...] = torch_1
                        result_anchor[:,2,...] = torch_2
                        ycoords = ycoords * result_anchor

                        ycoords = ycoords + (self.grid[i] * self.stride[i]).repeat((1, 1, 1, 1, self.num_coords // 2))
                        y = torch.cat((y[...,0:-34], ycoords), -1)

                z.append(y.view(bs, -1, self.no))
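
For reference, a repeat-free variant could in principle tile the full anchor tensor by concatenating copies along the last dimension instead of assembling scalars into a pre-allocated zeros buffer. This is an untested sketch that assumes self.anchor_grid[i] has shape (1, self.na, 1, 1, 2); whether it traces any better on Neuron would still need to be verified:

    rep = self.num_coords // 2
    # Tiling along the last dim with .repeat() is equivalent to concatenating
    # copies of the whole tensor along that dim, and stays out of place.
    result_anchor = torch.cat(rep * [self.anchor_grid[i]], dim=-1)
    ycoords = ycoords * result_anchor
    ycoords = ycoords + torch.cat(rep * [self.grid[i] * self.stride[i]], dim=-1)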

Output of traced model using original script:

[array([[     743.05,      287.94,     0.71521],
       [     751.52,      273.12,      0.8363],
       [     730.34,      277.35,     0.74768],
       [     781.16,      285.82,      0.9263],
       [     721.88,      292.17,     0.16187],
       [     827.73,      351.45,     0.82941],
       [     709.17,      385.33,     0.89104],
       [     908.19,      417.09,     0.89882],
       [     696.47,      486.95,     0.86861],
       [     827.73,      461.55,     0.81253],
       [     692.23,      575.88,     0.83716],
       [      810.8,      550.47,     0.67558],
       [     738.81,      558.94,     0.73349],
       [      836.2,      698.67,     0.65628],
       [     772.69,      707.14,     0.62926],
       [     865.84,      757.56,    0.017876],
       [     797.56,      753.19,           0]])]

Output of traced model using modified script:

[array([[     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0],
       [     755.75,      474.25,           0]])]

Additional info: Model: https://github.com/wmcnally/kapao

aws-diamant commented 2 years ago

Hi @saraRaris , and thanks for opening the ticket. We were able to reproduce the issue, and our team is working on a fix.

We will update as soon as we have better clarity on timelines.

mrnikwaws commented 2 years ago

Hi Sara,

First of all thanks for using Neuron and for the feedback on this model. There are three issues currently impacting it:

  1. We do have some support for aten::repeat; however, because it does not work for all cases it is not on our supported operator list (and we partition it out by default in torch.neuron.trace). The good news is that the usage in this model is supported, so we can safely include the translation.
  2. Our compiler needs an SSA model graph, so in-place operations (where we overwrite part of a tensor) are not handled well and can cause numerical errors (warnings are generated for these). The good news is that this model mostly uses out-of-place operations, and it is a small change to fix the remaining issues (see the short sketch after this list).
  3. We've found an issue in our compiler for this model. For now this is a blocker for compiling the model, but we understand the problem and are working to get a fix out in a future release of torch-neuron.
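
To make point (2) concrete, here is a minimal, self-contained illustration of the difference between an in-place slice update and the out-of-place equivalent the compiler prefers (the shapes are made up for the example):

    import torch

    y = torch.rand(1, 3, 8, 8, 40)

    # In-place: overwrites part of y, which breaks the SSA assumption.
    # y[..., -4:] = y[..., -4:] * 4. - 2.

    # Out-of-place: compute the new slice, then rebuild y with torch.cat.
    tail = y[..., -4:] * 4. - 2.
    y = torch.cat((y[..., :-4], tail), dim=-1)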

I'll focus first on point (2). There is already some code in https://github.com/wmcnally/kapao/blob/master/models/yolo.py#L72 for out-of-place graphs, inherited from YOLOv5 (from which this model derives); you just need to extend that branch with this code (or code like it):

  else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
      xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
      wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2)  # wh
      y = torch.cat((xy, wh, y[..., 4:]), -1)

      if hasattr(self, 'num_coords') and self.num_coords:
          nc = y[..., -self.num_coords:] * 4. - 2.
          nc = nc * self.anchor_grid[i].repeat((1, 1, 1, 1, self.num_coords // 2))
          nc = nc + (self.grid[i] * self.stride[i]).repeat((1, 1, 1, 1, self.num_coords // 2))
          y = torch.cat((y[..., :-self.num_coords], nc), -1)

Secondly, let's look at how we can modify some of the demo code in the package, specifically https://github.com/wmcnally/kapao/blob/master/demos/image.py

First, modify the model construction to use out-of-place operators, changing https://github.com/wmcnally/kapao/blob/master/demos/image.py#L69

from:

model = attempt_load(args.weights, map_location=device)

to:

model = attempt_load(args.weights, inplace=False, map_location=device)

Second, add a wrapper around the model so that we trace model inference with the right arguments. Here we add code at https://github.com/wmcnally/kapao/blob/master/demos/image.py#L80

    # Wrapper class to set augment = True and other arguments
    class NeuronWrapper(torch.nn.Module):

        def __init__(self, model, data):
            super().__init__()
            self.model = model
            self._augment = True
            self.model.eval()
            self._data = data

        def forward(self, img):
            return self.model(
                img, 
                augment=True, 
                kp_flip=self._data['kp_flip'], 
                scales=self._data['scales'], 
                flips=self._data['flips'])

    # Create a neuron model file for the selected size
    neuron_model_file = "model_neuron_sz_{}.pt".format(data['imgsz'])

    # If the save file exists load it - otherwise compile
    if osp.exists(neuron_model_file):
        model_neuron = torch.jit.load(neuron_model_file)
    else:
        wrapper = NeuronWrapper(model, data)
        wrapper.eval()
        out = wrapper(img)
        ops = set(torch.neuron.get_supported_operations())

        # Force the aten::repeat lowering that works only in some cases into the operator list
        ops.add('aten::repeat')

        # This will fail for now until an upcoming release with a fix to the neuron-cc compiler
        model_neuron = torch.neuron.trace(
            wrapper, 
            img, 
            op_whitelist=ops, 
            strict=False)

        # Save the file to load next time
        model_neuron.save(neuron_model_file)

    start = time()
    out = model_neuron(img)
    end = time()

    print("took {} seconds".format(end-start))

    # The code we are replacing for reference
    #out = model(img, augment=True, kp_flip=data['kp_flip'], scales=data['scales'], flips=data['flips'])[0]

Usage looks something like:

# Outputs crowdpose_100024_kapao_l_coco_pose_kp_obj.png by default
python demos/image.py --pose --bbox --imgsz 640

which will use the default image in the project (some soccer players). You can play with different options; these are the ones I have tested with.
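
If you want to sanity check the compiled model once it traces, one option (an untested sketch, assuming the wrapper output is a tensor or a tuple/list of tensors) is to compare the Neuron output against the CPU reference computed before tracing:

    def as_list(x):
        return list(x) if isinstance(x, (tuple, list)) else [x]

    cpu_out = wrapper(img)
    neuron_out = model_neuron(img)

    # Small numerical differences between CPU and Neuron are expected,
    # hence the loose tolerances.
    for ref, test in zip(as_list(cpu_out), as_list(neuron_out)):
        print(torch.allclose(ref, test, atol=1e-2, rtol=1e-2))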

We'll comment on this ticket again once the compiler changes are released.

mrnikwaws commented 2 years ago

Hi Sara,

I confirmed that this model works as outlined above with the current compiler release. If it is working for you, let me know and we can close this issue; if I don't hear back, we'll plan to close it in the next couple of days.