Closed: alejoGT1202 closed this issue 2 years ago
Hello @alejoGT1202, we inspected your model and noticed a few places where the YoloLayer operator uses in-place assignment, which limits our ability to trace the model effectively; this manifests as a WARNING in the logs.
```python
# Problematic in-place code
# io[..., :2] = (io[..., :2] * 2. - 0.5 + self.grid)
# io[..., 2:4] = (io[..., 2:4] * 2) ** 2 * self.anchor_wh
# io[..., :4] *= self.stride
# return io.view(bs, -1, self.no), p  # view [1, 3, 13, 13, 85] as [1, 507, 85]
```
If you replace this with the following, we expect you'll see accurate results:
```python
# Fixed, non-in-place code
a = (io[..., :2] * 2. - 0.5 + self.grid)
b = (io[..., 2:4] * 2) ** 2 * self.anchor_wh
last_dim = len(a.shape) - 1
ab = torch.cat([a, b], dim=last_dim) * self.stride
out = torch.cat([ab, io[..., 4:]], dim=last_dim)
return out.view(bs, -1, self.no), p  # view [1, 3, 13, 13, 85] as [1, 507, 85]
```
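The two versions can be checked for equivalence off-device. Below is a minimal NumPy sketch (NumPy standing in for torch, with made-up shapes and values for `grid`, `anchor_wh`, and `stride`) that compares the in-place post-processing against the concatenation-based rewrite:

```python
import numpy as np

# Toy stand-ins for the YoloLayer attributes (made-up values for illustration).
io = np.random.rand(1, 3, 13, 13, 85).astype(np.float32)
grid = np.random.rand(1, 3, 13, 13, 2).astype(np.float32)
anchor_wh = np.random.rand(1, 3, 1, 1, 2).astype(np.float32)
stride = 32.0

# In-place version (what the original code does).
ref = io.copy()
ref[..., :2] = ref[..., :2] * 2. - 0.5 + grid
ref[..., 2:4] = (ref[..., 2:4] * 2) ** 2 * anchor_wh
ref[..., :4] *= stride

# Out-of-place version (the trace-friendly rewrite).
a = io[..., :2] * 2. - 0.5 + grid
b = (io[..., 2:4] * 2) ** 2 * anchor_wh
ab = np.concatenate([a, b], axis=-1) * stride
out = np.concatenate([ab, io[..., 4:]], axis=-1)

assert np.allclose(ref, out)
print(out.reshape(1, -1, 85).shape)  # (1, 507, 85)
```

The key idea is that slicing and concatenating produce new tensors at every step, so the tracer sees a pure data-flow graph instead of mutations of an existing buffer.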
More generally, I would encourage you to benchmark and profile your model - https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-tools/getting-started-tensorboard-neuron-plugin.html. Such profiling can expose places in your model where the hardware is not being utilized effectively.
I would also encourage you to ensure you're running the most recent version of Neuron software - our team is constantly working to improve operator support, compiled model quality, and profiling tools.
Resolving per the suggested solution above. @alejoGT1202, please feel free to re-open if we can help with anything else.
@aws-diamant @aws-taylor I was able to convert the model with the suggested modification. However, I'm not getting the same accuracy as the model on GPU. I tried different combinations of neuron-cc compile options from the ones specified here. Is there any other approach I should try to get the same performance as the model that runs on GPU?
Thanks for the help.
Hello, I'm trying to compile yolor_w6 so it can be used on inf1 instances.
I inspected the model, which gave me the following output:
I checked that all of the operators in the model are supported by Neuron. However, one of the tutorials says:

> Inspecting the model, we discover that there are many aten::slice operations in some submodules called YoloLayer. Although these operations are supported by the neuron-cc compiler, they are not going to run efficiently on the Inferentia hardware

This makes me wonder: are there other operations in this model that do not run efficiently on the hardware? Thanks for the help.