Closed DaeyangCho closed 1 week ago
Hi @DaeyangCho,
We were able to reproduce the problem and will see if we can implement a fix.
Compilation succeeds when the x = x + 1 line is removed because Neuron falls back to CPU: the graph is considered too small to compile. This is controlled by the minimum_segment_size parameter, which by default requires at least 2 operations before a graph is compiled for Neuron. For example, the same failing behavior is observed without the x = x + 1 line when you set this parameter to 1:
model_neuron = torch.neuron.trace(model, example_inputs=[x, w], minimum_segment_size=1)
For more info on how operations are partitioned to CPU or Neuron, see the trace API docs: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuron/api-compilation-python-api.html
In particular, some arguments that control partitioning between CPU and Neuron devices are fallback, minimum_segment_size, and single_fusion_ratio_threshold.
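As a rough illustration of the threshold described above, the following toy sketch models only the rule stated in this reply: a graph is compiled for Neuron when it contains at least minimum_segment_size operations, and otherwise falls back to CPU. The function name compiled_for_neuron and the operation counts are hypothetical and are not part of torch-neuron; the real partitioner also depends on fallback and single_fusion_ratio_threshold, which are not modeled here.

```python
def compiled_for_neuron(num_ops, minimum_segment_size=2):
    """Toy model (an assumption, not the real partitioner): a graph is
    compiled for Neuron only if it has at least `minimum_segment_size` ops."""
    return num_ops >= minimum_segment_size


# Hypothetical operation counts for the two variants discussed in this thread.
conv_only = 1        # conv2d alone (the x = x + 1 line removed)
conv_plus_add = 2    # conv2d followed by x = x + 1

results = {
    "conv only, default threshold": compiled_for_neuron(conv_only),
    "conv + add, default threshold": compiled_for_neuron(conv_plus_add),
    "conv only, minimum_segment_size=1": compiled_for_neuron(conv_only, minimum_segment_size=1),
}

for case, on_neuron in results.items():
    print(f"{case}: {'compiled for Neuron' if on_neuron else 'falls back to CPU'}")
```

Under this simplified model, the conv-only graph reaches the Neuron compiler only once minimum_segment_size is lowered to 1, which matches the behavior described above.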
Thanks for the kind reply! I'll take a look at the trace API docs.
Hi @DaeyangCho, it looks like your issue has been addressed. Do you have any further questions? Otherwise, we will close this issue.
Thanks for the reminder. I will close the issue.
Hi, I would like to share a problem that occurred while compiling my model with Neuron. With torch.nn.functional.conv2d, when the groups value is 2 or more and the number of input channels is 256 or more, the "largest instruction counts" log appears and compilation takes a very long time to succeed. The problem only occurs when an arithmetic operation is also applied to the conv2d output tensor.
If the groups value is 1, or if the number of input channels is small (even when the number of output channels is large), the problem does not occur.
Here is sample code to reproduce the problem.
I am raising this as an issue because I confirmed that compilation is much faster (and the "largest instruction counts" log disappears!) when the groups value is set to 1 and the input and weight tensors are processed in chunks. I hope this report helps improve the Neuron compiler.
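For readers who want to try the chunking workaround described above, here is a minimal sketch in plain PyTorch. It is not the reporter's original code: the helper name chunked_conv2d and the tensor shapes are assumptions chosen for illustration. It splits the input along the channel dimension and the weight along the output-channel dimension, runs one groups=1 convolution per chunk, and concatenates the results, which is mathematically equivalent to the grouped convolution.

```python
import torch
import torch.nn.functional as F


def chunked_conv2d(x, w, groups, **conv_kwargs):
    """Compute a grouped conv2d as `groups` separate groups=1 convolutions.

    Hypothetical helper for illustration; assumes the input channels and
    output channels are both evenly divisible by `groups`.
    """
    # Split input channels (dim 1 of an NCHW tensor) into one chunk per group.
    x_chunks = torch.chunk(x, groups, dim=1)
    # Split the weight along its output-channel dimension (dim 0).
    w_chunks = torch.chunk(w, groups, dim=0)
    # Each chunk pair is an ordinary (groups=1) convolution.
    outs = [F.conv2d(xc, wc, groups=1, **conv_kwargs)
            for xc, wc in zip(x_chunks, w_chunks)]
    # Grouped conv2d lays out its outputs group by group along dim 1.
    return torch.cat(outs, dim=1)


torch.manual_seed(0)
groups = 2
x = torch.randn(1, 256, 8, 8)               # assumed shape: 256 input channels
w = torch.randn(64, 256 // groups, 3, 3)    # (out_channels, in_channels / groups, kH, kW)

reference = F.conv2d(x, w, padding=1, groups=groups)
chunked = chunked_conv2d(x, w, groups, padding=1)

# The two results should agree up to floating-point rounding.
print("max abs difference:", (reference - chunked).abs().max().item())
```

This only checks numerical equivalence on CPU. Whether the chunked form compiles faster under torch.neuron.trace is the reporter's observation and is not verified by this sketch.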