Open JohnRachid opened 1 year ago
I see other issues are getting responses but not this one. Please let me know if there is any further information I can provide. I would love to get us set up on some INF1 instances.
@JohnRachid - this one was missed. I'm taking a look now and if I can't figure out what is happening tonight, I'll sync with the team internally to get a better idea/response. Thanks for the patience.
Hello @JohnRachid,
I'm taking a look at this, but unfortunately the compilation code above is too redacted for me to come up with a reproduction. Are you able to share the contents of /tmp/tmpbcm40hzx/model with us? If this is sensitive, you can also reach us at aws-neuron-support@amazon.com.
-Taylor
Hello @aws-taylor,
I am working with @JohnRachid on this issue. Below is a reproducible example of the compilation error. Interestingly, the Attention model compiles if the patch embedding is removed, and the patch embedding compiles on its own. However, when the two are combined into a single model, we receive the error given below. Furthermore, if the kernel size of the Conv2d is increased from (16,16) to (32,32), similar to issue 398, the combined model compiles. Thanks!
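The isolation steps described above (patch embedding alone, attention alone, both combined) can be organised as a small harness. This is a minimal sketch and not part of the original report: `try_variants` and its arguments are hypothetical names, and the compile step is passed in as a callable so the harness itself does not require Neuron to be installed.

```python
from typing import Any, Callable, Dict, Tuple


def try_variants(
    variants: Dict[str, Callable[[], Tuple[Any, Any]]],
    compile_fn: Callable[[Any, Any], Any],
) -> Dict[str, str]:
    """Build each (model, example_input) pair, try to compile it, record the outcome."""
    results = {}
    for name, build in variants.items():
        try:
            model, example_input = build()
            compile_fn(model, example_input)
            results[name] = "ok"
        except Exception as exc:  # compiler failures surface as exceptions
            results[name] = f"failed: {type(exc).__name__}"
    return results
```

With the reproducible example, `compile_fn` would presumably be `torch.neuron.trace`, and each entry in `variants` would return one of the three models together with an input of the matching shape.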
Conda Environment
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 1.4.0 pypi_0 pypi
astor 0.8.1 pypi_0 pypi
attrs 23.1.0 pypi_0 pypi
ca-certificates 2023.05.30 h06a4308_0
certifi 2023.7.22 pypi_0 pypi
charset-normalizer 3.2.0 pypi_0 pypi
cycler 0.11.0 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
dmlc-nnvm 1.16.1.0+0 pypi_0 pypi
dmlc-topi 1.16.1.0+0 pypi_0 pypi
dmlc-tvm 1.16.1.0+0 pypi_0 pypi
exceptiongroup 1.1.2 pypi_0 pypi
filelock 3.12.2 pypi_0 pypi
fsspec 2023.1.0 pypi_0 pypi
gast 0.2.2 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.56.2 pypi_0 pypi
h5py 3.8.0 pypi_0 pypi
huggingface-hub 0.16.4 pypi_0 pypi
idna 3.4 pypi_0 pypi
importlib-metadata 6.7.0 pypi_0 pypi
inferentia-hwm 1.14.4.0+a9fb5c73a pypi_0 pypi
iniconfig 2.0.0 pypi_0 pypi
islpy 2022.2.1 pypi_0 pypi
keras-applications 1.0.8 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
kiwisolver 1.4.4 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.3 he6710b0_2
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
markdown 3.4.4 pypi_0 pypi
markupsafe 2.1.3 pypi_0 pypi
matplotlib 3.2.2 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 2.6.3 pypi_0 pypi
neuron-cc 1.17.0.0+1810fd7ed pypi_0 pypi
numpy 1.21.6 pypi_0 pypi
nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi
nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi
nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi
opencv-python 4.5.1.48 pypi_0 pypi
openssl 1.1.1u h7f8727e_0
opt-einsum 3.3.0 pypi_0 pypi
packaging 23.1 pypi_0 pypi
pandas 1.3.5 pypi_0 pypi
pillow 9.5.0 pypi_0 pypi
pip 22.3.1 py37h06a4308_0
pluggy 1.2.0 pypi_0 pypi
protobuf 3.20.1 pypi_0 pypi
pyparsing 3.1.0 pypi_0 pypi
pytest 7.4.0 pypi_0 pypi
python 3.7.11 h12debd9_0
python-dateutil 2.8.2 pypi_0 pypi
pytz 2023.3 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0
requests 2.31.0 pypi_0 pypi
safetensors 0.3.1 pypi_0 pypi
scipy 1.7.3 pypi_0 pypi
seaborn 0.12.2 pypi_0 pypi
setuptools 68.0.0 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
tensorboard 1.15.0 pypi_0 pypi
tensorflow 1.15.3 pypi_0 pypi
tensorflow-estimator 1.15.1 pypi_0 pypi
termcolor 2.3.0 pypi_0 pypi
timm 0.9.2 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tomli 2.0.1 pypi_0 pypi
torch 1.13.1 pypi_0 pypi
torch-neuron 1.13.1.2.8.9.0 pypi_0 pypi
torchvision 0.14.1 pypi_0 pypi
tqdm 4.65.0 pypi_0 pypi
typing-extensions 4.7.1 pypi_0 pypi
urllib3 2.0.4 pypi_0 pypi
werkzeug 2.2.3 pypi_0 pypi
wheel 0.41.0 pypi_0 pypi
wrapt 1.15.0 pypi_0 pypi
xz 5.4.2 h5eee18b_0
zipp 3.15.0 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
Reproducible Example
import torch
import torch.neuron


class Attention(torch.nn.Module):
    def __init__(
            self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.,
            proj_drop=0., attn_head_dim=None, ):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.dim = dim
        if attn_head_dim is not None:
            self.head_dim = attn_head_dim
        self.all_head_dim = self.head_dim * self.num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = torch.nn.Linear(dim, self.all_head_dim * 3, bias=qkv_bias)
        self.attn_drop = torch.nn.Dropout(attn_drop)
        self.proj = torch.nn.Linear(self.all_head_dim, dim)
        self.proj_drop = torch.nn.Dropout(proj_drop)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x)
        qkv = qkv.reshape(B, N, 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]
        q = q * self.scale
        attn = (q @ k.transpose(-2, -1))
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)
        x = (attn @ v).transpose(1, 2).reshape(B, N, -1)
        x = self.proj(x)
        x = self.proj_drop(x)
        return x


class Model(torch.nn.Module):
    def __init__(self, dim=1024, num_heads=16, qkv_bias=False, qk_scale=None,
                 proj_drop=0., attn_drop=0., attn_head_dim=None, patch_size=16,
                 in_chans=3, internal_embedding=True
                 ):
        super(Model, self).__init__()
        self.internal_embedding = internal_embedding
        self.attn = Attention(
            dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale,
            attn_drop=attn_drop, proj_drop=proj_drop, attn_head_dim=attn_head_dim
        )
        self.patch_embed = torch.nn.Conv2d(in_chans, dim, kernel_size=(patch_size, patch_size), stride=16, padding=2)

    def forward(self, x):
        x = self.patch_embed(x)
        x = x.flatten(2)
        x = x.transpose(1, 2)
        x = self.attn(x)
        return x


# load model
model = Model()
model.eval()
# get input representing an image of resolution 256x192
model_input = torch.zeros((1, 3, 256, 192))
# test model inference
pred = model(model_input)
# compile model
torch.neuron.trace(model, model_input)
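For reference, the sequence length that the attention block receives follows from the standard Conv2d output-size formula. The sketch below works through that arithmetic for the 256x192 input, with stride 16 and padding 2 as in the example. The helper names `conv_out` and `patch_grid` are hypothetical, and the formula assumes dilation 1 (the `Conv2d` default).

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length along one axis of a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_grid(height: int, width: int, kernel: int,
               stride: int = 16, padding: int = 2) -> tuple:
    """Grid of patches produced by the patch embedding for a given kernel size."""
    return (conv_out(height, kernel, stride, padding),
            conv_out(width, kernel, stride, padding))


for kernel in (16, 32):
    rows, cols = patch_grid(256, 192, kernel)
    # Tokens = rows * cols after flatten(2) and transpose(1, 2)
    print(f"kernel={kernel}: grid={rows}x{cols}, tokens={rows * cols}")
```

With the (16,16) kernel the attention block sees a (1, 192, 1024) tensor. With the (32,32) kernel it sees (1, 165, 1024). Whether this change in token count is related to the compiler failure is not established in this thread.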
Output
INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 25, fused = 25, percent fused = 100.0%
INFO:Neuron:Compiling function _NeuronGraph$26 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/USERNAME/anaconda3/envs/inf1_pose/bin/neuron-cc compile /tmp/tmpk0gh3biu/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpk0gh3biu/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 256, 192], "float32"]}, "outputs": ["Attention_9/Linear_53/aten_linear/Add:0"]} --verbose 35'
........WARNING:Neuron:The neuron-cc (neuron compiler) process aborted (SIG_ABORT). This is likely due to an unexpected condition internally (a bug). Please lodge an issue at 'https://github.com/aws/aws-neuron-sdk/issues'
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$26; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/USERNAME/anaconda3/envs/inf1_pose/bin/neuron-cc compile /tmp/tmpk0gh3biu/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpk0gh3biu/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 256, 192], "float32"]}, "outputs": ["Attention_9/Linear_53/aten_linear/Add:0"]}' --verbose 35
Traceback (most recent call last):
File "/home/USERNAME/anaconda3/envs/inf1_pose/lib/python3.7/site-packages/torch_neuron/convert.py", line 414, in op_converter
item, inputs, compiler_workdir=sg_workdir, **kwargs)
File "/home/USERNAME/anaconda3/envs/inf1_pose/lib/python3.7/site-packages/torch_neuron/decorators.py", line 264, in trace
'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/USERNAME/anaconda3/envs/inf1_pose/bin/neuron-cc compile /tmp/tmpk0gh3biu/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpk0gh3biu/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 256, 192], "float32"]}, "outputs": ["Attention_9/Linear_53/aten_linear/Add:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 25, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 4 [supported]
INFO:Neuron: => aten::_convolution: 1 [supported]
INFO:Neuron: => aten::dropout: 2 [supported]
INFO:Neuron: => aten::flatten: 1 [supported]
INFO:Neuron: => aten::linear: 2 [supported]
INFO:Neuron: => aten::matmul: 2 [supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::permute: 1 [supported]
INFO:Neuron: => aten::reshape: 2 [supported]
INFO:Neuron: => aten::select: 3 [supported]
INFO:Neuron: => aten::size: 2 [supported]
INFO:Neuron: => aten::softmax: 1 [supported]
INFO:Neuron: => aten::transpose: 3 [supported]
Traceback (most recent call last):
File "/home/USERNAME/engineering/USERNAME_ml_pipeline/src/Pose/inf1_compiling/Reproducible_Example.py", line 75, in <module>
torch.neuron.trace(model, model_input)
File "/home/USERNAME/anaconda3/envs/inf1_pose/lib/python3.7/site-packages/torch_neuron/convert.py", line 217, in trace
cu.stats_post_compiler(neuron_graph)
File "/home/USERNAME/anaconda3/envs/inf1_pose/lib/python3.7/site-packages/torch_neuron/convert.py", line 531, in stats_post_compiler
"No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
Process finished with exit code 1
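The per-operator summary in the log above can be tallied with a short script, which is handy when comparing runs. This is a minimal sketch: the regular expression is inferred only from the log lines shown in this thread, not from a documented format, and `tally_ops` is a hypothetical helper name.

```python
import re
from typing import Dict

# Matches lines such as "INFO:Neuron: => aten::Int: 4 [supported]"
OP_LINE = re.compile(r"=>\s+(aten::\w+):\s+(\d+)\s+\[(\w+)\]")


def tally_ops(log_text: str) -> Dict[str, int]:
    """Return a mapping of operator name to count found in the log text."""
    counts: Dict[str, int] = {}
    for match in OP_LINE.finditer(log_text):
        op, count, _status = match.groups()
        counts[op] = counts.get(op, 0) + int(count)
    return counts


sample = """\
INFO:Neuron: => aten::Int: 4 [supported]
INFO:Neuron: => aten::_convolution: 1 [supported]
INFO:Neuron: => aten::softmax: 1 [supported]
"""
ops = tally_ops(sample)
print(ops, sum(ops.values()))
```

Applied to the full list of not-compiled operators above, the counts sum to 25, which matches the "before = 25" figure reported earlier in the same log.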
Any updates on this? Were you able to reproduce the error with this new example?
We have reproduced the issue and have implemented a fix. The fix for this issue will be available in an upcoming release. We will update this ticket when the fix is available.
Any update on when the fix will be implemented? I have tested on the newest release (neuron 2.13.2 released 9/1/23) and the bug still exists. The minimal reproducible example I gave above still fails.
Hi messmor - The previous intended fix needs more work; sorry but we cannot commit to an ETA at this time.
This is very unfortunate. Looks like we will need to evaluate alternatives to these instances.
Hey everyone,
I'm hoping I can get some help with an error I am facing when compiling a model. I am trying to compile a model for use on INF1 instances. I have replicated this elsewhere; however, this example is from my local environment. One thing to note is that this compilation takes literally hours, outputting just "." for a very long time. When it finally finishes, I get the error shown in the output section below. I have done my best to provide information that might be helpful. Please let me know if there is anything else I can add. The model is VitPose. Thank you for your assistance.
Installation
Conda Environment
Compilation code
output