aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

[INF1] Compilation Error: what(): === BIR error === Reason: Access pattern out of bound. #709

Open JohnRachid opened 1 year ago

JohnRachid commented 1 year ago

Hey everyone,

I'm hoping I can get some help with an error I'm facing when compiling a model for use on INF1 instances. I have reproduced this elsewhere as well, but this example is from my local environment. One thing to note: the compilation runs for literally hours, printing only dots (.) for a very long time, before it finally finishes with the error shown in the output section below. I have done my best to provide information that might be helpful; please let me know if there is anything else I can add. The model is ViTPose. Thank you for your assistance.

Installation

conda config --add channels conda-forge 

conda create -n inf1 python=3.7.12

pip install environment_kernels

# Set pip repository pointing to the Neuron repository 
python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com

# Update PyTorch Neuron
python -m pip install --upgrade torch-neuron neuron-cc[tensorflow] "protobuf" torchvision

pip install opencv-python==4.5.1.48
pip install timm==0.6.7
pip install av==9.2.0
pip install ffmpeg-python==0.2.0
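
For reference, a quick sanity check can confirm which Neuron packages the environment actually resolved before attempting compilation (a minimal sketch; it only imports the packages installed above and prints their versions):

import pkg_resources
import torch
import torch_neuron  # the Neuron-extended PyTorch front end

# Versions pip pulled in, including the Neuron compiler package.
for pkg in ("torch-neuron", "neuron-cc", "torch"):
    print(pkg, pkg_resources.get_distribution(pkg).version)
print("torch reports:", torch.__version__)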

Conda Environment

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   1.4.0                    pypi_0    pypi
astor                     0.8.1                    pypi_0    pypi
attrs                     23.1.0                   pypi_0    pypi
av                        9.2.0                    pypi_0    pypi
ca-certificates           2023.5.7             hbcca054_0    conda-forge
certifi                   2023.5.7                 pypi_0    pypi
charset-normalizer        3.2.0                    pypi_0    pypi
decorator                 5.1.1                    pypi_0    pypi
dmlc-nnvm                 1.16.0.0+0               pypi_0    pypi
dmlc-topi                 1.16.0.0+0               pypi_0    pypi
dmlc-tvm                  1.16.0.0+0               pypi_0    pypi
environment-kernels       1.2.0                    pypi_0    pypi
exceptiongroup            1.1.2                    pypi_0    pypi
ffmpeg-python             0.2.0                    pypi_0    pypi
future                    0.18.3                   pypi_0    pypi
gast                      0.2.2                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
grpcio                    1.56.0                   pypi_0    pypi
h5py                      3.8.0                    pypi_0    pypi
idna                      3.4                      pypi_0    pypi
importlib-metadata        6.7.0                    pypi_0    pypi
inferentia-hwm            1.14.2.0+a9fb5c73a          pypi_0    pypi
iniconfig                 2.0.0                    pypi_0    pypi
islpy                     2022.2.1                 pypi_0    pypi
keras-applications        1.0.8                    pypi_0    pypi
keras-preprocessing       1.1.2                    pypi_0    pypi
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.1.0               he5830b7_0    conda-forge
libgomp                   13.1.0               he5830b7_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libsqlite                 3.42.0               h2797004_0    conda-forge
libstdcxx-ng              13.1.0               hfd8a6a1_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
markdown                  3.4.3                    pypi_0    pypi
markupsafe                2.1.3                    pypi_0    pypi
ncurses                   6.4                  hcb278e6_0    conda-forge
networkx                  2.6.3                    pypi_0    pypi
neuron-cc                 1.16.2.0+3f0faf100          pypi_0    pypi
numpy                     1.21.6                   pypi_0    pypi
nvidia-cublas-cu11        11.10.3.66               pypi_0    pypi
nvidia-cuda-nvrtc-cu11    11.7.99                  pypi_0    pypi
nvidia-cuda-runtime-cu11  11.7.99                  pypi_0    pypi
nvidia-cudnn-cu11         8.5.0.96                 pypi_0    pypi
opencv-python             4.5.1.48                 pypi_0    pypi
openssl                   3.1.1                hd590300_1    conda-forge
opt-einsum                3.3.0                    pypi_0    pypi
packaging                 23.1                     pypi_0    pypi
pillow                    9.5.0                    pypi_0    pypi
pip                       23.2               pyhd8ed1ab_0    conda-forge
pluggy                    1.2.0                    pypi_0    pypi
protobuf                  3.20.1                   pypi_0    pypi
pytest                    7.4.0                    pypi_0    pypi
python                    3.7.12          hf930737_100_cpython    conda-forge
readline                  8.2                  h8228510_1    conda-forge
requests                  2.31.0                   pypi_0    pypi
scipy                     1.7.3                    pypi_0    pypi
setuptools                68.0.0                   pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sqlite                    3.42.0               h2c6b66d_0    conda-forge
tensorboard               1.15.0                   pypi_0    pypi
tensorflow                1.15.3                   pypi_0    pypi
tensorflow-estimator      1.15.1                   pypi_0    pypi
termcolor                 2.3.0                    pypi_0    pypi
timm                      0.6.7                    pypi_0    pypi
tk                        8.6.12               h27826a3_0    conda-forge
tomli                     2.0.1                    pypi_0    pypi
torch                     1.13.1                   pypi_0    pypi
torch-neuron              1.13.1.2.7.10.0          pypi_0    pypi
torchvision               0.14.1                   pypi_0    pypi
typing-extensions         4.7.1                    pypi_0    pypi
urllib3                   2.0.4                    pypi_0    pypi
werkzeug                  2.2.3                    pypi_0    pypi
wheel                     0.40.0             pyhd8ed1ab_1    conda-forge
wrapt                     1.15.0                   pypi_0    pypi
xz                        5.2.6                h166bdaf_0    conda-forge
zipp                      3.15.0                   pypi_0    pypi

Compilation code

import os
import sys
import pathlib
import warnings

import numpy as np
import cv2
import torch
import torch.nn as nn
import torch_neuron

# local packages
# repository root (three directories above this file's folder)
parent_folder = pathlib.Path(__file__).resolve().parents[3]

sys.path.append(parent_folder.as_posix())

model_path = parent_folder / 'src' / 'REDACTED' / 'models' / 'large_coco.pth'
video_path = parent_folder / 'src' / 'REDACTED' / 'inf1_compiling' / 'vid.mp4'
save_path = parent_folder / 'src' / 'REDACTED' / 'models' / 'inf_compiled_model.pth'

print(model_path)
print(video_path)
print(save_path)

from src.REDACTED.model import build
from src.REDACTED.Video_Class import Input_Video

# Setting up NeuronCore groups for inf1.xlarge
n_cores = 4
os.environ['NEURON_RT_NUM_CORES'] = str(n_cores)
# append yolo directory to sys path

def get_image(mode="aws"):
    """Retrieves the first frame of the video, used in the Neuron compiling process.
    inputs: mode (both branches currently use the same local video)
    outputs: img, a torch tensor with shape (1, 3, 256, 192)
    """
    if mode == "aws":
        source = "vid.mp4"
    else:
        source = "vid.mp4"
    dataset = Input_Video(source)
    img = dataset.get_frame()[1]
    img_shape = list(img.shape)
    print("shape before")
    print(img_shape)
    img = cv2.resize(img, (256, 192))
    img = np.transpose(img, (2, 1, 0))  # channel-first layout: (192, 256, 3) -> (3, 256, 192)
    print("shape after transpose", img.shape)
    img = torch.from_numpy(img)
    img = img / 255.0  # 0 - 255 to 0.0 - 1.0
    if len(img_shape) == 3:
        img = img[None]  # expand for batch dim
        img_shape = [1] + img_shape

    return img

def test_inference(mode="aws"):
    #constants
    # load the model
    if mode == "aws":
        weights = model_path
    else:
        weights = model_path
    yolo_model = NeuronCompatibilityWrapper(mode)
    img = get_image(mode)
    # pred = yolo_model(img)

    # print("pred shape", pred.shape)

    # return pred

class NeuronCompatibilityWrapper(nn.Module):
    def __init__(self, mode="aws"):
        super(NeuronCompatibilityWrapper, self).__init__()
        if mode == "aws":
            weights = model_path
        else:
            weights = model_path
        self.model = build("large_coco", weights)

    def forward(self, x):
        out = self.model(x)
        # takes only the first output tensor; all that is needed for bbox
        return out[0]

def compile_and_save_REDACTED(use_jit=False, mode="aws"):
    """Compiles the model to run on a Neuron-optimized Inf1 instance
       and saves the compiled model to the repo for quick loading.
        inputs: use_jit (currently unused), mode
        outputs: None
    """

    # load the model
    model = NeuronCompatibilityWrapper(mode)
    # get a test image for tracing
    img = get_image(mode)
    print("toto")
    print(img.shape)
    # trace/compile the model with the Neuron compiler
    model.eval()
    model_neuron = torch_neuron.trace(model, img, strict=False, separate_weights=True)

    # save the compiled TorchScript module
    model_neuron.save(save_path)

if __name__ == "__main__":
    # pred = test_inference(mode="local")
    print(parent_folder)
    compile_and_save_REDACTED(use_jit=False, mode="aws")
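
For context, torch_neuron.trace returns a TorchScript module, so once compilation succeeds the artifact written to save_path can be loaded and run on the Inf1 instance with torch.jit.load. A minimal sketch, assuming the same saved file and input shape as above:

import torch
import torch_neuron  # must be imported so the Neuron ops are registered before loading

# Hypothetical path; use the same save_path as in compile_and_save_REDACTED() above.
save_path = "inf_compiled_model.pth"

model_neuron = torch.jit.load(save_path)

# Run inference on an input matching the traced shape.
example = torch.zeros((1, 3, 256, 192))
output = model_neuron(example)
print(output.shape)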

Output

/home/USERNAME/Desktop/ml_api/USERNAME_ml_pipeline/src/REDACTED/models/large_coco.pth
/home/USERNAME/Desktop/ml_api/USERNAME_ml_pipeline/src/REDACTED/inf1_compiling/vid.mp4
/home/USERNAME/Desktop/ml_api/USERNAME_ml_pipeline/src/REDACTED/models/inf_compiled_model.pth
/home/USERNAME/Desktop/ml_api/USERNAME_ml_pipeline
.shape before
[1080, 1920, 3]
shape after transpose (3, 256, 192)
toto
torch.Size([1, 3, 256, 192])
/home/USERNAME/anaconda3/envs/inf1/lib/python3.7/site-packages/torch/jit/_trace.py:443: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  outs = wrap_retval(mod(*_clone_inputs(inputs)))
INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
.INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 747, fused = 747, percent fused = 100.0%
WARNING:tensorflow:From /home/USERNAME/anaconda3/envs/inf1/lib/python3.7/site-packages/torch_neuron/ops/aten.py:2391: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
INFO:Neuron:Compiling function _NeuronGraph$1130 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/USERNAME/anaconda3/envs/inf1/bin/neuron-cc compile /tmp/tmpdn09xudb/model --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpdn09xudb/graph_def.neff --verbose 35'
............................................................................................................................................................................................................................................................................................................................................................................terminate called after throwing an instance of 'std::runtime_error'
  what():  === BIR error ===
Reason: Access pattern out of bound.
Instruction: I-24574-109
Opcode: TensorCopy
Argument AP:
Access Pattern: [[576,64],[1,1],[1,3]]
Offset: 37026
Memory Location: {VitPose_1/ViT_2/PatchEmbed_55/Conv2d_1/aten__convolution/Pad:0_local_13414_i0}@SB<0,93696>(128x2304)#Internal DebugInfo: <VitPose_1/ViT_2/PatchEmbed_55/Conv2d_1/aten__convolution/Pad:0_local_13414||UNDEF||[128, 576, 1]>

WARNING:Neuron:The neuron-cc (neuron compiler) process aborted (SIG_ABORT).  This is likely due to an unexpected condition internally (a bug).  Please lodge an issue at 'https://github.com/aws/aws-neuron-sdk/issues'
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$1130; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/USERNAME/anaconda3/envs/inf1/bin/neuron-cc compile /tmp/tmpdn09xudb/model --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpdn09xudb/graph_def.neff --verbose 35
Traceback (most recent call last):
  File "/home/USERNAME/anaconda3/envs/inf1/lib/python3.7/site-packages/torch_neuron/convert.py", line 414, in op_converter
    item, inputs, compiler_workdir=sg_workdir, **kwargs)
  File "/home/USERNAME/anaconda3/envs/inf1/lib/python3.7/site-packages/torch_neuron/decorators.py", line 264, in trace
    'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/USERNAME/anaconda3/envs/inf1/bin/neuron-cc compile /tmp/tmpdn09xudb/model --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpdn09xudb/graph_def.neff --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 747, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 99 [supported]
INFO:Neuron: => aten::_convolution: 4 [supported]
INFO:Neuron: => aten::add: 50 [supported]
INFO:Neuron: => aten::batch_norm: 2 [supported]
INFO:Neuron: => aten::contiguous: 1 [supported]
INFO:Neuron: => aten::dropout: 72 [supported]
INFO:Neuron: => aten::flatten: 1 [supported]
INFO:Neuron: => aten::gelu: 24 [supported]
INFO:Neuron: => aten::layer_norm: 49 [supported]
INFO:Neuron: => aten::linear: 96 [supported]
INFO:Neuron: => aten::matmul: 48 [supported]
INFO:Neuron: => aten::mul: 24 [supported]
INFO:Neuron: => aten::permute: 25 [supported]
INFO:Neuron: => aten::relu_: 2 [supported]
INFO:Neuron: => aten::reshape: 49 [supported]
INFO:Neuron: => aten::select: 73 [supported]
INFO:Neuron: => aten::size: 51 [supported]
INFO:Neuron: => aten::slice: 4 [supported]
INFO:Neuron: => aten::softmax: 24 [supported]
INFO:Neuron: => aten::transpose: 49 [supported]
.INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 747, fused = 747, percent fused = 100.0%
.INFO:Neuron:Compiling function _NeuronGraph$2263 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/USERNAME/anaconda3/envs/inf1/bin/neuron-cc compile /tmp/tmpbcm40hzx/model --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpbcm40hzx/graph_def.neff --verbose 35'
...............................................................................................
JohnRachid commented 1 year ago

I see other issues getting responses, but not this one. Please let me know if there is any further information I can provide. I would love to get us set up on some INF1 instances.

micwade-aws commented 1 year ago

@JohnRachid - this one was missed. I'm taking a look now and if I can't figure out what is happening tonight, I'll sync with the team internally to get a better idea/response. Thanks for the patience.

aws-taylor commented 1 year ago

Hello @JohnRachid,

I'm taking a look at this, but unfortunately the compilation code above is too redacted for me to be able to come up with a reproduction. Are you able to share the contents of /tmp/tmpbcm40hzx/model with us? If this is sensitive then you can also reach us at aws-neuron-support@amazon.com.

-Taylor
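
One way to capture that model artifact for sharing: the Inf1 torch_neuron.trace API accepts a compiler_workdir argument, which keeps the inputs handed to neuron-cc in a persistent directory instead of a throwaway /tmp path. A minimal sketch, assuming the model and img objects from the compilation code above (the workdir path itself is just an example):

# Keep neuron-cc inputs/outputs in a persistent directory so they survive a
# failed compile and can be shared for debugging.
model_neuron = torch_neuron.trace(
    model,
    img,
    compiler_workdir="./neuron_workdir",  # example path; any writable directory works
    strict=False,
    separate_weights=True,
)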

messmor commented 1 year ago

Hello @aws-taylor,

I am working with @JohnRachid on this issue. Below is a reproducible example of the compilation error. Interestingly, the Attention module compiles if the patch embedding is removed, and the patch embedding compiles on its own; however, when the two are combined into a single model we get the error shown below. Furthermore, if the kernel size of the Conv2d is increased from (16, 16) to (32, 32), similar to issue #398, the combined model compiles (a sketch of that variant appears after the example's output below). Thanks!

Conda Environment

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
absl-py                   1.4.0                    pypi_0    pypi
astor                     0.8.1                    pypi_0    pypi
attrs                     23.1.0                   pypi_0    pypi
ca-certificates           2023.05.30           h06a4308_0  
certifi                   2023.7.22                pypi_0    pypi
charset-normalizer        3.2.0                    pypi_0    pypi
cycler                    0.11.0                   pypi_0    pypi
decorator                 5.1.1                    pypi_0    pypi
dmlc-nnvm                 1.16.1.0+0               pypi_0    pypi
dmlc-topi                 1.16.1.0+0               pypi_0    pypi
dmlc-tvm                  1.16.1.0+0               pypi_0    pypi
exceptiongroup            1.1.2                    pypi_0    pypi
filelock                  3.12.2                   pypi_0    pypi
fsspec                    2023.1.0                 pypi_0    pypi
gast                      0.2.2                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
grpcio                    1.56.2                   pypi_0    pypi
h5py                      3.8.0                    pypi_0    pypi
huggingface-hub           0.16.4                   pypi_0    pypi
idna                      3.4                      pypi_0    pypi
importlib-metadata        6.7.0                    pypi_0    pypi
inferentia-hwm            1.14.4.0+a9fb5c73a          pypi_0    pypi
iniconfig                 2.0.0                    pypi_0    pypi
islpy                     2022.2.1                 pypi_0    pypi
keras-applications        1.0.8                    pypi_0    pypi
keras-preprocessing       1.1.2                    pypi_0    pypi
kiwisolver                1.4.4                    pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libstdcxx-ng              11.2.0               h1234567_1  
markdown                  3.4.4                    pypi_0    pypi
markupsafe                2.1.3                    pypi_0    pypi
matplotlib                3.2.2                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0  
networkx                  2.6.3                    pypi_0    pypi
neuron-cc                 1.17.0.0+1810fd7ed          pypi_0    pypi
numpy                     1.21.6                   pypi_0    pypi
nvidia-cublas-cu11        11.10.3.66               pypi_0    pypi
nvidia-cuda-nvrtc-cu11    11.7.99                  pypi_0    pypi
nvidia-cuda-runtime-cu11  11.7.99                  pypi_0    pypi
nvidia-cudnn-cu11         8.5.0.96                 pypi_0    pypi
opencv-python             4.5.1.48                 pypi_0    pypi
openssl                   1.1.1u               h7f8727e_0  
opt-einsum                3.3.0                    pypi_0    pypi
packaging                 23.1                     pypi_0    pypi
pandas                    1.3.5                    pypi_0    pypi
pillow                    9.5.0                    pypi_0    pypi
pip                       22.3.1           py37h06a4308_0  
pluggy                    1.2.0                    pypi_0    pypi
protobuf                  3.20.1                   pypi_0    pypi
pyparsing                 3.1.0                    pypi_0    pypi
pytest                    7.4.0                    pypi_0    pypi
python                    3.7.11               h12debd9_0  
python-dateutil           2.8.2                    pypi_0    pypi
pytz                      2023.3                   pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
readline                  8.2                  h5eee18b_0  
requests                  2.31.0                   pypi_0    pypi
safetensors               0.3.1                    pypi_0    pypi
scipy                     1.7.3                    pypi_0    pypi
seaborn                   0.12.2                   pypi_0    pypi
setuptools                68.0.0                   pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0  
tensorboard               1.15.0                   pypi_0    pypi
tensorflow                1.15.3                   pypi_0    pypi
tensorflow-estimator      1.15.1                   pypi_0    pypi
termcolor                 2.3.0                    pypi_0    pypi
timm                      0.9.2                    pypi_0    pypi
tk                        8.6.12               h1ccaba5_0  
tomli                     2.0.1                    pypi_0    pypi
torch                     1.13.1                   pypi_0    pypi
torch-neuron              1.13.1.2.8.9.0           pypi_0    pypi
torchvision               0.14.1                   pypi_0    pypi
tqdm                      4.65.0                   pypi_0    pypi
typing-extensions         4.7.1                    pypi_0    pypi
urllib3                   2.0.4                    pypi_0    pypi
werkzeug                  2.2.3                    pypi_0    pypi
wheel                     0.41.0                   pypi_0    pypi
wrapt                     1.15.0                   pypi_0    pypi
xz                        5.4.2                h5eee18b_0  
zipp                      3.15.0                   pypi_0    pypi
zlib                      1.2.13               h5eee18b_0  

Reproducible Example

import torch
import torch.neuron

class Attention(torch.nn.Module):
    def __init__(
            self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.,
            proj_drop=0., attn_head_dim=None, ):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.dim = dim

        if attn_head_dim is not None:
            self.head_dim = attn_head_dim
        self.all_head_dim = self.head_dim * self.num_heads

        self.scale = self.head_dim ** -0.5

        self.qkv = torch.nn.Linear(dim, self.all_head_dim * 3, bias=qkv_bias)

        self.attn_drop = torch.nn.Dropout(attn_drop)
        self.proj = torch.nn.Linear(self.all_head_dim, dim)
        self.proj_drop = torch.nn.Dropout(proj_drop)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x)
        qkv = qkv.reshape(B, N, 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]

        q = q * self.scale
        attn = (q @ k.transpose(-2, -1))

        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)

        x = (attn @ v).transpose(1, 2).reshape(B, N, -1)
        x = self.proj(x)
        x = self.proj_drop(x)

        return x

class Model(torch.nn.Module):
    def __init__(self, dim=1024, num_heads=16, qkv_bias=False, qk_scale=None,
                 proj_drop=0., attn_drop=0., attn_head_dim=None, patch_size=16,
                 in_chans=3, internal_embedding=True
                 ):
        super(Model, self).__init__()
        self.internal_embedding = internal_embedding
        self.attn = Attention(
            dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale,
            attn_drop=attn_drop, proj_drop=proj_drop, attn_head_dim=attn_head_dim
        )
        self.patch_embed = torch.nn.Conv2d(in_chans, dim, kernel_size=(patch_size, patch_size), stride=16, padding=2)

    def forward(self, x):
        x = self.patch_embed(x)
        x = x.flatten(2)
        x = x.transpose(1, 2)
        x = self.attn(x)

        return x

# load model
model = Model()
model.eval()
# get input representing an image of resolution 256x192
model_input = torch.zeros((1, 3, 256, 192))
# test model inference
pred = model(model_input)
# compile model
torch.neuron.trace(model, model_input)

Output

INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 25, fused = 25, percent fused = 100.0%
INFO:Neuron:Compiling function _NeuronGraph$26 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/USERNAME/anaconda3/envs/inf1_pose/bin/neuron-cc compile /tmp/tmpk0gh3biu/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpk0gh3biu/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 256, 192], "float32"]}, "outputs": ["Attention_9/Linear_53/aten_linear/Add:0"]} --verbose 35'
........WARNING:Neuron:The neuron-cc (neuron compiler) process aborted (SIG_ABORT).  This is likely due to an unexpected condition internally (a bug).  Please lodge an issue at 'https://github.com/aws/aws-neuron-sdk/issues'
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$26; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/USERNAME/anaconda3/envs/inf1_pose/bin/neuron-cc compile /tmp/tmpk0gh3biu/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpk0gh3biu/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 256, 192], "float32"]}, "outputs": ["Attention_9/Linear_53/aten_linear/Add:0"]}' --verbose 35
Traceback (most recent call last):
  File "/home/USERNAME/anaconda3/envs/inf1_pose/lib/python3.7/site-packages/torch_neuron/convert.py", line 414, in op_converter
    item, inputs, compiler_workdir=sg_workdir, **kwargs)
  File "/home/USERNAME/anaconda3/envs/inf1_pose/lib/python3.7/site-packages/torch_neuron/decorators.py", line 264, in trace
    'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/USERNAME/anaconda3/envs/inf1_pose/bin/neuron-cc compile /tmp/tmpk0gh3biu/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpk0gh3biu/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 256, 192], "float32"]}, "outputs": ["Attention_9/Linear_53/aten_linear/Add:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 25, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 4 [supported]
INFO:Neuron: => aten::_convolution: 1 [supported]
INFO:Neuron: => aten::dropout: 2 [supported]
INFO:Neuron: => aten::flatten: 1 [supported]
INFO:Neuron: => aten::linear: 2 [supported]
INFO:Neuron: => aten::matmul: 2 [supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::permute: 1 [supported]
INFO:Neuron: => aten::reshape: 2 [supported]
INFO:Neuron: => aten::select: 3 [supported]
INFO:Neuron: => aten::size: 2 [supported]
INFO:Neuron: => aten::softmax: 1 [supported]
INFO:Neuron: => aten::transpose: 3 [supported]
Traceback (most recent call last):
  File "/home/USERNAME/engineering/USERNAME_ml_pipeline/src/Pose/inf1_compiling/Reproducible_Example.py", line 75, in <module>
    torch.neuron.trace(model, model_input)
  File "/home/USERNAME/anaconda3/envs/inf1_pose/lib/python3.7/site-packages/torch_neuron/convert.py", line 217, in trace
    cu.stats_post_compiler(neuron_graph)
  File "/home/USERNAME/anaconda3/envs/inf1_pose/lib/python3.7/site-packages/torch_neuron/convert.py", line 531, in stats_post_compiler
    "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!

Process finished with exit code 1
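
As noted above, the combined model reportedly compiles when the Conv2d kernel is enlarged from (16, 16) to (32, 32). A minimal sketch of that variant, assuming the same Model class from the reproducible example (only patch_size changes; the stride stays at 16):

# Workaround variant: identical to the failing example except for a 32x32
# patch-embedding kernel, which reportedly avoids the BIR access-pattern error.
model_32 = Model(patch_size=32)
model_32.eval()
model_input = torch.zeros((1, 3, 256, 192))
torch.neuron.trace(model_32, model_input)
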
messmor commented 1 year ago

Any updates on this? Were you able to reproduce the error with this new example?

jluntamazon commented 1 year ago

We have reproduced the issue and have implemented a fix. The fix for this issue will be available in an upcoming release. We will update this ticket when the fix is available.

messmor commented 1 year ago

Any update on when the fix will be implemented? I have tested on the newest release (neuron 2.13.2 released 9/1/23) and the bug still exists. The minimal reproducible example I gave above still fails.

aws-donkrets commented 11 months ago

Hi messmor - the previously intended fix needs more work; sorry, but we cannot commit to an ETA at this time.

JohnRachid commented 11 months ago

This is very unfortunate. Looks like we will need to evaluate alternatives to these instances.