aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services

Try to compile Segment Anything and found "The neuronx-cc (neuron compiler) crashed (SEGFAULT)." #751

Open KevinZhao opened 1 year ago

KevinZhao commented 1 year ago

My code is below:

import torch
from PIL import Image
import requests
import torch_neuronx
from transformers import SamModel, SamProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

model = SamModel.from_pretrained("facebook/sam-vit-huge")
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")

img_url = ""
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]]  # 2D location of a window in the image

inputs = processor(raw_image, input_points=input_points, return_tensors="pt")

outputs = model(**inputs)

example = (inputs['pixel_values'],)

# Compile the model
COMPILER_WORKDIR_ROOT = 'compile_dir'
model_neuron = torch_neuronx.trace(
    model,
    example,
    compiler_args="--model-type=transformer",
    compiler_workdir=COMPILER_WORKDIR_ROOT,
)

# Save the TorchScript for inference deployment
filename = ''
torch.jit.save(model_neuron, filename)

I got this error: Too many instructions after unroll for function sg0000 !

File ~/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuronx/xla_impl/, in hlo_compile(filename, compiler_workdir, compiler_args)
    274 elif status == -11:
    275     logger.warning(
    276         "The neuronx-cc (neuron compiler) crashed (SEGFAULT). "
    277         "This is likely due to a bug in the compiler. "
    278         "Please lodge an issue at ''"
    279     )
--> 281 raise RuntimeError(f"neuronx-cc failed with {status}")
    283 return neff_filename

RuntimeError: neuronx-cc failed with 70

aws-rhsoln commented 1 year ago

Thank you for reporting the issue. We are able to reproduce it and are working on a fix.

aws-donkrets commented 11 months ago

Hi KevinZhao, We are still working on a long-term fix for your reported issue. In the interim, perhaps you can try passing the "-O1" flag to the compiler. This may result in a different type of graph sharding that could be a work-around for your issue.

KevinZhao commented 11 months ago

Do you mean adding the args as below?

COMPILER_WORKDIR_ROOT = 'compile_dir'
model_neuron = torch_neuronx.trace(
    model,
    example,
    compiler_args=["--model-type=transformer", "-O1"],
    compiler_workdir=COMPILER_WORKDIR_ROOT,
)

I also tried compiler_args=["--model-type=transformer", "--optlevel=1"], with no luck.

aws-donkrets commented 11 months ago

Hi KevinZhao, Yes, that is what I meant (--optlevel is the alias for -O). Sorry to hear it didn't work; it was an attempt to provide a temporary workaround. We need to triage further.
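
For readers trying several flag combinations, the trial-and-error above can be scripted. This is only a sketch under stated assumptions: `first_working_args` and `trace_fn` are hypothetical names, `trace_fn` stands in for a wrapper around the `torch_neuronx.trace` call shown earlier in the thread, and a failed compile is assumed to raise `RuntimeError` as in the traceback above.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def first_working_args(
    trace_fn: Callable[[List[str]], object],
    candidates: Sequence[List[str]],
) -> Tuple[Optional[List[str]], List[Tuple[List[str], str]]]:
    """Try each compiler-argument list in order and stop at the first success.

    trace_fn is a hypothetical callable: it receives a list of compiler args
    and either returns a compiled model or raises RuntimeError.
    Returns (working_args_or_None, list_of_(args, error_message)_failures).
    """
    failures: List[Tuple[List[str], str]] = []
    for args in candidates:
        try:
            trace_fn(list(args))
        except RuntimeError as err:
            # Record the failure and move on to the next candidate.
            failures.append((list(args), str(err)))
            continue
        return list(args), failures
    return None, failures


# On a Neuron instance, trace_fn could wrap the call from this thread, e.g.:
# trace_fn = lambda args: torch_neuronx.trace(model, example, compiler_args=args)
```

Keeping the error message for each failed candidate makes it easier to report which flag sets were already tried.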

Mithil157 commented 5 months ago

Hi @aws-donkrets, was this issue fixed? I am currently trying to compile an SD2 in-painting model on an inf2.8xlarge with the Amazon Linux 2 AMI, using neuronx-cc== torch-neuronx==, but I am getting the same error:

[Attached image: Screenshot 2024-04-29 at 11 26 11 PM]

cszhz commented 3 months ago

Any progress on this issue?

maher1337 commented 3 months ago

@aws-donkrets any updates?

aws-donkrets commented 2 months ago

I'm checking with the engineer working on our SD model support.

brunodoamaral commented 2 weeks ago

I was benchmarking torch_neuronx.trace against torch.jit.trace for different batch sizes, and it stops at a batch size of 256.

I used the AMI "Deep Learning AMI Neuron PyTorch 2.1 (Ubuntu 22.04) 20240723" on an inf2.xlarge.

Here is the code:

print('Begin imports...')
import torch
import torch.nn as nn
import torch_neuronx
import torch.jit
import timeit
import pandas as pd

# Define a simple model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(32 * 56 * 56, 128)
        self.fc2 = nn.Linear(128, 10)  # Assuming 10 output classes

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.max_pool2d(x, 2)
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, 2)
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

print('Instantiating model...')
# Instantiate the model
model = SimpleCNN()

# Batch sizes to test
batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
results = []

# Function to measure inference time
def benchmark_inference(model, input_tensor):
    with torch.no_grad():
        model(input_tensor)

print('Begin benchmark!')
for batch_size in batch_sizes:
    print('Benchmark BS {}'.format(batch_size))

    # Define an example input
    example_input = torch.randn(batch_size, 3, 224, 224)  # Adjust to match your model

    print('Tracing with neuronx...')
    # Trace the model with Neuron
    neuronx_model = torch_neuronx.trace(model, example_input)

    # Run inference on example_input to warm up
    with torch.no_grad():
        neuronx_model(example_input)

    print('Tracing with jit...')
    # Trace the model with jit
    jit_model = torch.jit.trace(model, example_input)

    # Run inference on example_input to warm up
    with torch.no_grad():
        jit_model(example_input)

    print('Benchmarking neuronx...')
    # Measure inference time for neuronx_model
    time_neuronx_model = timeit.Timer(
        stmt='benchmark_inference(neuronx_model, example_input)',
        globals=globals(),
    ).timeit(100) / 100

    print('Benchmarking jit...')
    # Measure inference time for the jit-traced model
    time_model = timeit.Timer(
        stmt='benchmark_inference(jit_model, example_input)',
        globals=globals(),
    ).timeit(100) / 100

    # Append the results for the DataFrame
    results.append({
        'batch_size': batch_size,
        'neuronx_model_time': time_neuronx_model,
        'model_time': time_model,
    })


# Convert to a DataFrame and save as CSV
df = pd.DataFrame(results)
df.to_csv('benchmark.csv', index=False)
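
To pin down exactly which batch sizes trace successfully, one option is to record the outcome per batch size instead of letting the loop abort. A minimal sketch, assuming the failure surfaces as a `RuntimeError` rather than a hang; `probe_batch_sizes`, `largest_ok`, and `trace_fn` are hypothetical names, with `trace_fn` standing in for a call such as `torch_neuronx.trace(model, torch.randn(bs, 3, 224, 224))`.

```python
from typing import Callable, Dict, Iterable, Optional


def probe_batch_sizes(
    trace_fn: Callable[[int], object],
    batch_sizes: Iterable[int],
) -> Dict[int, Optional[str]]:
    """Map each batch size to None on success or to the error message on failure."""
    outcome: Dict[int, Optional[str]] = {}
    for bs in batch_sizes:
        try:
            trace_fn(bs)  # hypothetical wrapper around the real trace call
            outcome[bs] = None
        except RuntimeError as err:
            outcome[bs] = str(err)
    return outcome


def largest_ok(outcome: Dict[int, Optional[str]]) -> Optional[int]:
    """Return the largest batch size that traced without error, if any."""
    ok = [bs for bs, err in outcome.items() if err is None]
    return max(ok) if ok else None
```

The resulting dictionary can be written out alongside benchmark.csv so that the failing batch sizes and their error messages are preserved.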