KevinZhao commented 1 year ago

My code as below:

import torch
from PIL import Image
import requests
import torch_neuronx
from transformers import SamModel, SamProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

model = SamModel.from_pretrained("facebook/sam-vit-huge")#.to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]] # 2D location of a window in the image

inputs = processor(raw_image, input_points=input_points, return_tensors="pt")#.to(device)

outputs = model(**inputs)

example = (inputs['pixel_values'],)

# Compile the model
COMPILER_WORKDIR_ROOT = 'compile_dir'
model_neuron = torch_neuronx.trace(model, example, compiler_args="--model-type=transformer", compiler_workdir = COMPILER_WORKDIR_ROOT)

# Save the TorchScript for inference deployment
filename = 'SAM.pt'
torch.jit.save(model_neuron, filename)

I got this error: Too many instructions after unroll for function sg0000 !

File ~/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuronx/xla_impl/trace.py:281, in hlo_compile(filename, compiler_workdir, compiler_args)
    274 elif status == -11:
    275     logger.warning(
    276         "The neuronx-cc (neuron compiler) crashed (SEGFAULT). "
    277         "This is likely due to a bug in the compiler. "
    278         "Please lodge an issue at 'https://github.com/aws/aws-neuron-sdk/issues'"
    279     )
--> 281 raise RuntimeError(f"neuronx-cc failed with {status}")
    283 return neff_filename

RuntimeError: neuronx-cc failed with 70
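For readers comparing the `status == -11` branch in the traceback with the reported `70`: a minimal sketch of how such a status could be read, assuming it follows the convention of Python's `subprocess` return codes on POSIX, where a negative value -N means the child process was terminated by signal N. The helper name `describe_status` is hypothetical and is not part of torch_neuronx.

```python
import signal


def describe_status(status: int) -> str:
    """Describe a subprocess-style return code (hypothetical helper)."""
    if status < 0:
        # Negative codes mean "terminated by signal -status" on POSIX.
        try:
            name = signal.Signals(-status).name
        except ValueError:
            name = f"number {-status}"
        return f"terminated by signal {name}"
    # Zero or positive codes are ordinary exit codes chosen by the program.
    return f"exited with code {status}"


print(describe_status(-11))
print(describe_status(70))
```

Under that assumption, a status of 70 would be an ordinary non-zero exit code from the compiler rather than the signal-based case that the SEGFAULT warning covers.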

aws-rhsoln commented 1 year ago

Thank you for reporting the issue. We are able to reproduce it and are working on a fix.

aws-donkrets commented 11 months ago

Hi KevinZhao, We are still working on a long-term fix for your reported issue. In the interim, perhaps you can try passing the "-O1" flag to the compiler. This may result in a different type of graph sharding that could be a work-around for your issue.

KevinZhao commented 11 months ago

Do you mean adding the args as below?

COMPILER_WORKDIR_ROOT = 'compile_dir'
model_neuron = torch_neuronx.trace(
    model,
    example,
    compiler_args=["--model-type=transformer", "-O1"],
    compiler_workdir = COMPILER_WORKDIR_ROOT)

I also tried compiler_args=["--model-type=transformer", "--optlevel=1"],

with no luck

aws-donkrets commented 11 months ago

Hi KevinZhao, Yes, that is what I meant (--optlevel is the alias for -O). Sorry to hear it didn't work; it was an attempt to provide a temporary workaround. We need to triage further.

Mithil157 commented 5 months ago

Hi @aws-donkrets, was this issue fixed? I am currently trying to compile an SD2 in-painting model on an inf2.8xlarge with the Amazon Linux 2 AMI, neuronx-cc==2.13.72.0 and torch-neuronx==1.13.1.1.14.0, but I am getting the same error:

cszhz commented 3 months ago

Any progress on this issue?

maher1337 commented 3 months ago

@aws-donkrets any updates?

aws-donkrets commented 2 months ago

I'm checking with the engineer working on our SD model support.

brunodoamaral commented 2 weeks ago

I was benchmarking torch_neuronx.trace against torch.jit.trace at different batch sizes, and it stops at BS=256.

I used the AMI "Deep Learning AMI Neuron PyTorch 2.1 (Ubuntu 22.04) 20240723" on an inf2.xlarge.

Here is the code:

print('Begin imports...')
import torch
import torch.nn as nn
import torch_neuronx
import torch.jit
import timeit
import pandas as pd

# Define a simple model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(32 * 56 * 56, 128)
        self.fc2 = nn.Linear(128, 10)  # Assuming 10 output classes

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.max_pool2d(x, 2)
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, 2)
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

print('Instantiating model...')
# Instantiate the model
model = SimpleCNN()

# Batch sizes to test
batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
results = []

# Function that runs one inference pass (timed below with timeit)
def benchmark_inference(model, input_tensor):
    with torch.no_grad():
        model(input_tensor)

print('Begin benchmark!')
for batch_size in batch_sizes:
    print('Benchmark BS {}'.format(batch_size))

    # Define an example input
    example_input = torch.randn(batch_size, 3, 224, 224)  # Adjust to match your model

    print('Tracing with neuronx...')
    # Trace the model with Neuron
    neuronx_model = torch_neuronx.trace(model, example_input)

    # Run one inference with example_input to warm up
    with torch.no_grad():
        neuronx_model(example_input)

    print('Tracing with jit...')
    # Trace the model with jit
    jit_model = torch.jit.trace(model, example_input)

    # Run one inference with example_input to warm up
    with torch.no_grad():
        jit_model(example_input)

    print('Benchmarking neuronx...')
    # Measure inference time for neuronx_model
    time_neuronx_model = timeit.Timer(
        stmt='benchmark_inference(neuronx_model, example_input)',
        globals=globals()
    ).timeit(100) / 100

    print('Benchmarking jit...')
    # Measure inference time for jit_model
    time_model = timeit.Timer(
        stmt='benchmark_inference(jit_model, example_input)',
        globals=globals()
    ).timeit(100) / 100

    # Append this batch size's results to the list
    results.append({
        'batch_size': batch_size,
        'neuronx_model_time': time_neuronx_model,
        'model_time': time_model
    })

    print(results[-1])

# Convert to a DataFrame and save as CSV
df = pd.DataFrame(results)
print(df)
df.to_csv('benchmark.csv', index=False)
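Because the loop above aborts at the first batch size that fails, the results gathered so far are never written out. One possible way to keep them is sketched below, in plain Python so it runs without Neuron hardware. The names `run_per_batch_size` and `fake_step` are hypothetical, and the simulated failure at 256 is only a placeholder, not the real compiler error.

```python
def run_per_batch_size(batch_sizes, step):
    """Call step(batch_size) for each size; record failures instead of aborting."""
    rows = []
    for batch_size in batch_sizes:
        try:
            row = dict(step(batch_size), batch_size=batch_size, error=None)
        except Exception as exc:  # e.g. a RuntimeError raised by a failed trace
            row = {'batch_size': batch_size,
                   'error': '{}: {}'.format(type(exc).__name__, exc)}
        rows.append(row)
    return rows


def fake_step(batch_size):
    # Hypothetical stand-in for "trace the model and time it at this batch size".
    if batch_size >= 256:
        raise RuntimeError('simulated compile failure')
    return {'neuronx_model_time': 0.0, 'model_time': 0.0}


rows = run_per_batch_size([128, 256, 512], fake_step)
for row in rows:
    print(row)
```

In the benchmark itself, `step` would wrap the trace, warm-up and timing code for one batch size, and the resulting rows could be passed to `pd.DataFrame` as before.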

aws-neuron / aws-neuron-sdk

Try to compile Segment Anything and found "The neuronx-cc (neuron compiler) crashed (SEGFAULT)." #751
