Open KevinZhao opened 1 year ago
Thank you for reporting the issue. We are able to reproduce it and are working on a fix.
Hi KevinZhao, We are still working on a long-term fix for your reported issue. In the interim, perhaps you can try passing the "-O1" flag to the compiler. This may result in a different type of graph sharding that could be a work-around for your issue.
do you mean adding args as below?
COMPILER_WORKDIR_ROOT = 'compile_dir'
model_neuron = torch_neuronx.trace(
    model,
    example,
    compiler_args=["--model-type=transformer", "-O1"],
    compiler_workdir=COMPILER_WORKDIR_ROOT)
I also tried compiler_args=["--model-type=transformer", "--optlevel=1"], with no luck.
Hi KevinZhao, yes, that is what I meant (--optlevel is the alias for -O). Sorry to hear it didn't work; it was an attempt to provide a temporary workaround. We need to triage further.
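For readers trying the same workaround, the attempts above can be sketched as a small loop over optimization levels. This is only a sketch under stated assumptions: `candidate_compiler_args` and `first_successful_trace` are hypothetical helper names, `trace_fn` stands in for `torch_neuronx.trace`, the levels 1 to 3 are assumed to be the values `--optlevel` accepts, and a compiler failure is assumed to surface as a `RuntimeError`, as in the traceback reported in this issue.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def candidate_compiler_args(
    base_args: Iterable[str], opt_levels: Iterable[int] = (1, 2, 3)
) -> List[List[str]]:
    """Build one compiler_args list per optimization level (hypothetical helper)."""
    base = list(base_args)
    return [base + [f"--optlevel={level}"] for level in opt_levels]


def first_successful_trace(
    trace_fn: Callable[..., Any],
    model: Any,
    example: Any,
    candidates: Iterable[List[str]],
) -> Tuple[Optional[Any], Optional[List[str]]]:
    """Try each argument list in order.

    Returns (traced_model, args) for the first list that compiles,
    or (None, None) if every attempt raises RuntimeError.
    """
    for args in candidates:
        try:
            # trace_fn is assumed to accept compiler_args as a keyword,
            # as torch_neuronx.trace does in the snippets in this thread.
            return trace_fn(model, example, compiler_args=args), args
        except RuntimeError as err:
            print(f"compile failed with {args}: {err}")
    return None, None
```

On an Inf2 instance this might be called as `first_successful_trace(torch_neuronx.trace, model, example, candidate_compiler_args(["--model-type=transformer"]))`. Nothing guarantees that any level succeeds; in this thread `-O1` did not help.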
Hi @aws-donkrets, was this issue fixed? I am currently trying to compile an SD2 in-painting model on an inf2.8xlarge with the Amazon Linux 2 AMI, using neuronx-cc==2.13.72.0 and torch-neuronx==1.13.1.1.14.0, but I am getting the same error.
any progress for this issue?
@aws-donkrets any updates?
I'm checking with the engineer working on our SD model support.
I was benchmarking torch_neuronx.trace against torch.jit.trace for different batch sizes, and it stops at BS=256.
I used the AMI "Deep Learning AMI Neuron PyTorch 2.1 (Ubuntu 22.04) 20240723" on an inf2.xlarge.
Here is the code:
print('Begin imports...')
import torch
import torch.nn as nn
import torch_neuronx
import torch.jit
import timeit
import pandas as pd
# Define a simple model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(32 * 56 * 56, 128)
        self.fc2 = nn.Linear(128, 10)  # Assuming 10 output classes

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.max_pool2d(x, 2)
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, 2)
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x
print('Instantiating model...')
# Instantiate the model
model = SimpleCNN()
# Batch sizes to test
batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
results = []
# Function to measure inference time
def benchmark_inference(model, input_tensor):
    with torch.no_grad():
        model(input_tensor)
print('Begin benchmark!')
for batch_size in batch_sizes:
    print('Benchmark BS {}'.format(batch_size))
    # Define an example input
    example_input = torch.randn(batch_size, 3, 224, 224)  # Adjust to match your model
    print('Tracing with neuronx...')
    # Trace the model with Neuron
    neuronx_model = torch_neuronx.trace(model, example_input)
    # Run inference once with example_input to warm up
    with torch.no_grad():
        neuronx_model(example_input)
    print('Tracing with jit...')
    # Trace the model with jit
    jit_model = torch.jit.trace(model, example_input)
    # Run inference once with example_input to warm up
    with torch.no_grad():
        jit_model(example_input)
    print('Benchmarking neuronx...')
    # Measure inference time for neuronx_model
    time_neuronx_model = timeit.Timer(
        stmt='benchmark_inference(neuronx_model, example_input)',
        globals=globals()
    ).timeit(100) / 100
    print('Benchmarking jit...')
    # Measure inference time for jit_model
    time_model = timeit.Timer(
        stmt='benchmark_inference(jit_model, example_input)',
        globals=globals()
    ).timeit(100) / 100
    # Add the results to the list
    results.append({
        'batch_size': batch_size,
        'neuronx_model_time': time_neuronx_model,
        'model_time': time_model
    })
    print(results[-1])
# Convert to a DataFrame and save as CSV
df = pd.DataFrame(results)
print(df)
df.to_csv('benchmark.csv', index=False)
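Because the loop above stops at the first batch size that fails, one option is to record the failure and move on to the next size. The following is a minimal sketch, assuming the failure is raised as a Python exception rather than a hang or a process crash; `sweep_batch_sizes` and `bench_fn` are hypothetical names, and `bench_fn` would wrap the trace-and-time steps from the script above.

```python
from typing import Any, Callable, Dict, Iterable, List


def sweep_batch_sizes(
    batch_sizes: Iterable[int], bench_fn: Callable[[int], float]
) -> List[Dict[str, Any]]:
    """Run bench_fn for every batch size and keep going after a failure.

    bench_fn (hypothetical) takes a batch size and returns the mean
    inference time in seconds, or raises if tracing or inference fails.
    """
    results = []
    for batch_size in batch_sizes:
        row = {"batch_size": batch_size, "seconds": None, "error": None}
        try:
            row["seconds"] = bench_fn(batch_size)
        except Exception as err:
            # Keep the failure in the results instead of aborting the sweep.
            row["error"] = f"{type(err).__name__}: {err}"
        results.append(row)
    return results
```

The returned list of dicts can be passed to `pd.DataFrame(...)` exactly like `results` in the script, so failed batch sizes show up as rows with an `error` value instead of ending the run.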
My code is as below:
import torch
from PIL import Image
import requests
import torch_neuronx
from transformers import SamModel, SamProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SamModel.from_pretrained("facebook/sam-vit-huge")#.to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]] # 2D location of a window in the image

inputs = processor(raw_image, input_points=input_points, return_tensors="pt")#.to(device)
outputs = model(**inputs)
example = (inputs['pixel_values'],)

# Compile the model
COMPILER_WORKDIR_ROOT = 'compile_dir'
model_neuron = torch_neuronx.trace(model, example, compiler_args="--model-type=transformer", compiler_workdir = COMPILER_WORKDIR_ROOT)

# Save the TorchScript for inference deployment
filename = 'SAM.pt'
torch.jit.save(model_neuron, filename)
I got an error: Too many instructions after unroll for function sg0000 !
File ~/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuronx/xla_impl/trace.py:281, in hlo_compile(filename, compiler_workdir, compiler_args)
    274 elif status == -11:
    275     logger.warning(
    276         "The neuronx-cc (neuron compiler) crashed (SEGFAULT). "
    277         "This is likely due to a bug in the compiler. "
    278         "Please lodge an issue at 'https://github.com/aws/aws-neuron-sdk/issues'"
    279     )
--> 281 raise RuntimeError(f"neuronx-cc failed with {status}")
    283 return neff_filename
RuntimeError: neuronx-cc failed with 70