aws-neuron / aws-neuron-samples

Example code for AWS Neuron SDK developers building inference and training applications

Can't set the `optlevel` param to `1` with torch_neuronx.trace #67

Open Suprhimp opened 4 months ago

Suprhimp commented 4 months ago

My environment is an AWS inf2.8xlarge instance.

python: 3.8.10, torch-neuronx: 2.1.1.2.0.1b0, neuronx-cc: 2.12.68.0+4480452af

I'm trying to compile an ESRGAN torch model for Neuron, but I'm running into an issue.

from PIL import Image

import torch
import torch_neuronx
from torchvision.transforms import functional

from modules.esrgan_upscale import upscale_model_loader
import os
os.environ["NEURON_CC_FLAGS"] = "-O1"
# load the model
model = upscale_model_loader('modules/weight/4x-Ultrasharp.pth')
model.eval()

# Get an example input
image = Image.open('/home/ubuntu/diffusers-ultimate-upscale/testIm.png')
image = image.convert('RGB')
image = functional.to_tensor(image)
image = torch.unsqueeze(image, 0)

# Run inference on CPU
output_cpu = model(image)

# Compile the model
model_neuron = torch_neuronx.trace(model, image, compiler_args=['--optlevel', '1'])

# Save the TorchScript for inference deployment
filename = 'model.pt'
torch.jit.save(model_neuron, filename)

When I run this code, it first gives me this log:

2024-02-20T13:36:54Z Compilation is optimized for best performance and compilation time. For faster compilation time please use -O1

I want to compile with -O1 because of this error log (yes, the compilation failed):

[XTP002] Too many instructions after unroll for function sg0000! - Compiling under --optlevel=1 may result in smaller graphs. If you are using a transformer model, try using a smaller context_length_estimate value.

I can't set the optlevel flag to 1... even when I changed the code inside the module like this:

    command = [
        neuron_cc,
        "compile",
        filename,
        "--framework",
        "XLA",
        "--target",
        "trn1",
        "--output",
        neff_filename,
        "--optlevel",
        "1"
    ]
    command.extend(compiler_args)

What should I do if I want to compile with --optlevel=1 using torch_neuronx.trace?

aws-donkrets commented 4 months ago

Hi Suprhimp, I took a quick look at your code and it seems to be correct. The torch_neuronx.trace call can pass compiler options, and the way you have done it looks right, as does your command definition. I'll note that you don't need the os.environ["NEURON_CC_FLAGS"] = "-O1" line, so it can be removed. One suggestion is to move the neff_filename parameter to the end of the command definition, allowing all the command-line flags to appear before the filename. The command line would then look like:

    neuronx_cc compile input_file_name --framework XLA --target trn1 --optlevel 1 --output neff_filename

Another suggestion would be to run the above command by hand to see if you get the same result.
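If it helps, here is a minimal sketch of invoking the compiler by hand from Python (the input and output paths are hypothetical placeholders; substitute the file left behind by your trace attempt):

    import subprocess

    # Sketch only: "model.hlo" and "model.neff" are hypothetical paths.
    subprocess.run(
        [
            "neuronx-cc",
            "compile",
            "model.hlo",
            "--framework", "XLA",
            "--target", "trn1",
            "--optlevel", "1",
            "--output", "model.neff",
        ],
        check=True,
    )

If the -O1 message still appears when the compiler is invoked this way, that would suggest the behavior comes from the compiler itself rather than from the trace API.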

Suprhimp commented 4 months ago

Hi, thanks for checking my issue @aws-donkrets :)

Even if I change the code in trace.py (in the hlo_compile function) like this:

    if neuron_cc is None:
        raise RuntimeError("neuronx-cc compiler binary does not exist")
    command = [
        neuron_cc,
        "compile",
        filename,
        "--framework",
        "XLA",
        "--target",
        "trn1",
        "--optlevel",
        "1",
        "--output",
        neff_filename,
    ]
    command.extend(compiler_args)

it still gives me this log:

2024-02-29T02:01:25Z Compilation is optimized for best performance and compilation time. For faster compilation time please use -O1

and my compilation still failed ;)

Suprhimp commented 3 months ago

@aws-donkrets hello, let me add a question: is there any way to compile a .pth file so that I can run my torch model on an inf2 instance?

The faster-compile flag still doesn't work.

Can you check it, please?

aws-taylor commented 1 month ago

Hello @Suprhimp,

We do not directly support compiling .pth files; you would need to load the weights first, perhaps using load_state_dict(), and then trace the loaded model to trigger compilation.
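For example, here is a minimal sketch along those lines (EsrganNet is a hypothetical stand-in for whatever model class your weights were saved from, and the example input shape is an assumption):

    import torch
    import torch_neuronx

    # EsrganNet is hypothetical: use the actual class the weights were saved from.
    model = EsrganNet()
    state_dict = torch.load("modules/weight/4x-Ultrasharp.pth", map_location="cpu")
    model.load_state_dict(state_dict)
    model.eval()

    # Tracing the loaded model is what triggers Neuron compilation.
    example = torch.rand(1, 3, 256, 256)  # assumed input shape
    model_neuron = torch_neuronx.trace(model, example)
    torch.jit.save(model_neuron, "model.pt")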

Could you share your model or more of the failure logs from the compiler (usually log-neuronx-cc.txt)? That will give us more of an idea of why the failure is occurring.