Suprhimp opened this issue 4 months ago (status: open).
Hi Suprhimp, I took a quick look at your code and it seems to be correct.
The torch_neuronx.trace call can pass compiler options, and the way you have done it looks correct, as does your command definition. Note that you don't need the os.environ["NEURON_CC_FLAGS"] = "-O1" line, so it can be removed. One suggestion is to move the neff_filename parameter to the end of the command setting so that all the command-line flags appear before the filename. The command line would then look like:
neuronx-cc compile input_file_name --framework XLA --target trn1 --optlevel 1 --output neff_filename
Another suggestion is to run the command above by hand to see whether you get the same result.
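A minimal sketch of the ordering suggested above, written as a standalone helper rather than as a patch to trace.py. The function name `build_compile_command`, its parameters, and the `model.hlo.pb` / `model.neff` file names are hypothetical; the flags are the ones from the command line above, and any extra `compiler_args` are placed before `--output` so the output filename stays last.

```python
import shlex
import shutil


def build_compile_command(input_file, neff_filename, compiler_args=(),
                          optlevel=1, compiler="neuronx-cc"):
    """Assemble a neuronx-cc argv list with every flag before the output file.

    Hypothetical helper for illustration: it only builds the list, it does not run it.
    """
    # Use the absolute path if the binary is on PATH, otherwise the bare name.
    binary = shutil.which(compiler) or compiler
    command = [
        binary,
        "compile",
        input_file,
        "--framework", "XLA",
        "--target", "trn1",
        "--optlevel", str(optlevel),
    ]
    # Extra flags go here, before --output, instead of being appended at the end.
    command.extend(compiler_args)
    # Output filename last, as suggested in the reply above.
    command.extend(["--output", neff_filename])
    return command


if __name__ == "__main__":
    cmd = build_compile_command("model.hlo.pb", "model.neff")
    # Print a shell-ready string so the same command can be pasted and run by hand.
    print(shlex.join(cmd))
```

Printing the joined command is one way to follow the "run it by hand" suggestion: copy the printed line into a shell on the inf2 instance and compare its output with what the traced run reports.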
Hi, thanks for checking my issue @aws-donkrets :)
Even if I change the hlo_compile function in trace.py like this:
if neuron_cc is None:
    raise RuntimeError("neuronx-cc compiler binary does not exist")
command = [
    neuron_cc,
    "compile",
    filename,
    "--framework",
    "XLA",
    "--target",
    "trn1",
    "--optlevel",
    "1",
    "--output",
    neff_filename,
]
command.extend(compiler_args)
it still gives me this log:
2024-02-29T02:01:25Z Compilation is optimized for best performance and compilation time. For faster compilation time please use -O1
and the compilation also failed ;)
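The snippet above appends `compiler_args` after the hard-coded `--optlevel 1` and after `--output neff_filename`, so the final command can end up carrying more than one optimization setting. One way to narrow this down is to print the assembled command just before it runs and scan it. The helper below is a hypothetical sketch that collects every `--optlevel N`, `--optlevel=N`, and `-ON` occurrence (the spellings used in this thread); it makes no assumption about which one neuronx-cc honors when several are present, and the appended `-O2` in the demo is a made-up value standing in for whatever `compiler_args` contains.

```python
def optlevel_settings(command):
    """Return every optimization level found in a neuronx-cc argv list, in order.

    Recognizes "--optlevel N", "--optlevel=N" and "-ON" (hypothetical helper).
    """
    levels = []
    i = 0
    while i < len(command):
        arg = command[i]
        if arg == "--optlevel" and i + 1 < len(command):
            # Separate-value form: the level is the next element.
            levels.append(command[i + 1])
            i += 2
            continue
        if arg.startswith("--optlevel="):
            levels.append(arg.split("=", 1)[1])
        elif arg.startswith("-O") and len(arg) > 2:
            # Short form such as -O1.
            levels.append(arg[2:])
        i += 1
    return levels


if __name__ == "__main__":
    command = [
        "neuronx-cc", "compile", "model.hlo.pb",
        "--framework", "XLA", "--target", "trn1",
        "--optlevel", "1", "--output", "model.neff",
    ]
    # Simulate command.extend(compiler_args) with a hypothetical extra flag.
    command.extend(["-O2"])
    print(optlevel_settings(command))  # → ['1', '2']
```

If the printed command does not contain the hard-coded `--optlevel 1` at all, it may be worth confirming that the edited trace.py sits inside the package directory reported by `torch_neuronx.__file__`, since edits only take effect in the copy Python actually imports.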
@aws-donkrets Hello, let me add a question: is there any way to compile a .pth file so I can run my torch model on an inf2 instance?
Also, the faster-compilation flag still does not work. Could you check it, please?
Hello @Suprhimp,
We do not directly support compiling .pth files; you would need to load the file first, perhaps using load_state_dict(), and then trace the loaded model to trigger compilation.
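A minimal sketch of the load-then-trace approach described above. It assumes the .pth file holds a plain state dict for a model class you can construct yourself (for ESRGAN, the generator network definition from your own code), and that `example_input` is a tensor with the shape you intend to run on inf2. The function name `compile_from_pth` is hypothetical. The torch and torch_neuronx imports sit inside the function so the definition itself loads even where neither package is installed.

```python
def compile_from_pth(model, pth_path, example_input, compiler_args=None):
    """Load weights from a .pth file into `model`, then trace it for Neuron.

    Hypothetical helper: `model` must already be an instance of the matching
    torch.nn.Module class, because a state dict stores weights, not architecture.
    """
    import torch
    import torch_neuronx

    # Read the checkpoint on the CPU; tracing is driven from the host.
    state_dict = torch.load(pth_path, map_location="cpu")
    model.load_state_dict(state_dict)
    # Switch to inference mode before tracing.
    model.eval()

    # torch_neuronx.trace runs the model on example_input and invokes neuronx-cc.
    return torch_neuronx.trace(model, example_input, compiler_args=compiler_args)
```

A call would look like `neuron_model = compile_from_pth(MyGenerator(), "esrgan.pth", torch.rand(1, 3, 64, 64))`, where `MyGenerator`, the file name, and the input shape are placeholders. The traced module can then be saved with `torch.jit.save(neuron_model, "esrgan_neuron.pt")` and reloaded later instead of recompiling.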
Could you share your model or more of the failure logs from the compiler (usually log-neuronx-cc.txt)? That will give us more of an idea of why the failure is occurring.
My environment:
AWS instance: inf2.8xlarge
python: 3.8.10
torch-neuronx: 2.1.1.2.0.1b0
neuronx-cc: 2.12.68.0+4480452af
I'm trying to compile an ESRGAN torch model to Neuron, but I have an issue.
When I first run this code, it gives me this log:
I want to compile with -O1 because of this error log (yes, my compilation failed). I can't set the optlevel flag to 1, even when I change the code inside the module like this:
What should I do if I want to compile with --optlevel=1 using torch_neuronx.trace?
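For the question above: since the torch_neuronx.trace call can take compiler options directly (as noted earlier in the thread), here is a sketch that sets the level through `compiler_args` instead of editing trace.py or setting NEURON_CC_FLAGS. The helper `neuron_trace_kwargs` is hypothetical; `compiler_args` and `compiler_workdir` are keyword arguments of `torch_neuronx.trace`, but it is worth checking them against the API reference for the installed torch-neuronx version (2.1.1.2.0.1b0 here).

```python
def neuron_trace_kwargs(optlevel=1, workdir=None, extra_args=()):
    """Build keyword arguments for torch_neuronx.trace (hypothetical helper).

    The result is meant to be unpacked: torch_neuronx.trace(model, x, **kwargs).
    """
    # The level is passed as its own argv element, mirroring "--optlevel 1".
    compiler_args = ["--optlevel", str(optlevel), *extra_args]
    kwargs = {"compiler_args": compiler_args}
    if workdir is not None:
        # A fixed work directory keeps the compiler's intermediate files around.
        kwargs["compiler_workdir"] = workdir
    return kwargs
```

Usage would be `torch_neuronx.trace(model, example_input, **neuron_trace_kwargs(optlevel=1, workdir="./neuron_workdir"))`. Keeping a named work directory may also make it easier to locate the compiler log (log-neuronx-cc.txt) requested earlier in the thread.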