aws-neuron / neuronx-distributed

MIT No Attribution

torch_neuronx.xla_impl.trace._trace Inconsistent with the latest torch_neuronx-2.1.1.2.0.0b0-py3-none-any.whl #15

yinsong1986 closed this issue 1 month ago

yinsong1986 commented 4 months ago

The call to torch_neuronx.xla_impl.trace._trace at https://github.com/aws-neuron/neuronx-distributed/blob/main/src/neuronx_distributed/trace/trace.py#L103 seems to be inconsistent with the function as defined in torch_neuronx-2.1.1.2.0.0b0. In that release, torch_neuronx.xla_impl.trace._trace returns only 4 values, but the call site above unpacks 5. As a result, the example at https://github.com/aws-neuron/neuronx-distributed/blob/main/examples/inference/runner.py does not work. Please investigate, and thank you!
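A minimal sketch of this failure mode, using hypothetical stand-ins (trace_impl, caller, and the placeholder return values are illustrative only, not the real torch_neuronx or neuronx-distributed code): when a callee returns 4 values and the caller unpacks 5, Python raises a ValueError at the call site.

```python
def trace_impl():
    # Hypothetical stand-in for a tracer that returns only 4 values,
    # loosely mirroring neff_filename, metaneff bytes, flattener, packer.
    return "model.neff", b"metaneff", "flattener", "packer"


def caller():
    try:
        # The caller expects a fifth value that the callee never returns.
        neff, metaneff, flattener, packer, weights = trace_impl()
    except ValueError as err:
        return str(err)
    return "ok"


message = caller()
print(message)  # not enough values to unpack (expected 5, got 4)
```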

jyang-aws commented 4 months ago

@yinsong1986 I confirmed the issue, thanks for reporting it. The returned values (neff_filename, metaneff.SerializeToString(), flattener, packer) are missing the weights parameter. We're working on a fix.
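Until the versions line up, a caller could validate the tuple length before unpacking so that a mismatch produces a readable error. This is a hypothetical caller-side guard, not part of neuronx-distributed; unpack_trace_result, TraceArityError, and EXPECTED_ARITY are illustrative names, and the sketch deliberately does not assume where the weights value sits in the 5-tuple.

```python
from typing import Any, Sequence

# Hypothetical: the number of values the caller unpacks.
EXPECTED_ARITY = 5


class TraceArityError(RuntimeError):
    """Raised when the tracer returns an unexpected number of values."""


def unpack_trace_result(result: Sequence[Any], expected: int = EXPECTED_ARITY) -> tuple:
    # Check the length up front so a version mismatch yields a descriptive
    # error instead of a bare "not enough values to unpack".
    if len(result) != expected:
        raise TraceArityError(
            f"tracer returned {len(result)} values, expected {expected}; "
            "the installed torch_neuronx may not match this neuronx-distributed version"
        )
    return tuple(result)


# A 4-value result, as returned by the older tracer described above.
old_style = ("model.neff", b"metaneff", "flattener", "packer")
try:
    unpack_trace_result(old_style)
except TraceArityError as err:
    print(err)
```

Failing fast with the observed and expected counts points the user at a version mismatch rather than at their own code.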

yinsong1986 commented 4 months ago

@jyang-aws Any timeline on this issue? Thanks!

aws-rhsoln commented 4 months ago

We have added the fix, and it should be included in our upcoming release.

aws-taylor commented 1 month ago

Hello @yinsong1986,

We have shipped our 2.18 release, which we believe fixes this issue. https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/index.html