alibaba / BladeDISC

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
Apache License 2.0

Could not export Python function call '_ReduceFromModelParallelRegion'. during torch.jit.save #163

Open jues opened 2 years ago

jues commented 2 years ago

Hi, below is my function from Megatron-LM:

class _ReduceFromModelParallelRegion(torch.autograd.Function):
    """All-reduce the input from the model parallel region."""

    @staticmethod
    def forward(ctx, input_):
        return _reduce(input_)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

def reduce_from_model_parallel_region(input_):
    return _ReduceFromModelParallelRegion.apply(input_)

The following exception occurred during torch.jit.save:

  File "test.py", line 605, in generate_sentence
    torch.jit.save(optimized_ts, "opt.disc.pt")
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/_serialization.py", line 81, in save
    m.save(f, _extra_files=_extra_files)
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/_script.py", line 487, in save
    return self._c.save(*args, **kwargs)
RuntimeError: 
Could not export Python function call '_ReduceFromModelParallelRegion'. Remove calls to Python functions before export. Did you forget to add @script or @script_method annotation? If this is a nn.ModuleList, add it to __constants__:
/work/generate/gpt2/mpu/mappings.py(135): reduce_from_model_parallel_region
/work/generate/gpt2/mpu/layers.py(134): forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(709): _slow_forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(725): _call_impl
/work/generate/gpt2/model/gpt2_modeling.py(83): forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(709): _slow_forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(725): _call_impl
/work/generate/gpt2/fp16/fp16.py(65): forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(709): _slow_forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(725): _call_impl
/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py(940): trace_module
/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py(742): trace
/usr/local/lib/python3.6/dist-packages/torch_blade/exporter.py(235): export
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py(26): decorate_context
/usr/local/lib/python3.6/dist-packages/torch_blade/optimization.py(37): _optimize
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py(26): decorate_context
/usr/local/lib/python3.6/dist-packages/torch_blade/optimization.py(111): optimize
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py(26): decorate_context
test.py(602): generate_sentence

How can I save a model that uses a torch.autograd.Function? Thank you very much!

tanyokwok commented 2 years ago

torch_blade.optimize tries to export the nn.Module to TorchScript, but TorchScript does not support torch.autograd.Function, so the traced call cannot be serialized by torch.jit.save. You could register a TorchScript custom operator for the function and call that operator instead.
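A minimal sketch of the custom-operator idea, assuming a recent PyTorch release that provides the Python torch.library API (the Python 3.6 environment in the traceback likely predates it, in which case the operator would have to be registered from C++). The namespace demo_mpu, the operator name, and the Layer module are hypothetical, and the implementation is a single-process stand-in for Megatron-LM's _reduce rather than a real all-reduce. Whether torch_blade.optimize accepts such an operator is not confirmed in this thread.

```python
import io

import torch

# Hypothetical namespace and operator name, chosen only for this sketch.
_lib = torch.library.Library("demo_mpu", "DEF")
_lib.define("reduce_from_model_parallel_region(Tensor x) -> Tensor")


def _reduce_impl(x):
    # Stand-in for Megatron-LM's _reduce: a real implementation would
    # all-reduce across the model-parallel group. Cloning keeps the output
    # from aliasing the input, as the schema above declares.
    return x.clone()


_lib.impl(
    "reduce_from_model_parallel_region",
    _reduce_impl,
    "CompositeExplicitAutograd",
)


class Layer(torch.nn.Module):
    """Hypothetical module that calls the registered operator."""

    def forward(self, x):
        # The tracer records a registered operator call here instead of a
        # Python function call, so the graph can be serialized.
        return torch.ops.demo_mpu.reduce_from_model_parallel_region(x) + 1


example = torch.ones(2, 3)
traced = torch.jit.trace(Layer(), example)

# Save to and reload from an in-memory buffer.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer)
```

The process that loads the saved file also has to register the operator before calling torch.jit.load, because the file stores only a reference to the operator, not its implementation. This sketch covers the forward pass only; it does not reproduce the backward behavior of the original autograd.Function.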