aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost-effective, natively integrated into PyTorch and TensorFlow, and integrated with your favorite AWS services.
https://aws.amazon.com/machine-learning/neuron/

[torch-neuronx] Resnet50 Training issue on Trn1 (Error: transpose not supported) #588

Closed BugFreeee closed 1 year ago

BugFreeee commented 1 year ago

I am trying to enable ResNet-50 training on Trn1. The CPU run passes, but when XLA is enabled, compilation fails with a transpose op error. I thought the transpose op was supported according to the docs?

I have been modifying the MLP example from the official Neuron documentation: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/mlp.html. I am also using torchvision, and only minimal changes were made to the original code. The complete code and error messages are below:

BugFreeee commented 1 year ago
import os
import time
import torch
#from model import MLP

from torchvision.datasets import mnist
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
from torchvision.datasets import CIFAR10
# XLA imports
import torch_xla.core.xla_model as xm

# Global constants
EPOCHS = 1
WARMUP_STEPS = 2
BATCH_SIZE = 32

# Load MNIST train dataset
#train_dataset = mnist.MNIST(root='./MNIST_DATA_train',
#                            train=True, download=True, transform=ToTensor())
train_dataset = CIFAR10(root='./cifar_DATA_train',
                             download=True, transform=transforms.Compose([transforms.Resize([224, 224]), transforms.ToTensor()]))

def main():
    # Prepare data loader
    train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE)

    # Fix the random number generator seeds for reproducibility
    torch.manual_seed(0)

    # XLA: Specify XLA device (defaults to a NeuronCore on Trn1 instance)
    device = 'xla'
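    # (equivalently, device = xm.xla_device() selects the default NeuronCore)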

    # Move model to device and declare optimizer and loss function
    #model = MLP().to(device)
    model = torchvision.models.resnet50(pretrained=True)
    model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
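    # Note: torchvision's resnet50 outputs raw logits, while NLLLoss expects log-probabilities;
    # CrossEntropyLoss would normally be used below (kept as posted; unrelated to the compile error)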
    loss_fn = torch.nn.NLLLoss()

    # Run the training loop
    print('----------Training ---------------')
    model.train()
    for epoch in range(EPOCHS):
        start = time.time()
        for idx, (train_x, train_label) in enumerate(train_loader):
            optimizer.zero_grad()
            #train_x = train_x.view(train_x.size(0), -1)
            train_x = train_x.to(device)
            train_label = train_label.to(device)
            output = model(train_x)
            loss = loss_fn(output, train_label)
            loss.backward()
            optimizer.step()
            xm.mark_step() # XLA: collect ops and run them in XLA runtime
            if idx < WARMUP_STEPS: # skip warmup iterations
                start = time.time()
            if idx > 5:
                break

    # Compute statistics for the last epoch
    interval = idx - WARMUP_STEPS # skip warmup iterations
    throughput = interval / (time.time() - start)
    print("Train throughput (iter/sec): {}".format(throughput))
    print("Final loss is {:0.4f}".format(loss.detach().to('cpu')))

    # Save checkpoint for evaluation
    os.makedirs("checkpoints", exist_ok=True)
    checkpoint = {'state_dict': model.state_dict()}
    # XLA: use xm.save instead of torch.save to ensure states are moved back to cpu
    # This can prevent "XRT memory handle not found" at end of test.py execution
    xm.save(checkpoint,'checkpoints/checkpoint.pt')

    print('----------End Training ---------------')

if __name__ == '__main__':
    main()
BugFreeee commented 1 year ago
(rn50) ubuntu@ip-172-31-51-xx:~/xxx/rn50/aws-neuron-samples/torch-neuronx/training/mnist_mlp$ python train_rn50.py
Files already downloaded and verified
----------Training ---------------
2022-11-04 08:46:56.181901: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at tpu_execute_op.cc:266 : UNIMPLEMENTED:
The following HLO instructions are not supported by neuronx-cc:
================================================================================

%reverse.2399 = f32[1,1,512,2048] reverse(f32[1,1,512,2048] %transpose.2398), dimensions={0,1}
%reverse.2427 = f32[3,3,512,512] reverse(f32[3,3,512,512] %transpose.2426), dimensions={0,1}
%reverse.2455 = f32[1,1,2048,512] reverse(f32[1,1,2048,512] %transpose.2454), dimensions={0,1}
%reverse.2485 = f32[1,1,512,2048] reverse(f32[1,1,512,2048] %transpose.2484), dimensions={0,1}
%reverse.2513 = f32[3,3,512,512] reverse(f32[3,3,512,512] %transpose.2512), dimensions={0,1}
%reverse.2541 = f32[1,1,2048,512] reverse(f32[1,1,2048,512] %transpose.2540), dimensions={0,1}
%reverse.2571 = f32[1,1,512,2048] reverse(f32[1,1,512,2048] %transpose.2570), dimensions={0,1}
%reverse.2599 = f32[3,3,512,512] reverse(f32[3,3,512,512] %transpose.2598), dimensions={0,1}
%reverse.2627 = f32[1,1,1024,512] reverse(f32[1,1,1024,512] %transpose.2626), dimensions={0,1}
%reverse.2650 = f32[1,1,1024,2048] reverse(f32[1,1,1024,2048] %transpose.2649), dimensions={0,1}
%reverse.2679 = f32[1,1,256,1024] reverse(f32[1,1,256,1024] %transpose.2678), dimensions={0,1}
%reverse.2707 = f32[3,3,256,256] reverse(f32[3,3,256,256] %transpose.2706), dimensions={0,1}
%reverse.2735 = f32[1,1,1024,256] reverse(f32[1,1,1024,256] %transpose.2734), dimensions={0,1}
%reverse.2765 = f32[1,1,256,1024] reverse(f32[1,1,256,1024] %transpose.2764), dimensions={0,1}
%reverse.2793 = f32[3,3,256,256] reverse(f32[3,3,256,256] %transpose.2792), dimensions={0,1}
%reverse.2821 = f32[1,1,1024,256] reverse(f32[1,1,1024,256] %transpose.2820), dimensions={0,1}
%reverse.2851 = f32[1,1,256,1024] reverse(f32[1,1,256,1024] %transpose.2850), dimensions={0,1}
%reverse.2879 = f32[3,3,256,256] reverse(f32[3,3,256,256] %transpose.2878), dimensions={0,1}
%reverse.2907 = f32[1,1,1024,256] reverse(f32[1,1,1024,256] %transpose.2906), dimensions={0,1}
%reverse.2937 = f32[1,1,256,1024] reverse(f32[1,1,256,1024] %transpose.2936), dimensions={0,1}
%reverse.2965 = f32[3,3,256,256] reverse(f32[3,3,256,256] %transpose.2964), dimensions={0,1}
%reverse.2993 = f32[1,1,1024,256] reverse(f32[1,1,1024,256] %transpose.2992), dimensions={0,1}
%reverse.3023 = f32[1,1,256,1024] reverse(f32[1,1,256,1024] %transpose.3022), dimensions={0,1}
%reverse.3051 = f32[3,3,256,256] reverse(f32[3,3,256,256] %transpose.3050), dimensions={0,1}
%reverse.3079 = f32[1,1,1024,256] reverse(f32[1,1,1024,256] %transpose.3078), dimensions={0,1}
%reverse.3109 = f32[1,1,256,1024] reverse(f32[1,1,256,1024] %transpose.3108), dimensions={0,1}
%reverse.3137 = f32[3,3,256,256] reverse(f32[3,3,256,256] %transpose.3136), dimensions={0,1}
%reverse.3165 = f32[1,1,512,256] reverse(f32[1,1,512,256] %transpose.3164), dimensions={0,1}
%reverse.3188 = f32[1,1,512,1024] reverse(f32[1,1,512,1024] %transpose.3187), dimensions={0,1}
%reverse.3217 = f32[1,1,128,512] reverse(f32[1,1,128,512] %transpose.3216), dimensions={0,1}
%reverse.3245 = f32[3,3,128,128] reverse(f32[3,3,128,128] %transpose.3244), dimensions={0,1}
%reverse.3273 = f32[1,1,512,128] reverse(f32[1,1,512,128] %transpose.3272), dimensions={0,1}
%reverse.3303 = f32[1,1,128,512] reverse(f32[1,1,128,512] %transpose.3302), dimensions={0,1}
%reverse.3331 = f32[3,3,128,128] reverse(f32[3,3,128,128] %transpose.3330), dimensions={0,1}
%reverse.3359 = f32[1,1,512,128] reverse(f32[1,1,512,128] %transpose.3358), dimensions={0,1}
%reverse.3389 = f32[1,1,128,512] reverse(f32[1,1,128,512] %transpose.3388), dimensions={0,1}
%reverse.3417 = f32[3,3,128,128] reverse(f32[3,3,128,128] %transpose.3416), dimensions={0,1}
%reverse.3445 = f32[1,1,512,128] reverse(f32[1,1,512,128] %transpose.3444), dimensions={0,1}
%reverse.3475 = f32[1,1,128,512] reverse(f32[1,1,128,512] %transpose.3474), dimensions={0,1}
%reverse.3503 = f32[3,3,128,128] reverse(f32[3,3,128,128] %transpose.3502), dimensions={0,1}
%reverse.3531 = f32[1,1,256,128] reverse(f32[1,1,256,128] %transpose.3530), dimensions={0,1}
%reverse.3554 = f32[1,1,256,512] reverse(f32[1,1,256,512] %transpose.3553), dimensions={0,1}
%reverse.3583 = f32[1,1,64,256] reverse(f32[1,1,64,256] %transpose.3582), dimensions={0,1}
%reverse.3611 = f32[3,3,64,64] reverse(f32[3,3,64,64] %transpose.3610), dimensions={0,1}
%reverse.3639 = f32[1,1,256,64] reverse(f32[1,1,256,64] %transpose.3638), dimensions={0,1}
%reverse.3669 = f32[1,1,64,256] reverse(f32[1,1,64,256] %transpose.3668), dimensions={0,1}
%reverse.3697 = f32[3,3,64,64] reverse(f32[3,3,64,64] %transpose.3696), dimensions={0,1}
%reverse.3725 = f32[1,1,256,64] reverse(f32[1,1,256,64] %transpose.3724), dimensions={0,1}
%reverse.3755 = f32[1,1,64,256] reverse(f32[1,1,64,256] %transpose.3754), dimensions={0,1}
%reverse.3783 = f32[3,3,64,64] reverse(f32[3,3,64,64] %transpose.3782), dimensions={0,1}
%reverse.3811 = f32[1,1,64,64] reverse(f32[1,1,64,64] %transpose.3810), dimensions={0,1}
%reverse.3834 = f32[1,1,64,256] reverse(f32[1,1,64,256] %transpose.3833), dimensions={0,1}
%select-and-scatter.3854 = f32[32,64,112,112] select-and-scatter(f32[32,64,112,112] %maximum.44, f32[32,64,56,56] %add.3844, f32[] %constant.3845), window={size=1x1x3x3 stride=1x1x2x2 pad=0_0x0_0x1_1x1_1}, select=%xla_ge_computation.3846, scatter=%AddComputation.3850

================================================================================
Please consider implementing the deep learning model or training loop using
supported XLA operators only.  neuronx-cc's supported XLA operator list may be
found by running `neuronx-cc list-operators --framework XLA`.

2022-11-04 08:46:56.230072: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] StackTrace:
2022-11-04 08:46:56.230110: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] *** Begin stack trace ***
2022-11-04 08:46:56.230116: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]        tensorflow::CurrentStackTrace()
2022-11-04 08:46:56.230126: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]        xla::util::ReportComputationError(tensorflow::Status const&, absl::lts_20211102::Span<xla::XlaComputation const* const>, absl::lts_20211102::Span<xla::Shape const* const>)
2022-11-04 08:46:56.230132: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]        xla::XrtComputationClient::ExecuteComputation(xla::ComputationClient::Computation const&, absl::lts_20211102::Span<std::shared_ptr<xla::ComputationClient::Data> const>, std::string const&, xla::ComputationClient::ExecuteComputationOptions const&)
2022-11-04 08:46:56.230142: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]
2022-11-04 08:46:56.230147: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]        xla::util::MultiWait::Complete(std::function<void ()> const&)
2022-11-04 08:46:56.230153: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]
2022-11-04 08:46:56.230160: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]
2022-11-04 08:46:56.230165: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]
2022-11-04 08:46:56.230175: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]        clone
2022-11-04 08:46:56.230183: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] *** End stack trace ***
2022-11-04 08:46:56.230192: E tensorflow/compiler/xla/xla_client/xla_util.cc:88]
2022-11-04 08:46:56.230197: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] Status: UNIMPLEMENTED: From /job:localservice/replica:0/task:0:
2022-11-04 08:46:56.230206: E tensorflow/compiler/xla/xla_client/xla_util.cc:88] 2 root error(s) found.
[... the stderr log then repeats the same unsupported-HLO instruction list and compiler advice for both root errors (0) and (1); truncated here ...]
Traceback (most recent call last):
  File "train_rn50.py", line 79, in <module>
    main()
  File "train_rn50.py", line 57, in main
    xm.mark_step() # XLA: collect ops and run them in XLA runtime
  File "/home/ubuntu/anaconda3/envs/rn50/lib/python3.8/site-packages/torch_xla/core/xla_model.py", line 847, in mark_step
    torch_xla._XLAC._xla_step_marker(
RuntimeError: UNIMPLEMENTED: From /job:localservice/replica:0/task:0:
2 root error(s) found.
[... the RuntimeError message repeats the same unsupported-HLO instruction list and compiler advice shown above for both root errors; truncated here ...]
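
A note on the failing ops: the unsupported reverse instructions operate on the convolution weight tensors and are emitted by XLA for the convolution backward pass (the input gradient is computed with spatially flipped kernels, hence dimensions={0,1} on the filter shapes), and the select-and-scatter comes from the max-pool backward pass. None of them correspond to an explicit transpose in the model, which is why the documented transpose support does not help here. The compiler's own suggestion can be checked directly; below is a minimal sketch, assuming neuronx-cc is on the PATH as in a standard Neuron training environment (the exact operator-name formatting in the listing is an assumption, hence the loose normalized match):

import re
import subprocess

# Query the compiler for its supported XLA operator list
# (the command is quoted verbatim in the error message above).
listing = subprocess.run(
    ["neuronx-cc", "list-operators", "--framework", "XLA"],
    capture_output=True, text=True, check=True,
).stdout

# Normalize away case and separators, since the listing's exact
# operator-name formatting is an assumption here.
normalized = re.sub(r"[^a-z0-9]", "", listing.lower())

# 'transpose' is checked too, to confirm the documented support;
# the other two are the ops the compiler rejected.
for op in ("transpose", "reverse", "selectandscatter"):
    print(op, "->", "listed" if op in normalized else "not listed")

If reverse and select-and-scatter are absent from the listing, that is consistent with the compile failure above: the ops appear only in the backward graph that XLA generates for training.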
hannanjgaws commented 1 year ago

Hello @BugFreeee:

CNN-based models are not supported on Trainium as of the latest Neuron release. You can track the Neuron roadmap item for CNN support for updates, and take a look at our Model Architecture Fit Guidelines to learn more about which model architectures are currently supported on Trainium.

I will close this ticket in favor of the existing CNN support roadmap item: https://github.com/aws-neuron/aws-neuron-sdk/issues/528.