Qengineering / Jetson-Nano-Ubuntu-20-image

Jetson Nano with Ubuntu 20.04 image
https://qengineering.eu/install-ubuntu-20.04-on-jetson-nano.html
BSD 3-Clause "New" or "Revised" License
646 stars, 70 forks

Jetson nano sometimes extremely slow with GPU #51

Closed: noobyzy closed this issue 9 months ago

noobyzy commented 9 months ago

Hello there!

First of all, I would like to thank you for your contribution to the new features on Jetson Nano.

Recently I have been training a very small network in PyTorch:

from torch import Tensor, nn


class Net(nn.Module):
    """Simple CNN adapted from 'PyTorch: A 60 Minute Blitz'."""

    def __init__(self) -> None:
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.bn1 = nn.BatchNorm2d(6)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.bn2 = nn.BatchNorm2d(16)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.bn3 = nn.BatchNorm1d(120)
        self.fc2 = nn.Linear(120, 84)
        self.bn4 = nn.BatchNorm1d(84)
        self.fc3 = nn.Linear(84, 10)
        self.relu = nn.ReLU()

    # pylint: disable=arguments-differ,invalid-name
    def forward(self, x: Tensor) -> Tensor:
        """Compute forward pass."""

        x = self.pool(self.relu(self.bn1(self.conv1(x))))

        x = self.pool(self.relu(self.bn2(self.conv2(x))))

        x = x.view(-1, 16 * 5 * 5)
        x = self.relu(self.bn3(self.fc1(x)))

        x = self.relu(self.bn4(self.fc2(x)))

        x = self.fc3(x)

        return x

The dataset is CIFAR10 with a batch size of 64. (In fact I am doing a federated learning task, so the dataset is split, and each Nano device has ~15-20 batches per epoch.)
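For reference, the 16 * 5 * 5 input size of fc1 follows from the 32x32 CIFAR10 images passing through two unpadded 5x5 convolutions, each followed by 2x2 pooling. A minimal sketch of that arithmetic in plain Python is below; the helper names conv_out and flattened_features are hypothetical and not part of the original post.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output width/height of a conv or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(size: int = 32, channels: int = 16) -> int:
    """Number of features entering fc1 for the Net defined above."""
    size = conv_out(size, kernel=5)            # conv1: 32 -> 28
    size = conv_out(size, kernel=2, stride=2)  # pool:  28 -> 14
    size = conv_out(size, kernel=5)            # conv2: 14 -> 10
    size = conv_out(size, kernel=2, stride=2)  # pool:  10 -> 5
    return channels * size * size              # 16 * 5 * 5


print(flattened_features())
```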

There are two issues.

  1. The start of training is slow: the Nano device takes around 60-90 s to do the initial forward pass, as follows:

    # Perform a single forward pass to properly initialize BatchNorm
    _ = model(next(iter(trainloader))[0].to(DEVICE))

    while on a desktop (using GTX 1660 / 2060) the initial pass finishes within seconds.

  2. Training is occasionally extremely slow: it is acceptable that the Nano is slower than desktops (10 epochs at 15-20 batches/epoch take around 15-18 s, versus ~2 s on desktops). But sometimes the Nano device is extremely slow and takes ~500-900 s to finish 10 epochs. A simple reboot may resolve this, but the slowdown comes back without warning.
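To tell the two issues apart, it can help to time the first call separately from later ones. The following is a minimal sketch using only the standard library; time_calls and split_warmup are hypothetical helper names, and the sync hook is an assumption: on a CUDA device it would typically be torch.cuda.synchronize, since GPU work is launched asynchronously and the clock would otherwise stop too early.

```python
import time
from typing import Callable, List, Optional, Tuple


def time_calls(step: Callable[[], object], repeats: int = 5,
               sync: Optional[Callable[[], None]] = None) -> List[float]:
    """Return the wall-clock duration in seconds of each call to `step`."""
    durations = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain pending work before starting the clock
        start = time.perf_counter()
        step()
        if sync is not None:
            sync()  # wait for the work launched by `step` to finish
        durations.append(time.perf_counter() - start)
    return durations


def split_warmup(durations: List[float]) -> Tuple[float, float]:
    """Separate the first (warm-up) call from the mean of the remaining calls."""
    first, rest = durations[0], durations[1:]
    steady = sum(rest) / len(rest) if rest else float("nan")
    return first, steady
```

With the model and a batch already on the device, a call such as time_calls(lambda: model(batch), sync=torch.cuda.synchronize) would give one duration per forward pass. A large first value with small later values points to one-off initialization cost, while uniformly large values point to the sustained slowdown.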

I have tried changing the power mode with

sudo nvpmodel -m 0
sudo jetson_clocks

but I don't think this helps.

Please help me with this issue. Many thanks.

Qengineering commented 9 months ago

@noobyzy,

To put it bluntly, you cannot train a deep learning model on a Jetson Nano. It lacks the necessary computational power. The Nano is targeted at deploying networks.