OpenDriveLab / Openpilot-Deepdive

Our insights into Openpilot, a deep-dive project on it
MIT License

How much CPU RAM and GPU memory is used in this project? #5

Closed. MicroHest closed this issue 2 years ago

MicroHest commented 2 years ago

I ran this project, but it seems I'm running out of either RAM or GPU memory.

My machine has 8 NVIDIA GeForce RTX 3090 GPUs with roughly 24 GB of memory each, and 2 Intel(R) Xeon(R) Gold 6226R CPUs with 26 cores each. Total RAM is 376 GiB.
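
A quick, generic way to confirm what PyTorch and the OS report for per-GPU memory and total RAM (this assumes torch and psutil are installed; it is not code from this repository):

# Generic hardware check, not part of Openpilot-Deepdive.
import torch
import psutil

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f'GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB')
print(f'Total RAM: {psutil.virtual_memory().total / 1024**3:.1f} GiB')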

At first, I ran this project with the default settings from the paper.

# Default settings: crashes with CUDA out of memory.
parser.add_argument('--batch_size', type=int, default=48)
parser.add_argument('--lr', type=float, default=1e-4)
parser.add_argument('--n_workers', type=int, default=8)
parser.add_argument('--epochs', type=int, default=100)
parser.add_argument('--log_per_n_step', type=int, default=20)
parser.add_argument('--val_per_n_epoch', type=int, default=1)

parser.add_argument('--resume', type=str, default='')

parser.add_argument('--M', type=int, default=5)
parser.add_argument('--num_pts', type=int, default=33)
parser.add_argument('--mtp_alpha', type=float, default=1.0)
parser.add_argument('--optimizer', type=str, default='adamw')

Then I tried a smaller batch size of 8, but it still failed with CUDA out of memory. Next I set optimize_per_n_step to 20, and this time training started. After a short while, however, worker process 2 exited unexpectedly, and watching the top panel showed that the machine had run out of RAM before the process died.

Finally, my machine runs fine with the configuration below; a sketch of why the smaller n_workers value helps with RAM follows it.

# Working configuration; optimize_per_n_step is set to 20.
parser.add_argument('--batch_size', type=int, default=8) # changed
parser.add_argument('--lr', type=float, default=1e-4)
parser.add_argument('--n_workers', type=int, default=2) # changed
parser.add_argument('--epochs', type=int, default=100)
parser.add_argument('--log_per_n_step', type=int, default=20)
parser.add_argument('--val_per_n_epoch', type=int, default=1)

parser.add_argument('--resume', type=str, default='')

parser.add_argument('--M', type=int, default=5)
parser.add_argument('--num_pts', type=int, default=33)
parser.add_argument('--mtp_alpha', type=float, default=1.0)
parser.add_argument('--optimizer', type=str, default='adamw') 
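
The lower n_workers matters because each DataLoader worker keeps its own prefetched, decoded batches in host RAM. The snippet below is a generic torch.utils.data.DataLoader sketch, not the loader used in this repository; it only illustrates that prefetch memory grows roughly with num_workers * prefetch_factor * batch_size * bytes per sample.

# Generic PyTorch DataLoader sketch (not this repo's dataset or loader).
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == '__main__':  # guard needed when worker processes are spawned
    dummy = TensorDataset(torch.zeros(64, 3, 128, 256))  # stand-in for decoded video frames
    loader = DataLoader(dummy, batch_size=8, num_workers=2, prefetch_factor=2, pin_memory=True)
    for (batch,) in loader:
        pass  # each worker holds up to prefetch_factor decoded batches in RAM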

Could you tell me how much CPU RAM and GPU memory is recommended for this project, or share your machine's specs?

ElectronicElephant commented 2 years ago

Hi @MicroHest ,

We implemented and tested our code on the lab's Slurm cluster, whose nodes have two-socket Xeon E5 CPUs, 8 V100 GPUs (32 GB of GPU memory each), and 512 GB of RAM, so it may be problematic on bare-metal machines. It does require a lot of RAM, because we have to extract all the frames from each video before sending them into the network. We're looking into this issue and hope to make it run on widely used graphics cards like the 1080 Ti. Please kindly wait for a few days.
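
To put a rough number on that frame-extraction cost: the loop below is a generic OpenCV illustration, not the repository's preprocessing code, and the clip path is hypothetical. Assuming a comma2k19-style 1164x874 frame, each decoded RGB frame is about 3 MB, so a one-minute clip at 20 FPS already occupies a few GB of host RAM once fully decoded.

# Generic illustration (not the repo's preprocessing): eager decoding keeps every
# uncompressed frame in host RAM at once. The clip path is hypothetical.
import cv2

cap = cv2.VideoCapture('example_clip.hevc')
frames = []
while True:
    ok, frame = cap.read()  # each frame is an HxWx3 uint8 numpy array
    if not ok:
        break
    frames.append(frame)
cap.release()

total_gib = sum(f.nbytes for f in frames) / 1024**3
print(f'{len(frames)} frames kept in RAM, about {total_gib:.1f} GiB uncompressed')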

MicroHest commented 2 years ago

OK, thanks!

ElectronicElephant commented 2 years ago

Hi @MicroHest ,

Sorry for the delay. I have updated the code for bare-metal machines. Please pull the latest code and follow the instructions at https://github.com/OpenPerceptionX/Openpilot-Deepdive#training-and-testing.

Also, please kindly note that batch_size is actually the batch size per GPU. When set to 6, it consumes around 30 GB of GPU memory, so it's interesting that you can run batch_size=8 on 24 GB 3090s without OOM.
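
For anyone tuning this: in a typical DistributedDataParallel setup the effective global batch size is the per-GPU value times the number of processes. A minimal generic check (the variable names are illustrative, not taken from this codebase):

# Falls back to the local GPU count if torch.distributed has not been initialized.
import torch
import torch.distributed as dist

per_gpu_batch_size = 6
world_size = dist.get_world_size() if dist.is_initialized() else torch.cuda.device_count()
global_batch_size = per_gpu_batch_size * world_size
print(f'effective global batch size: {global_batch_size}')  # 48 with 8 processes/GPUs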

As for RAM, with batch_size=6 and n_workers=4 it consumes around 40 to 50 GB of RAM per process. (That means, if you run with 8 cards, you have to multiply that by 8.)
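
A back-of-the-envelope check of that multiplication, using only the figures quoted above (no new measurements):

# 40-50 GB of host RAM per training process, one process per GPU, 8 GPUs per node.
ram_per_process_gb = (40, 50)
n_gpus = 8
low, high = (r * n_gpus for r in ram_per_process_gb)
print(f'expected host RAM: {low}-{high} GB')  # 320-400 GB, close to the 376 GiB reported above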