merrye closed this issue 1 year ago
Hi,
Thank you for your appreciation. Will the different splits be loaded during training?
If you are using the "ArgoverseInDisk" data loader, it will take much longer than the "ArgoverseInMem" data loader. Actually, I never finished training with the "ArgoverseInDisk" data loader. T T
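In case it helps others reading this thread, the trade-off between the two loaders can be sketched roughly like this. These are hypothetical stand-in classes, not the repo's actual "ArgoverseInDisk"/"ArgoverseInMem" implementations:

```python
import os
import pickle
import tempfile

class InDiskDataset:
    """Reads each sample from disk on every access: low RAM use, slow epochs."""
    def __init__(self, paths):
        self.paths = list(paths)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Disk I/O on every call -- this is what makes per-epoch time explode.
        with open(self.paths[idx], "rb") as f:
            return pickle.load(f)

class InMemDataset:
    """Loads every sample into RAM once at construction: fast epochs, high peak memory."""
    def __init__(self, paths):
        self.data = []
        for p in paths:
            with open(p, "rb") as f:
                self.data.append(pickle.load(f))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Pure in-memory lookup; no I/O during training.
        return self.data[idx]
```

The in-memory variant trades peak RAM for per-item speed, which is why it can fail on machines with less memory than the preprocessed dataset needs.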
In my case, my desktop has an Intel 10700K CPU and two Nvidia RTX 2080 GPUs, and each training epoch takes about 20 minutes. Also, all the data is stored on an M.2 SSD.
I'm afraid the training speed is already the fastest I can achieve.
Increasing the swap size and loading all the data at once with the "ArgoverseInMem" data loader will accelerate your training, I assure you.
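For anyone unsure how to increase swap, a minimal sketch on Linux looks like the following. The 64 GB size is only an example; pick a size that covers your preprocessed dataset's footprint, and note these commands require root:

```shell
# Sketch only: create a swap file (size is an assumption, adjust to your data).
sudo fallocate -l 64G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Optional: make it persistent across reboots.
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```

Swap is much slower than RAM, so this only helps the dataset *fit*; heavy swapping during training will still slow epochs down.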
I have tried to load all the data at once with the "ArgoverseInMem" data loader, but it failed (I think due to a lack of memory), so now I have to split the dataset. I'm still using the "ArgoverseInMem" data loader for training, but one epoch still takes about 2 hours, and I'm confused.
Could you please explain how you split the dataset? And do you load the different splits during training?
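One common way to do what this question asks, splitting the train set and rotating through the splits across epochs, can be sketched like this. The function names and the cycling scheme are my assumptions, not necessarily how this repo does it:

```python
import random

def make_splits(indices, n_parts, seed=0):
    """Shuffle sample indices once, then carve them into n_parts chunks."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    size = (len(idx) + n_parts - 1) // n_parts  # ceiling division
    return [idx[i:i + size] for i in range(0, len(idx), size)]

def split_for_epoch(splits, epoch):
    """Cycle through the splits so every sample is visited once every len(splits) epochs."""
    return splits[epoch % len(splits)]
```

With this scheme only one split needs to be resident in memory at a time, at the cost of each "epoch" covering only a fraction of the data.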
Thanks for your support. I have solved it.
Hi, thanks for sharing your great work. I split the train set of the Argoverse dataset into ten small parts, and training one epoch takes about 2 hours, so training on the complete dataset is estimated to take about 83 days. How long is your training time? And could you share your hardware resources? My command and hardware resources are attached below.

![1666350188215](https://user-images.githubusercontent.com/22674078/197181496-44847d4e-1e44-475d-8103-ef339a22ac71.jpg)
```shell
python -m torch.distributed.launch --nproc_per_node=2 train_net.py -d dataset/interm_data -o run/net/ -a -b 8 -c -m --lr 0.0012 -luf 10 -ldr 0.3 -e 100 -w 40
```