Disk reads and writes are too fast, causing server lag: how can I reduce the read/write speed?

❓ Questions and Help

I am training the movie_mcan model on the VQA v2 dataset with two GPUs. GPU memory is limited, so I set the batch size to 16. Every time I run the code, the server becomes abnormally laggy, and while it lags I cannot perform any operation. I checked disk activity with `sar -d 3 5` and found that the read speed is very high. How can I fix this?

This is my training command:

`CUDA_VISIBLE_DEVICES=2,3 mmf_run config=projects/movie_mcan/configsqa2/defaults.yaml model=movie_mcan dataset=vqa2 run_type=train env.cache_dir=/data/students/zzj/ env.data_dir=/data/students/zzj/ training.batch_size=16`

Here are the read and write speeds: (screenshots not recovered)

facebookresearch / mmf

Disk reads and writes are too fast, causing server lag: how can I reduce the read/write speed? #1267