I am running STEP3 to train segmentation models. I am not using distributed training. I always get this error when I run multiple training processes together. I switched to different servers but this still happens.
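@lkpengcs Thank you for your attention to our work. My guess is that the server does not have enough CPU memory (RAM), so the training process was killed automatically. Each training process keeps its own cache, so running several at once multiplies the memory needed. You can set a lower cache_rate when training STEP 3.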
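As a rough way to pick a value, here is a minimal sketch that estimates a cache_rate from the RAM available and the number of training processes running together. It assumes cache_rate is the fraction of the dataset each process caches in RAM; the helper name `suggest_cache_rate`, its parameters, and the `safety_margin` default are hypothetical and not part of the repository.

```python
def suggest_cache_rate(available_bytes, dataset_bytes, num_processes, safety_margin=0.5):
    """Estimate a cache_rate so that all processes together fit in RAM.

    Hypothetical helper: assumes each process caches cache_rate * dataset_bytes
    and that only safety_margin of the available RAM should go to caching.
    """
    if dataset_bytes <= 0 or num_processes <= 0:
        raise ValueError("dataset_bytes and num_processes must be positive")
    # RAM budget each process may spend on its cache.
    budget_per_process = available_bytes * safety_margin / num_processes
    # Clamp to [0, 1], the valid range for a fraction of the dataset.
    return max(0.0, min(1.0, budget_per_process / dataset_bytes))


GIB = 2**30
# Example numbers only: 64 GiB free RAM, a 40 GiB cached dataset, 4 processes.
rate = suggest_cache_rate(64 * GIB, 40 * GIB, num_processes=4)
print(f"suggested cache_rate: {rate:.2f}")
```

The cached size of the dataset after preprocessing can be much larger than the size on disk, so treat the result as an upper bound and lower it further if processes are still killed.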