**Open** · geonoon opened this issue 5 months ago
@geonoon In fact, we have considered two cases for distributed pretraining: SLURM and a regular multi-GPU server. However, I'm not sure whether the main_pretrain.py of MTP can run directly on a server, so you may want to refer to this and revise the code related to distributed pretraining.
Here is a command example:
```
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 \
    --nnodes=1 --master_port=10001 --master_addr=[server ip] main_pretrain.py
```
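For reference, here is a minimal sketch of how the distributed setup could handle both cases (server launch via torch.distributed.launch/torchrun, and SLURM) and fall back to a single process for one GPU or CPU. The function name `init_distributed_mode` and the `args` fields are illustrative and may not match the exact names used in MTP's main_pretrain.py:

```python
import os
import torch
import torch.distributed as dist

def init_distributed_mode(args):
    # Case 1: launched with torch.distributed.launch / torchrun on a server,
    # which sets RANK, WORLD_SIZE and LOCAL_RANK in the environment.
    if "RANK" in os.environ and "WORLD_SIZE" in os.environ:
        args.rank = int(os.environ["RANK"])
        args.world_size = int(os.environ["WORLD_SIZE"])
        args.local_rank = int(os.environ["LOCAL_RANK"])
    # Case 2: launched under SLURM, which exposes SLURM_PROCID / SLURM_NTASKS.
    elif "SLURM_PROCID" in os.environ:
        args.rank = int(os.environ["SLURM_PROCID"])
        args.world_size = int(os.environ["SLURM_NTASKS"])
        args.local_rank = args.rank % torch.cuda.device_count()
    else:
        # Single-process fallback (one GPU or CPU): skip distributed init.
        print("Not using distributed mode")
        args.distributed = False
        return

    args.distributed = True
    torch.cuda.set_device(args.local_rank)
    dist.init_process_group(
        backend="nccl",
        init_method="env://",  # reads MASTER_ADDR / MASTER_PORT set by the launcher
        world_size=args.world_size,
        rank=args.rank,
    )
    dist.barrier()
```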
Thank you for this amazing project.
I tried to run pretraining on a single machine, either with one NVIDIA A100 GPU or just on the CPU, but it did not work.
It seems the script main_pretrain.py needs to be modified in some way.
Could you provide detailed guidance on this?
Thanks in advance.