Closed · CauchyFanUpdate closed this issue 4 months ago
I've noticed that both pretraining and finetuning get stuck at this point: 'trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.layers.0.downsample.reduction.bias <- decoder.cls_head.bias.' Can you explain why this happens? I'm running on 8 RTX 3090s.
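For context, a log line like this typically comes from a dedup pass that scans the model's named parameters and reports when two names resolve to the same underlying tensor (tied weights). A minimal sketch of that idea, using plain Python objects in place of torch tensors (the function name and structure here are illustrative, not the actual `trainer.py` code):

```python
def detect_shared_parameters(named_params):
    """Report parameter names that alias the same underlying object.

    named_params: iterable of (name, param) pairs, in registration order.
    Returns a list of (duplicate_name, first_name) pairs, mirroring the
    "detected shared parameter: A <- B" log format.
    """
    seen = {}    # id(param) -> first name registered for that object
    shared = []
    for name, param in named_params:
        key = id(param)
        if key in seen:
            # Later name aliases an object we have already seen.
            shared.append((name, seen[key]))
        else:
            seen[key] = name
    return shared


# Toy example: two names bound to the same object (weight tying).
tied = object()
params = [
    ("decoder.cls_head.bias", tied),
    ("encoder.embed_images.layers.0.downsample.reduction.bias", tied),
    ("encoder.other.weight", object()),
]
print(detect_shared_parameters(params))
```

Note that the message is logged at INFO level: tied parameters are common in encoder-decoder models and the detection itself should not cause a hang, so the actual stall is more likely in distributed synchronization.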
Hi, I'm not sure what is happening here. Did you get any error messages? It may be an out-of-memory issue; you could try reducing the batch size and see if that helps.
I'm facing a persistent blocking issue with torch.distributed.barrier(group=group). What could be the cause of this?
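A barrier that blocks forever on multi-GPU consumer cards is often an NCCL-level communication stall rather than a bug in the training code. One way to diagnose it is to rerun with NCCL and PyTorch debug variables set; these are standard environment variables, offered here as a diagnostic sketch rather than a confirmed fix for this repo:

```shell
export NCCL_DEBUG=INFO                 # print NCCL init and collective logs per rank
export NCCL_DEBUG_SUBSYS=ALL           # include all NCCL subsystems in the logs
export TORCH_DISTRIBUTED_DEBUG=DETAIL  # PyTorch-side checks for mismatched collectives
export NCCL_P2P_DISABLE=1              # rule out broken GPU peer-to-peer links
```

If the logs show ranks stuck during peer-to-peer setup, disabling P2P (last line) is a commonly reported workaround on RTX 3090 machines. One can also pass a `timeout` to `torch.distributed.init_process_group` so a stuck collective fails with an error instead of hanging silently.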