StrongTanisha opened 10 months ago · Status: Open
Can't replicate the experiment's batch size locally: not enough GPU memory. Could verify on cloud.
Calvin to sync with Adam re gradient accumulation
Retried with gradient accumulation
```
experiment_name = "timm-efficientnetv2_xl"
gpu_type = "24GB VRAM GPU"
nnodes = 12
venv_path = "/mnt/Client/Strongzpnpupxvdfdllpjvckewupy3re/becstrlaxex7elmnesblpq7jurqemkbu/.venv/bin/activate"
output_path = "/mnt/Client/Strongzpnpupxvdfdllpjvckewupy3re/becstrlaxex7elmnesblpq7jurqemkbu/output_timm"
command = "train_cycling.py /mnt/.node1/Open-Datasets/imagenet/ILSVRC/Data/CLS-LOC --model=efficientnetv2_xl --weight-decay=1e-5 --decay-rate=0.03 --decay-epochs=2.4 --grad-accum-steps=2 --bn-momentum=0.99 --epochs=350 --lr=0.256 --batch-size=28 --amp --resume $OUTPUT_PATH/checkpoint.pt"
```
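Why `--grad-accum-steps=2` helps here: averaging gradients over several micro-batches before a single optimizer step reproduces the update of a larger batch without holding it in GPU memory at once. A minimal, dependency-free sketch of the idea (not timm's implementation; all names below are hypothetical):

```python
def grad(w, xs, ys):
    # d/dw of mean squared error for the toy model y ≈ w * x
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def step_full_batch(w, xs, ys, lr):
    # one optimizer step on the whole batch
    return w - lr * grad(w, xs, ys)

def step_accumulated(w, xs, ys, lr, accum_steps):
    # same update, but gradients are accumulated over micro-batches
    n = len(xs) // accum_steps
    acc = 0.0
    for i in range(accum_steps):
        mb_x = xs[i * n:(i + 1) * n]
        mb_y = ys[i * n:(i + 1) * n]
        acc += grad(w, mb_x, mb_y) / accum_steps  # scale each micro-batch
    return w - lr * acc                           # single optimizer step

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w_full = step_full_batch(0.0, xs, ys, lr=0.1)
w_accum = step_accumulated(0.0, xs, ys, lr=0.1, accum_steps=2)
print(w_full, w_accum)  # identical step: 3.0 3.0
```

With equal-sized micro-batches the two updates match exactly, which is why accumulation is a memory-for-time trade rather than a change to the optimization.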
### Source / repo
https://github.com/huggingface/pytorch-image-models

### Model description
[DESCRIPTION]

### Dataset
[DATASET]

### Literature benchmark source
[URL]

### Literature benchmark performance
[DESCRIPTION] [VALUE/S]

### Strong Compute result achieved
[VALUE/S]
### Basic training config (as applicable)
- Nodes: 12
- Epochs: 350
- Effective batch size: [N]
- Learning rate: [L]
- Optimizer: [OPT]
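The effective batch size is typically nodes × per-GPU batch × gradient accumulation steps. A sketch of that arithmetic for the config above, under the assumption of one GPU per node (the config does not state GPUs per node):

```python
# Assumed: one 24GB GPU per node; values taken from the posted config.
nnodes = 12
per_gpu_batch = 28      # --batch-size
grad_accum_steps = 2    # --grad-accum-steps

effective_batch = nnodes * per_gpu_batch * grad_accum_steps
print(effective_batch)  # 672
```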
### Logs gist
[URL]