StrongResearch / isc-demos

Deep learning examples for the Instant Super Computer

Timm - efficientnetv2_xl #43

Open StrongTanisha opened 10 months ago

StrongTanisha commented 10 months ago

Source / repo

https://github.com/huggingface/pytorch-image-models

Model description

[DESCRIPTION]

Dataset

[DATASET]

Literature benchmark source

[URL]

Literature benchmark performance

[DESCRIPTION] [VALUE/S]

Strong Compute result achieved

[VALUE/S]

Basic training config (as applicable)

Nodes: 12
Epochs: 350
Effective batch size: [N]
Learning rate: [L]
Optimizer: [OPT]

Logs gist

[URL]

StrongTanisha commented 10 months ago

Can't replicate the experiment's batch size: not enough GPU memory. Could verify on cloud.

Calvin to sync with Adam re gradient accumulation

StrongBec commented 10 months ago

Retried with gradient accumulation

experiment_name = "timm-efficientnetv2_xl"
gpu_type = "24GB VRAM GPU"
nnodes = 12
venv_path = "/mnt/Client/Strongzpnpupxvdfdllpjvckewupy3re/becstrlaxex7elmnesblpq7jurqemkbu/.venv/bin/activate"
output_path = "/mnt/Client/Strongzpnpupxvdfdllpjvckewupy3re/becstrlaxex7elmnesblpq7jurqemkbu/output_timm"
command = "train_cycling.py /mnt/.node1/Open-Datasets/imagenet/ILSVRC/Data/CLS-LOC --model=efficientnetv2_xl --weight-decay=1e-5 --decay-rate=0.03 --decay-epochs=2.4 --grad-accum-steps=2 --bn-momentum=0.99 --epochs=350 --lr=0.256 --batch-size=28 --amp --resume $OUTPUT_PATH/checkpoint.pt"
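
For reference, a rough sketch of how the effective batch size for the "Basic training config" field could be worked out from the command above. It assumes `--batch-size` is the per-process (per-GPU) batch size, as in timm's stock `train.py`; whether `train_cycling.py` behaves the same way is not confirmed here. The helper `effective_batch_size` and the `gpus_per_node` parameter are hypothetical, and the GPU count per node is not stated anywhere in this thread, so the value 1 below is only a placeholder.

```python
def effective_batch_size(per_gpu_batch, grad_accum_steps, nnodes, gpus_per_node):
    """Number of samples contributing to each optimizer step.

    With data-parallel training, every GPU processes `per_gpu_batch` samples
    per forward/backward pass, and gradients are accumulated over
    `grad_accum_steps` passes before the optimizer updates the weights.
    """
    return per_gpu_batch * grad_accum_steps * nnodes * gpus_per_node


# Values taken from the command above:
#   --batch-size=28, --grad-accum-steps=2, nnodes = 12
# gpus_per_node is NOT given in this thread; 1 is a placeholder assumption.
per_step = effective_batch_size(
    per_gpu_batch=28,
    grad_accum_steps=2,
    nnodes=12,
    gpus_per_node=1,
)
print(per_step)  # 28 * 2 * 12 * 1 = 672
```

Replace `gpus_per_node` with the real GPU count per node before comparing against the literature batch size. If the learning rate is scaled with batch size, `--lr=0.256` may also need revisiting once the true effective batch size is known.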