ContextualAI / gritlm

Generative Representational Instruction Tuning
https://arxiv.org/abs/2402.09906
MIT License
479 stars 33 forks

Gradient Caching trainer error. #28

Closed raghavlite closed 2 months ago

raghavlite commented 2 months ago

I run into this error when running gradient caching. Here is my command:

    -m training.run \
    --output_dir /usr/project/xtmp/rt195/Sentence_Embedding/F5/gritlm/data/m7_temp \
    --model_name_or_path mistralai/Mistral-7B-v0.1 \
    --train_data /usr/project/xtmp/rt195/Sentence_Embedding/F5/gritlm/data/MEDI2/allnli.jsonl \
    --learning_rate 2e-5 \
    --lr_scheduler_type linear \
    --warmup_ratio 0.03 \
    --max_steps 1253 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 5 \
    --dataloader_drop_last \
    --normalized \
    --temperature 0.02 \
    --train_group_size 2 \
    --negatives_cross_device \
    --query_max_len 256 \
    --passage_max_len 2048 \
    --mode embedding \
    --logging_steps 1 \
    --bf16 \
    --pooling_method mean \
    --attn cccc \
    --attn_implementation sdpa \
    --save_steps 5000 \
    --gradient_checkpointing \
    [node-01:1]:    loss_emb = gc(inputs["query"], inputs["passage"], no_sync_except_last=no_sync_except_last)
    [node-01:1]:  File "/home/users/rt195/anaconda3/envs/gritlm/lib/python3.9/site-packages/grad_cache/grad_cache.py", line 70, in __call__
    [node-01:1]:    return self.cache_step(*args, **kwargs)
    [node-01:1]:  File "/home/users/rt195/anaconda3/envs/gritlm/lib/python3.9/site-packages/grad_cache/grad_cache.py", line 262, in cache_step
    [node-01:1]:    assert all(map(lambda m: isinstance(m, nn.parallel.DistributedDataParallel), self.models)), \
    [node-01:1]:AssertionError: Some of models are not wrapped in DistributedDataParallel. Make sure you are running DDP with proper initializations.

Any idea why this might be happening?
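For context, the guard that fires here is simple: in distributed mode, GradCache's `cache_step` asserts that every model it was given is wrapped in `DistributedDataParallel`. A minimal sketch of that check (the helper name and the stand-in class below are illustrative, not part of `grad_cache`):

```python
# Sketch of the guard behind the AssertionError above: grad_cache's
# distributed cache_step requires every model to already be wrapped in
# torch.nn.parallel.DistributedDataParallel (DDP).
def assert_all_ddp(models, ddp_type):
    assert all(isinstance(m, ddp_type) for m in models), (
        "Some of models are not wrapped in DistributedDataParallel. "
        "Make sure you are running DDP with proper initializations."
    )

class FakeDDP:
    # Stand-in for nn.parallel.DistributedDataParallel, just for illustration.
    pass

# A plain, unwrapped model fails the check, reproducing the error above:
try:
    assert_all_ddp([object()], FakeDDP)
except AssertionError as e:
    print("raised:", e)
```

So the assertion fires whenever the trainer hands GradCache a model that was never DDP-wrapped, regardless of how the rest of the run is configured.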

Muennighoff commented 2 months ago

You need to use the GradCache bundled with this repo: https://github.com/ContextualAI/gritlm/tree/main/gritlm/training/GradCache ; I've also added it to the README.
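One quick way to confirm which `grad_cache` Python will actually import (the traceback above shows the stock pip package in `site-packages` being picked up) is a sketch like this; after installing the fork from the repo, the reported path should point inside your gritlm checkout:

```python
import importlib.util

# Locate the grad_cache module that Python would import. If the reported
# path is inside site-packages (as in the traceback above) rather than
# inside the gritlm checkout, the bundled fork is not the one being used.
spec = importlib.util.find_spec("grad_cache")
print(spec.origin if spec else "grad_cache is not importable")
```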

raghavlite commented 2 months ago

Thank you, it worked!