[node-01:1]: loss_emb = gc(inputs["query"], inputs["passage"], no_sync_except_last=no_sync_except_last)
[node-01:1]: File "/home/users/rt195/anaconda3/envs/gritlm/lib/python3.9/site-packages/grad_cache/grad_cache.py", line 70, in __call__
[node-01:1]: return self.cache_step(*args, **kwargs)
[node-01:1]: File "/home/users/rt195/anaconda3/envs/gritlm/lib/python3.9/site-packages/grad_cache/grad_cache.py", line 262, in cache_step
[node-01:1]: assert all(map(lambda m: isinstance(m, nn.parallel.DistributedDataParallel), self.models)), \
[node-01:1]:AssertionError: Some of models are not wrapped in DistributedDataParallel. Make sure you are running DDP with proper initializations.
I run into this error when running gradient caching with DDP. Here is my command.
Any idea why this might be happening?
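For reference, the assertion fires in `cache_step` when `no_sync_except_last=True` is passed but the models handed to `GradCache` are not `DistributedDataParallel` instances. Below is a minimal sketch of the setup I understood the README to require, with a toy `nn.Linear` and a placeholder in-batch-negatives loss standing in for my actual encoder and loss. Am I missing a DDP wrap like this somewhere before constructing `GradCache`?

import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from grad_cache import GradCache

def contrastive_loss(q_reps, p_reps):
    # placeholder loss: in-batch negatives over the similarity matrix
    scores = q_reps @ p_reps.t()
    labels = torch.arange(scores.size(0), device=scores.device)
    return nn.functional.cross_entropy(scores, labels)

# assumes launch via torchrun, which sets LOCAL_RANK
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

encoder = nn.Linear(768, 128).cuda()  # toy stand-in for the real encoder
# the DDP wrap must happen BEFORE the model is passed to GradCache,
# otherwise cache_step's isinstance check on self.models fails
encoder = DDP(encoder, device_ids=[local_rank])

gc = GradCache(
    models=[encoder, encoder],  # query and passage towers (shared here)
    chunk_sizes=8,
    loss_fn=contrastive_loss,
)

queries = torch.randn(32, 768).cuda()
passages = torch.randn(32, 768).cuda()
loss = gc(queries, passages, no_sync_except_last=True)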