Open olivierr42 opened 4 months ago
Hi @olivierr42,
I am facing a similar issue; however, I think this is a transformers issue, not a sentence-transformers one. Did you find any fix for this?
Just setting `gradient_checkpointing_kwargs` to `{"use_reentrant": False}` worked for me.
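For context, a minimal sketch of that setting with the sentence-transformers v3 trainer API (the output directory is a placeholder):

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder
    gradient_checkpointing=True,
    # Non-reentrant checkpointing; the reentrant default is what hangs under multi-GPU DDP here
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```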
Set `ddp_find_unused_parameters=True`; that is it if you use `gradient_checkpointing=True`.
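If you keep the reentrant checkpointing default instead, that workaround would look roughly like this (same placeholder output directory as above):

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder
    gradient_checkpointing=True,
    # With reentrant checkpointing, tell DDP to look for parameters that did not receive gradients
    ddp_find_unused_parameters=True,
)
```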
Steve
I am trying to train on an 8xA100 instance. If I set `trainer_arguments.gradient_checkpointing` to `True`, the training hangs for a while and then dies with a `Segmentation fault (core dumped)` error. The error does not occur on a single-GPU node, and it does not happen if gradient checkpointing is not enabled. To be precise: setting `gradient_checkpointing_kwargs` to `{"use_reentrant": False}` works, but I think the default settings (which use the reentrant variant of checkpointing) should work too.
I am using the MultipleNegativesRankingLoss with an appropriate dataset.
I am almost certain that this is not a SentenceTransformers issue, but since gradient checkpointing is used by the biggest sentence embedding solutions, I am seeking some help here.
Thank you!
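For reference, a minimal reproduction sketch of the setup described above, assuming the sentence-transformers v3 trainer, a toy (anchor, positive) pair dataset, and `all-MiniLM-L6-v2` as the base model (both are placeholders, not the original setup); launched with `torchrun --nproc_per_node=8 train.py`:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # assumed base model

# Toy (anchor, positive) pairs; MultipleNegativesRankingLoss uses the other
# in-batch positives as negatives for each anchor.
train_dataset = Dataset.from_dict({
    "anchor": ["how do I enable gradient checkpointing?"] * 64,
    "positive": ["set gradient_checkpointing=True in the training arguments"] * 64,
})

args = SentenceTransformerTrainingArguments(
    output_dir="output",
    per_device_train_batch_size=32,
    gradient_checkpointing=True,  # default (reentrant) variant: hangs then segfaults on 8 GPUs
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()
```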