Open visionscaper opened 2 years ago
I took the liberty of already offering my fix in this pull request. Please let me know what you think.
Do you mind sharing your code? How did you wrap the sentence-transformers model in DDP and train it?
Hello,
First of all, thank you for making this library available.
I created code to fine-tune sentence transformer models by performing data-parallel training, distributed over multiple GPUs. The implementation follows the standard PyTorch DistributedDataParallel recipe.
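For context, the standard DDP recipe referred to above looks roughly like the sketch below. It is hedged and simplified: it runs as a single-process, CPU-only `gloo` process group so it is self-contained, and a toy `nn.Sequential` model stands in for the sentence-transformers model; real multi-GPU training would launch one process per GPU (e.g. via `torchrun`) and pass `device_ids`.

```python
# Minimal single-process sketch of the standard PyTorch DDP recipe.
# Backend "gloo" on CPU with world_size=1 keeps it runnable anywhere;
# multi-GPU training would launch one process per GPU instead.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

# Stand-in for the sentence-transformers model being fine-tuned.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 4)
)
ddp_model = DDP(model)  # on GPU: DDP(model, device_ids=[local_rank])
optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

inputs, targets = torch.randn(32, 8), torch.randn(32, 4)
loss = torch.nn.functional.mse_loss(ddp_model(inputs), targets)
loss.backward()       # gradients are all-reduced across ranks here
optimizer.step()

final_loss = loss.item()
dist.destroy_process_group()
```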
Issue
In this context I encounter the following error when trying to fine-tune the sentence transformer model based on the pre-trained `stsb-roberta-base-v2`:

Cause of issue
Hugging Face implementations of models like RoBERTa add a pooling layer by default. This layer is not used by the sentence transformer, so its parameters do not participate in calculating the loss.
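A toy illustration of the situation (not the actual sentence-transformers code): a module registers a "pooler" head that the forward pass never calls, so its parameters receive no gradients, which is what DDP's reducer trips over.

```python
# Toy reproduction of the "unused parameters" condition: the pooler is
# registered on the module but never used in forward, so after backward()
# its parameters have no gradients.
import torch

class EncoderWithPooler(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = torch.nn.Linear(8, 8)
        self.pooler = torch.nn.Linear(8, 8)  # registered but never called

    def forward(self, x):
        return self.encoder(x)  # pooler is skipped entirely

model = EncoderWithPooler()
loss = model(torch.randn(4, 8)).sum()
loss.backward()

# Parameters that never received a gradient:
unused = [name for name, p in model.named_parameters() if p.grad is None]
print(unused)  # the pooler's weight and bias
```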
Fixing the issue
Although a simple way to suppress this error would be to set `find_unused_parameters=True` for `DistributedDataParallel`, this can still cause issues. For instance, in my case a run-time error occurs saying I'm not allowed to "detach views in place".
A better, more fundamental solution would be to allow setting `add_pooling_layer=False`. To make this fix more generic, it would be good to be able to inject a dictionary with any required custom Hugging Face parameters (`custom_hf_params`). I have implemented this solution in this fork.
It would be great if this (or a similar) fix could be merged into this repo!