fjhheras opened 4 years ago
I found some previous answers that were relevant, and even though they do not give all the details, I managed to get something working. I added a final layer to the transformer holding several `Dense` instances. Then, depending on the value of `self.condition`, one instance is chosen:
```python
def __init__(self, in_features, out_features, bias=True,
             activation_function=nn.Tanh(), conditions=None):
    ........
    self.conditions = conditions
    dense_dict = {key: Dense(in_features, out_features, bias=bias,
                             activation_function=activation_function)
                  for key in self.conditions}
    self.dense_dict = nn.ModuleDict(dense_dict)

def forward(self, features):
    return self.dense_dict[self.condition].forward(features)
```
So when I encode sentences of type 1, I first set `list(module.children())[-1].condition = '1'`, etc. It is not beautiful (monkey patching), but it works. If I write a PR to make a layer like this, would you be interested?
I had to make other changes in `CosineSimilarityLoss.py` and `EmbeddingSimilarityEvaluator.py` (to change `condition` before each call to `encode`).
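For anyone skimming, the dispatch pattern above can be sketched without torch at all. This is a dependency-free illustration only: plain functions stand in for the `Dense` sub-layers, and all names here are illustrative, not the real modules.

```python
# A dict of per-condition sub-layers, selected via an attribute that is
# set from the outside before each call (the "monkey patching" part).
class ConditionalLayer:
    def __init__(self, conditions):
        # one sub-layer per condition; plain functions stand in for Dense
        self.dense_dict = {key: (lambda feats, key=key: f"{key}:{feats}")
                           for key in conditions}
        self.condition = None  # set externally before calling forward()

    def forward(self, features):
        # route the input through the sub-layer for the current condition
        return self.dense_dict[self.condition](features)

layer = ConditionalLayer(conditions=["1", "2"])
layer.condition = "1"  # what list(module.children())[-1].condition = '1' does
print(layer.forward("x"))  # -> 1:x
```

The `key=key` default in the lambda is what binds each condition to its own sub-layer inside the dict comprehension.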
Hi @fjhheras
Yes, a nice and clean integration of that would be quite cool.
I think the best way is to integrate the information about the condition into the dataloader. This information would then be passed to all intermediate modules and could be read from there.
Best,
Nils
Thank you for your answer, @nreimers.
How would you send information to all modules?
For example, `SentenceTransformer.encode` calls `self.forward(features)`. This `forward` is inherited from `nn.Sequential`, so it sends all the arguments to the first module (in the case I am testing, `modules/BERT`), which does `self.bert(**features)`, where `self.bert` is a huggingface transformer.
If I add the key `text_type` to the `features` dictionary, it fails with an error because the huggingface transformer does not accept that keyword argument. So even if the last module knows how to use that key, the pipeline breaks before it gets there.
I can bypass the first module by creating a `forward` method in `SentenceTransformer`:

```python
def forward(self, features, intermediate_features=None):
    for i, module in enumerate(self):
        if i == 1 and intermediate_features is not None:
            features.update(intermediate_features)
        features = module(features)
    return features
```
Not sure how general or desirable this would be (and I am still not sure how to do the equivalent for training)...
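To make the routing concrete, here is a torch-free sketch of what that override changes: module 0 (the transformer stand-in) never sees the extra keys, while later modules do. The module and key names here are illustrative stand-ins, not the real sentence-transformers modules.

```python
# Plain functions stand in for pipeline modules; TinyPipeline mimics
# the overridden SentenceTransformer.forward shown above.
class TinyPipeline(list):
    def forward(self, features, intermediate_features=None):
        for i, module in enumerate(self):
            if i == 1 and intermediate_features is not None:
                features.update(intermediate_features)  # inject after module 0
            features = module(features)
        return features

def transformer_module(features):
    # stand-in for the transformer: would crash on an unknown keyword
    assert "condition" not in features
    features["token_embeddings"] = [1.0, 2.0]
    return features

def conditional_module(features):
    # stand-in for a final layer that reads the injected key
    features["seen_condition"] = features.get("condition")
    return features

pipe = TinyPipeline([transformer_module, conditional_module])
out = pipe.forward({"input_ids": [0]},
                   intermediate_features={"condition": "1"})
print(out["seen_condition"])  # -> 1
```

The assertion inside `transformer_module` is the point: the extra key is injected only after the first module has already run.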
Hi @fjhheras
My idea was more to inject a new key into the features dict, like `features['condition'] = 1`.
Then in the dense layer, you can check `features['condition']` and either pass the input through an identity layer or through a non-linear dense layer.
But I'm not sure yet how to get the data reader to add features to the input text that are preserved within the sequential pipeline.
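A minimal sketch of that branching, with torch-free stand-ins: the condition travels inside the features dict itself, and the layer picks identity or a transformation based on it. The key names and condition value are assumptions for illustration.

```python
# The condition rides along in the features dict; the layer branches on it.
def identity(x):
    return x

def dense(x):
    return [2.0 * v for v in x]  # stand-in for a non-linear dense layer

def conditional_forward(features):
    # features is the dict flowing through the nn.Sequential pipeline
    fn = dense if features.get("condition") == 1 else identity
    features["sentence_embedding"] = fn(features["sentence_embedding"])
    return features

out = conditional_forward({"sentence_embedding": [1.0, 2.0], "condition": 1})
print(out["sentence_embedding"])  # -> [2.0, 4.0]
```

With this shape, no module needs monkey patching; the open question in the thread is only how the data reader gets the condition into the dict in the first place.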
Yes, I understood your suggestion. But the first module does not seem to accept an extra key in the `features` dictionary, at least not in the way it is called in `SentenceTransformer.encode`.
I am stuck in the same situation, where I want to train two independent transformer models (one for type 1, the other for type 2):

```
Input 1 >> transformer 1 >> Pooling >> Output 1
Input 2 >> transformer 2 >> Pooling >> Output 2
```

Any help would be appreciated. Thank you!
Just for completeness: see #328, where I describe an easy method to create two independent embeddings for different inputs without needing any code changes.
I would like to finetune BERT (or similar) models for an asymmetric task using two different embeddings. There will be two inputs (1 and 2), and I would use an embedding for 1 and an embedding for 2 to build meaningful distances between 1 and 2. But I cannot use a common embedding, because sentences in 1 are of a very different nature from sentences in 2 (it is not exactly like that, but you can think of questions and answers).
I have thought about several options:
```
Input 1 >> transformer 1 >> Pooling >> Output 1
Input 2 >> transformer 2 >> Pooling >> Output 2
```

```
Input 1 >> transformer >> Pooling >> Output 1
Input 2 >> transformer >> Pooling >> extra layer >> Output 2
```

or

```
Input 1 >> transformer >> Pooling >> Output 1
Input 2 >> transformer >> extra layer >> Pooling >> Output 2
```
Do you think there is an easy way to do this by adapting one of the training scripts? I would appreciate some guidance on which code I could try to adapt for my use case, so I can make the most of what is already in this repo!
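For reference, the first option (two fully independent towers) can be sketched without torch: each input type gets its own encoder, and a similarity is computed across the two output spaces. The toy encoders below stand in for "transformer >> Pooling"; everything here is illustrative, not the sentence-transformers API.

```python
import math

def toy_encoder(seed):
    """Return a deterministic toy encoder distinguished by `seed`."""
    def encode(text):
        # fixed-size "embedding" derived from the first characters
        padded = text[:4].ljust(4)
        return [math.sin(seed + ord(c)) for c in padded]
    return encode

encode_1 = toy_encoder(1)  # tower for sentences of type 1
encode_2 = toy_encoder(2)  # independent tower for sentences of type 2

def cosine_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# asymmetric scoring: type-1 text through tower 1, type-2 through tower 2
sim = cosine_sim(encode_1("what is X?"), encode_2("X is a thing"))
```

The key property the sketch shows is that the two towers share nothing: only the similarity function ties the two embedding spaces together, which is what a shared training loss would optimize.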