[Open] mylovecc2020 opened this issue 3 years ago
I defined a loss_function class, and in it I imitated the CosineSimilarityLoss class to get the sentence embeddings. Once I have the embedding features and the hand-crafted features, I concatenate them and then compute the loss. Is this right? Do you have a better method? Thanks!
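For reference, here is a rough sketch of what such a loss could look like, loosely modeled on CosineSimilarityLoss. The class name ConcatCosineLoss and the handcrafted_a/handcrafted_b arguments are hypothetical: the stock model.fit() loop only passes sentence_features and labels, so you would need a custom training loop or collate function to supply the hand-crafted tensors.

```python
import torch
from torch import nn


class ConcatCosineLoss(nn.Module):
    """MSE loss on the cosine similarity of [embedding ; hand-crafted features]."""

    def __init__(self, model):
        super().__init__()
        self.model = model          # a SentenceTransformer bi-encoder
        self.loss_fct = nn.MSELoss()

    def forward(self, sentence_features, handcrafted_a, handcrafted_b, labels):
        # sentence_features: the two tokenized inputs, as in CosineSimilarityLoss
        emb_a, emb_b = [self.model(f)["sentence_embedding"] for f in sentence_features]

        # Concatenate each sentence embedding with its hand-crafted feature vector
        rep_a = torch.cat([emb_a, handcrafted_a], dim=1)
        rep_b = torch.cat([emb_b, handcrafted_b], dim=1)

        scores = torch.cosine_similarity(rep_a, rep_b)
        return self.loss_fct(scores, labels.view(-1).float())
```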
I think you'd need to look into at least 3 pretty important questions for a concat-and-compare approach:

- are angles between features useful signal for your data and task? Maybe, e.g., Euclidean distances are more useful.
- what's the scale of the features relative to each other and to the embeddings? If they're on the scale of 10-100, they'll dominate the cosine similarity value.
- how many hand-crafted features are there? If there are far fewer than 768 and they're on a small scale, they'll be drowned out by the embeddings.

Issue (2), and maybe (1), can be addressed by preprocessing your hand-crafted features. Issue (3) can be addressed by adding a layer for dimensionality reduction of the embeddings (weights tied). The problem is that these fixes seem like they'd require a lot of hyperparameter tuning.

An alternative to concat-and-compare is compare-and-concat: define a similarity metric for the hand-crafted features, concatenate it with the cosine similarity from the embeddings, and then learn a model from (similarity_features, similarity_embeddings) -> similarity_label.
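A minimal sketch of the compare-and-concat idea, assuming pair-level 0/1 similarity labels; the model name, the negative Euclidean distance as the feature similarity, and the logistic-regression head are all placeholder choices:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder bi-encoder

def pair_similarities(texts_a, texts_b, feats_a, feats_b):
    """One row of [similarity_embeddings, similarity_features] per sentence pair."""
    emb_a = model.encode(texts_a)
    emb_b = model.encode(texts_b)
    # Per-pair cosine similarity of the embeddings
    sim_emb = np.sum(emb_a * emb_b, axis=1) / (
        np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1)
    )
    # Placeholder similarity for the hand-crafted features: negative Euclidean distance
    sim_feat = -np.linalg.norm(np.asarray(feats_a) - np.asarray(feats_b), axis=1)
    return np.column_stack([sim_emb, sim_feat])

# X = pair_similarities(texts_a, texts_b, feats_a, feats_b)   # shape (n_pairs, 2)
# clf = LogisticRegression().fit(X, similarity_labels)        # -> similarity_label
```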
When you have a limited number of binary features, you can add special tokens to the tokenizer and prepend them to the text input.
So your sentence becomes e.g. "[FEAT1] [FEAT3] My first example" and another example becomes "[FEAT2] [FEAT3] [FEAT4] Another input text"
This only works when you have binary features and when an example does not have too many of them.
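A small sketch of the setup, assuming a sentence-transformers bi-encoder (the model name and the [FEATi] token names are placeholders); the new tokens are registered with the tokenizer so they stay single tokens, and the embedding matrix is resized accordingly:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder model name

# Register the feature tokens so they are not split into sub-words
word_embedding_model = model._first_module()      # the underlying Transformer module
new_tokens = ["[FEAT1]", "[FEAT2]", "[FEAT3]", "[FEAT4]"]
word_embedding_model.tokenizer.add_tokens(new_tokens, special_tokens=True)
word_embedding_model.auto_model.resize_token_embeddings(len(word_embedding_model.tokenizer))

def with_feature_tokens(text, active_features):
    """Prepend one token per active binary feature, e.g. [FEAT1] [FEAT3] <text>."""
    prefix = " ".join(f"[FEAT{i}]" for i in sorted(active_features))
    return f"{prefix} {text}".strip()

print(with_feature_tokens("My first example", {1, 3}))
# -> "[FEAT1] [FEAT3] My first example"
```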
Hi @nreimers
Can you expand on this method a bit more? I'm just a bit surprised that incorporating binary features relevant to sentence similarity is really as simple as prepending a special token for each "on" binary feature. Two questions I have: 1) does this interact with training using a loss like BatchHardTripletLoss, and 2) does the order of the prepended special tokens matter?
Hi @kddubey
1) I don't see any connection to training with BatchHardTripletLoss. BatchHardTripletLoss is a loss function; the other is how to extend your text with additional input features.
2) I think it would be good to either ensure that the special tokens are always in the same order, or to shuffle the order during training.
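In code, the two options for point 2) boil down to something like this (prepend_tokens is just an illustrative helper, not part of the library):

```python
import random

def prepend_tokens(text, tokens, shuffle=False):
    """Build the model input with a canonical token order, or a shuffled one for training."""
    tokens = list(tokens)
    if shuffle:
        random.shuffle(tokens)   # randomize the order during training
    else:
        tokens.sort()            # deterministic order, e.g. at inference time
    return " ".join(tokens + [text])

prepend_tokens("Another input text", ["[FEAT4]", "[FEAT2]", "[FEAT3]"])
# -> "[FEAT2] [FEAT3] [FEAT4] Another input text"
```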
Thanks!
This approach sounds a little crazy to me, but it's actually much easier to implement, because you can inject the features directly as you build the input string. But why binary features? Is there any theoretical basis? I'll try it right away. Thanks a lot!
Hi @mylovecc2020
Binary features: because your text either contains the special token (like [FEAT3]) or it doesn't. So if you want to classify text in an online community and you want to differentiate between guests and users, you would change your text to "[GUEST] This is a post" or "[USER] This is a post".
For continuous features this sadly doesn't work, as there is an infinite number of values. But there you can use binning: e.g. when you have a feature like "how long has the user been registered, in days", you can create bins like "[0-10days]", "[10-100days]", "[100+days]" and then add whichever of these 3 special tokens applies to your input text.
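For example, a tiny helper along those lines (the bin boundaries and token strings follow the example above and are assumptions; like the binary tokens, they would also need to be added to the tokenizer):

```python
def registration_bin(days_registered):
    """Map a continuous feature to one of a few hypothetical bin tokens."""
    if days_registered <= 10:
        return "[0-10days]"
    if days_registered <= 100:
        return "[10-100days]"
    return "[100+days]"

model_input = f"{registration_bin(42)} This is a post"
# -> "[10-100days] This is a post"
```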
Hi, thanks for your great work! Hand-crafted features also work well for classification tasks in my field, so I want to combine manual features with deep features. However, the loss function, the model, and the fit function of the training process are all encapsulated. Is there an easy way to add manual features directly to the deep features? If there is no corresponding method in the existing framework, I will directly use a bi-encoder to encode the sentence, then concatenate the embedding with the hand-crafted features, and finally calculate the cosine similarity. Is this OK? Or should I use a bi-encoder to encode the sentence, concatenate with the hand-crafted features, and add a fully connected layer for the classification task? Thanks!
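For the second variant (encode, concatenate, classify), a minimal sketch with a frozen bi-encoder and a scikit-learn classifier standing in for the fully connected layer; the model name and LogisticRegression are placeholder choices, and the hand-crafted features would likely need rescaling so they neither dominate nor vanish next to the embeddings, as discussed above:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

model = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder bi-encoder

def build_features(texts, handcrafted):
    """Concatenate sentence embeddings with hand-crafted feature vectors."""
    embeddings = model.encode(texts)               # shape (n, embedding_dim)
    return np.hstack([embeddings, np.asarray(handcrafted)])

# X_train = build_features(train_texts, train_handcrafted)
# clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
# preds = clf.predict(build_features(test_texts, test_handcrafted))
```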