-
Has anyone tried doing hard negative mining when generating the sentence pairs as opposed to random sampling? @tomaarsen - is random sampling the default?
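To make the question concrete, here is a minimal sketch of one common way to pick hard negatives, as opposed to sampling them at random. It assumes the anchor and positive embeddings are already computed and that row `i` of each list forms a true pair. The function name `pick_hard_negatives` is hypothetical and is not part of any library API.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pick_hard_negatives(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
) -> List[int]:
    """For each anchor i, return the index j != i of the most similar
    candidate in `positives`. That candidate is "hard" because it looks
    close to the anchor but is not its true pair (an assumption that
    only holds if the dataset has no duplicate or near-duplicate pairs).
    """
    negatives = []
    for i, anchor in enumerate(anchors):
        best_j, best_sim = -1, -math.inf
        for j, candidate in enumerate(positives):
            if j == i:
                continue  # skip the true positive
            sim = cosine(anchor, candidate)
            if sim > best_sim:
                best_j, best_sim = j, sim
        negatives.append(best_j)
    return negatives
```

This brute-force version is quadratic in the number of pairs, so for a large dataset an approximate nearest-neighbour index would likely be needed instead.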
-
Consider the following string:
```
We made $4500 and then $4600 the next month
```
Solara renders this as:
We can manually escape `$` ahead of time, but it is a complex procedure. Only pair…
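For reference, a naive sketch of the "escape ahead of time" workaround could look like the following. It assumes the renderer treats `\$` as a literal dollar sign (not verified for Solara here) and that the text contains no intentional math, which is exactly the case that makes the general procedure complex. The helper name `escape_dollars` is hypothetical.

```python
import re


def escape_dollars(text: str) -> str:
    """Prefix every `$` that is not already escaped with a backslash.

    Naive on purpose: it also escapes dollars that were meant to
    delimit math, so it only suits text with no intended math.
    """
    return re.sub(r"(?<!\\)\$", r"\\$", text)


print(escape_dollars("We made $4500 and then $4600 the next month"))
```

Because already-escaped dollars are skipped, applying the function twice gives the same result as applying it once.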
-
Looking at the neural translation log, it appears that, for live re-training, MMT very often uses large (or very large) sentences in which only a very small part is interesting, givin…
-
How to continue the pretraining of Sentence-BERT models using MLM?
Is there any documentation or code snippet for this purpose?
I would like to continue the pretraining of "all-MiniLM-L6-v2" mode…
-
In the Amazon paper "Supervised Clustering Loss for Clustering-Friendly Sentence Embeddings: an Application to Intent Clustering", they calculate PRAUC of the model rather differently: "using the true…
-
I'm currently doing "Japanese Core 2000 Step 01 Listening Sentence Vocab + Images", but the browser does not show the images or image-text pairs when the audio plays.
-
Hi,
I am currently working on the finetuning of "distiluse-base-multilingual-cased-v1", using MultipleNegativesRankingLoss and RerankingEvaluator, over a dataset of 700k (query, sentence) pairs. I'…
-
I've been using the fully-automated level of the tool. I have about 200 sentence pairs (Spanish/English) I want to align, but it's taking forever because I reload the language models every time to ru…
-
This might be a naive question but I am unable to understand how to use metric MDS. When following the examples in the documentation, I only get classical MDS out. In the documentation, an example for…
-
Implement a function to compute each score for a large corpus. It reads sentence pairs line by line, computes the score for each pair, then aggregates the results.
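A minimal sketch of what such a function might look like, under several assumptions not stated in the request: each line holds one tab-separated sentence pair, the scores are supplied as plain callables, and "aggregate" means the arithmetic mean. The `jaccard` scorer is only a stand-in for whatever scores the project actually defines.

```python
import io
from typing import Callable, Dict, Iterable

Scorer = Callable[[str, str], float]


def jaccard(a: str, b: str) -> float:
    """Placeholder score: token-set overlap between two sentences."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def score_corpus(
    lines: Iterable[str],
    scorers: Dict[str, Scorer],
    sep: str = "\t",
) -> Dict[str, float]:
    """Stream sentence pairs and return the mean of each score.

    Only running totals are kept, so memory use does not grow with
    corpus size. Blank and malformed lines are skipped.
    """
    totals = {name: 0.0 for name in scorers}
    count = 0
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        if len(parts) != 2:
            continue  # not exactly one pair on this line
        src, tgt = parts
        for name, scorer in scorers.items():
            totals[name] += scorer(src, tgt)
        count += 1
    if count == 0:
        return {name: 0.0 for name in scorers}
    return {name: total / count for name, total in totals.items()}


# Usage: any iterable of lines works, including an open file handle.
corpus = io.StringIO("a b\ta b\nc d\te f\n")
print(score_corpus(corpus, {"jaccard": jaccard}))
```

Passing an open file object instead of `io.StringIO` keeps the same line-by-line behaviour, since Python file handles iterate lazily.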