-
Hello,
I am using the PAWS-X train dataset for the German language. Upon analysing `translated_train.tsv` for German, I found 3,209 cases which consisted of identical sentence pairs. 84 of these 3,…
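A count like this can be reproduced with a short script. The following is a minimal sketch, assuming the TSV has a header row with `id`, `sentence1`, `sentence2` and `label` columns; the helper name `find_identical_pairs` and the sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io


def find_identical_pairs(tsv_file):
    """Return the rows whose two sentences match after stripping whitespace."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal characters.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        row for row in reader
        if row["sentence1"].strip() == row["sentence2"].strip()
    ]


# Tiny in-memory stand-in for the real file (invented rows, not PAWS-X data).
sample = io.StringIO(
    "id\tsentence1\tsentence2\tlabel\n"
    "1\tDas ist ein Test.\tDas ist ein Test.\t1\n"
    "2\tDas ist ein Test.\tDies ist ein Test.\t1\n"
    "3\tHallo Welt.\tHallo Welt.\t0\n"
)
identical = find_identical_pairs(sample)
print(len(identical))                                       # identical pairs
print(sum(1 for row in identical if row["label"] == "0"))   # of those, labelled 0
```

For the real file, the same function could be called on `open("translated_train.tsv", encoding="utf-8", newline="")` instead of the in-memory sample.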
-
Hi, I'm new to NLP, and I am currently trying to fine-tune Jina for text similarity comparison.
I constructed a dataset with columns `sentence1`, `sentence2` and `score`, and I can easily train the mod…
-
# Semantic Textual Similarity
## Task Objective
Evaluate the models' level of semantic understanding by comparing their predicted sentence similarity with human-labeled sentence similarity. The task is part of the metatask http…
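One common way to compare model similarity scores with human labels is a rank correlation. The sketch below assumes Spearman correlation is the metric, which the text above does not state; the scores are invented and the function names are illustrative only.

```python
def rank(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(rank(x), rank(y))


# Invented scores: human labels on a 0-5 scale, model cosine similarities.
human = [0.0, 1.5, 3.0, 4.2, 5.0]
model = [0.10, 0.35, 0.30, 0.80, 0.95]
print(round(spearman(human, model), 3))
```

Because only the ordering matters, the human labels and the model scores do not need to share a scale.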
-
Hello, I would like to download the sentences used to train and test the system in plain-text format. I looked in the "data" folder, but it contains only numbers, with no text.
Best
-
My dataset has 20k samples, 200 labels, and 32 iterations, so that means around 128 million samples, right?
Is there a way to parallelize the creation of the sentence pairs,
or at least to save these pa…
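For reference, the estimate in the question is the product of the three numbers. The sketch below only checks that arithmetic; whether the library really generates samples × labels × iterations pairs is the question's own assumption and is not confirmed here.

```python
# Figures quoted in the question.
num_samples = 20_000
num_labels = 200
num_iterations = 32

# Assumption from the question: one pair per (sample, label, iteration).
estimated_pairs = num_samples * num_labels * num_iterations
print(f"{estimated_pairs:,}")  # 128,000,000
```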
-
### Feature: Create Dataset Pipelines
From raw "documents" / nodes / text (and other modalities?),
create NER / QnA pairs / etc. synthetically
### Tasks
- [ ] create NER end-to-end pipeli…
-
https://github.com/mozilla-l10n/mt-training-data
Maybe we could add it to OPUS.
-
I'm trying to get AlephBERT encodings for sentence pairs.
I managed to get the tokens for the batch of sentences, but when I feed them to AlephBERT I get:
IndexError: index out of range in self
For …
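In PyTorch, `IndexError: index out of range in self` usually comes from an embedding lookup with an index the table does not cover. For a BERT-style model that means a token id outside the vocabulary, a sequence longer than the position table, or a segment id outside the token-type table. The following is a minimal, library-free sketch for checking a batch before the forward pass; the function name and the small limits in the example are hypothetical, and real limits would normally be read from `model.config.vocab_size`, `model.config.max_position_embeddings` and `model.config.type_vocab_size`.

```python
def diagnose_embedding_inputs(input_ids, vocab_size, max_positions,
                              token_type_ids=None, type_vocab_size=2):
    """Return human-readable reasons an embedding lookup could go out of range."""
    problems = []
    for row, ids in enumerate(input_ids):
        # Position embeddings only cover indices 0 .. max_positions - 1.
        if len(ids) > max_positions:
            problems.append(
                f"row {row}: length {len(ids)} exceeds max_positions {max_positions}"
            )
        # Word embeddings only cover ids 0 .. vocab_size - 1.
        bad_ids = [t for t in ids if t < 0 or t >= vocab_size]
        if bad_ids:
            problems.append(f"row {row}: token ids outside vocabulary: {bad_ids}")
    for row, types in enumerate(token_type_ids or []):
        # Segment embeddings only cover ids 0 .. type_vocab_size - 1.
        bad_types = sorted({t for t in types if t < 0 or t >= type_vocab_size})
        if bad_types:
            problems.append(f"row {row}: token type ids out of range: {bad_types}")
    return problems


# Toy batch with deliberately small, made-up limits (not AlephBERT's values).
batch_ids = [[2, 15, 7, 3], [2, 120, 3, 0, 0, 0, 0, 0, 0]]
batch_types = [[0, 0, 1, 1], [0, 0, 0, 0, 0, 0, 0, 0, 2]]
for problem in diagnose_embedding_inputs(
    batch_ids, vocab_size=100, max_positions=8, token_type_ids=batch_types
):
    print(problem)
```

If the length check is what fails, tokenizing with truncation to the model's maximum length is the usual remedy; if token ids are out of range, the tokenizer and the model checkpoint probably do not match.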
-
I've been trying to reproduce your work, especially the rectified flow part. However, the reflow procedure always results in poorer synthesis quality (even with a small number of sampling steps). I'm wondering if …