-
Following the update that has the code download pre-embedded corpora (great change, that!), I get an error when trying to run the README example:
```
medrag = MedRAG(llm_name=LL_NAME, rag=True,
…
-
When I calculate sentence similarity scores using SBERT and then train my own language model, I use different sentence combination methods.
Given two lists of sentences, list_1 = [s1, s2, s3], lis…
-
Hi, sweety.
We are currently attempting to reproduce your research, but we are facing some difficulties. We would like to try the experiment using the data you presented in your poster. Could you …
-
### Metadata
- Authors: Tao Ge, Furu Wei, Ming Zhou
- Organization: MSRA
- Conference: ACL 2018
- Original Paper: http://aclweb.org/anthology/P18-1097 (present a detailed comparison and analysis f…
-
In Marian, invalid alignments lead to a crash, as the index bounds for tokens are not checked. This breaks training. Plus, if alignments are generated incorrectly on the OpusTrainer side, this will de…
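A bounds check of the kind described above could be sketched as follows. This is a minimal illustration, not Marian's or OpusTrainer's actual code: it assumes alignments are whitespace-separated `i-j` pairs of zero-based source and target token indices, and `alignment_is_valid` is a hypothetical helper name.

```python
def alignment_is_valid(alignment: str, n_src: int, n_tgt: int) -> bool:
    """Return True if every 'i-j' pair points at an existing token.

    Assumption: `alignment` is a string such as "0-0 1-2 2-1", where i indexes
    the source tokens and j indexes the target tokens, both zero-based.
    """
    for pair in alignment.split():
        src, sep, tgt = pair.partition("-")
        # Reject malformed pairs (missing dash, empty or non-numeric index).
        if not (sep and src.isascii() and src.isdigit()
                and tgt.isascii() and tgt.isdigit()):
            return False
        # Reject indices that fall outside the token sequences.
        if int(src) >= n_src or int(tgt) >= n_tgt:
            return False
    return True


# Hypothetical usage: filter out a training line before it reaches the trainer.
source_tokens = ["ein", "kleiner", "Test"]
target_tokens = ["a", "small", "test"]
ok = alignment_is_valid("0-0 1-1 2-2", len(source_tokens), len(target_tokens))
```

Running such a check on the data-producing side would let bad lines be dropped or logged instead of reaching the trainer.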
-
Hi @nreimers ,
I have a model trained on (German) STSb. Now I want to "mine" more STSb-like data from a domain-specific corpus to improve the model. I plan to use my already available model to do th…
-
# Benchmarking embedding alignment with GPTs 🤖 and CLIPs 📎
Contacts: Mike (S), Marc
Participants:
## Goals and deliverable
1. Developing [AstroPT](https://github.com/Smith42/astroPT) to the p…
-
It seems (from the paper) that the current setup for pair classification relies on computing similarity metrics for the pairs' embeddings and then deciding on a _binary_ threshold. Naturally for t…
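To make the setup as I understand it concrete, here is a minimal sketch of threshold-based pair classification. It is not the benchmark's implementation: the function names are hypothetical, cosine similarity stands in for whichever similarity metrics are actually used, and accuracy is assumed as the selection criterion.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def best_threshold(scores, labels):
    """Return (threshold, accuracy) for the rule `score >= threshold`.

    Every observed score is tried as a candidate threshold; the first one
    with the highest accuracy wins.
    """
    best = (None, -1.0)
    for t in sorted(set(scores)):
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best[1]:
            best = (t, acc)
    return best


# Toy data (made up for illustration): similarity scores and binary labels.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0]
threshold, accuracy = best_threshold(scores, labels)
```

The single cut-off is what makes the output binary: each pair is either above the threshold or below it.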
-
We might be able to find open source toxicity lists.
-
Hi everyone,
This is a small question related to how models are fine-tuned during the first step of training. I see that the default loss function is `losses.CosineSimilarityLoss`. But when generat…