-
Our current approach embeds datasets with Sentence Transformers models, which give us one embedding per "chunk" of text: whether we pass in 500 tokens or 100 tokens, we always get a single embedding. S…
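To illustrate why input length does not change the number of outputs, here is a minimal sketch assuming mean pooling. Mean pooling is a common Sentence Transformers pooling mode, but each model's own config decides which pooling it uses. Per-token vectors are averaged into one fixed-size vector, and padding is masked out. The token vectors below are made-up numbers, not real model outputs, and `mean_pool` is a hypothetical helper.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 into one chunk embedding."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Hypothetical 3-dimensional token vectors; real models use e.g. 384 or 768 dims.
short_chunk = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]  # 2 "tokens"
long_chunk = short_chunk * 50                      # 100 "tokens"
padded = short_chunk + [[9.0, 9.0, 9.0]]           # last row is padding

print(len(mean_pool(short_chunk, [1, 1])))    # 3
print(len(mean_pool(long_chunk, [1] * 100)))  # 3
print(mean_pool(padded, [1, 1, 0]))           # [2.0, 1.0, 1.0]
```

Both the 2-token and the 100-token inputs come out as one 3-dimensional vector. The tradeoff is that longer chunks are squeezed into the same fixed budget.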
-
The `datasette-embeddings` extension currently requires hosted OpenAI models and an OpenAI API key to generate embeddings:
```python
async def calculate_embeddin…
```
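The excerpt does not show what a replacement would look like. As one possible direction, here is a minimal sketch that assumes a hypothetical synchronous local encoder, such as a wrapper around a locally loaded Sentence Transformers model. `calculate_embedding_locally`, `LocalEncoder` and `toy_encoder` are illustrative names and are not part of `datasette-embeddings`. Local models run in-process and are compute-bound, so the sketch offloads them with `asyncio.to_thread` (Python 3.9+). This keeps an async handler from blocking the event loop.

```python
import asyncio
from typing import Callable, List

# Hypothetical signature: a blocking function mapping one text to one vector.
LocalEncoder = Callable[[str], List[float]]


async def calculate_embedding_locally(text: str, encode: LocalEncoder) -> List[float]:
    """Run a blocking encoder in a worker thread so the event loop stays responsive."""
    return await asyncio.to_thread(encode, text)


def toy_encoder(text: str) -> List[float]:
    # Stand-in for a real model: character count and word count.
    return [float(len(text)), float(len(text.split()))]


async def main() -> None:
    vectors = await asyncio.gather(
        calculate_embedding_locally("hello world", toy_encoder),
        calculate_embedding_locally("datasette", toy_encoder),
    )
    print(vectors)  # [[11.0, 2.0], [9.0, 1.0]]


if __name__ == "__main__":
    asyncio.run(main())
```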
-
I was collecting models for a different project and realized there are a couple of older multilingual baselines we should try: `facebook/mcontriever-msmarco` (multilingual Contriever) and `castorini…
-
[ ] I checked the [documentation](https://docs.ragas.io/) and related resources and couldn't find an answer to my question.
**Facing an error when using LangChain-wrapped Hugging Face models**
I am …
-
As the title says: has anyone run into a similar issue?
```
Traceback (most recent call last):
  File "/data//instructor-embedding/clustering.py", line 3, in <module>
    model = Instructor('hkunlp/instructor-large')
  F…
```
-
Importing `mteb` takes 30s, while importing `transformers` takes 6s (https://colab.research.google.com/drive/166j9HRQF8hAqwiiYpMDvYoznm1v-veOC?usp=sharing). This is probably caused by the many `*` imports 🤔
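To compare the two packages reproducibly outside the notebook, here is a minimal sketch that times a cold import of each package in a fresh interpreter. Using a fresh interpreter means the `sys.modules` cache cannot hide the cost. It assumes both packages are installed, and `time_cold_import` is a hypothetical helper name.

```python
import subprocess
import sys
import time


def time_cold_import(module: str) -> float:
    """Return wall-clock seconds to import `module` in a fresh interpreter.

    A new subprocess starts with an empty sys.modules, so repeated calls are
    comparable. Interpreter start-up time is included in the measurement.
    """
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True, capture_output=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    for name in ("transformers", "mteb"):
        try:
            print(f"{name}: {time_cold_import(name):.2f}s")
        except subprocess.CalledProcessError:
            print(f"{name}: not importable in this environment")
```

For a per-module breakdown, CPython's `-X importtime` option (Python 3.7+) writes import timings to stderr, for example `python -X importtime -c "import mteb"`. That output can show whether the `*` imports are really the culprit.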
-
### System Info
transformers v2.17.2
node v18.20.3
### Environment/Platform
- [ ] Website/web-app
- [ ] Browser extension
- [X] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Elect…
-
The CDE model is powerful because it integrates "context tokens" directly into the embedding process. As of October 1, 2024, cde-small-v1 is the top-performing small model (under …
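Here is a toy illustration of the general idea of conditioning a document embedding on corpus "context tokens". It is not cde-small-v1's actual architecture, weights or API. A first stage embeds a small corpus sample. Those vectors are prepended as extra tokens to a document's token vectors. One untrained self-attention step (identity projections) then runs, and only the document positions are mean-pooled. All function names and numbers are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors."""
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """One head of scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in tokens]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]  # numerically stable softmax
        total = sum(weights)
        outputs.append([sum(w * tok[i] for w, tok in zip(weights, tokens)) / total for i in range(d)])
    return outputs


def first_stage(corpus_sample: List[List[Vector]]) -> List[Vector]:
    """Stage 1: one context vector per sampled corpus document."""
    return [mean_pool(doc_tokens) for doc_tokens in corpus_sample]


def second_stage(doc_tokens: List[Vector], context: List[Vector]) -> Vector:
    """Stage 2: prepend context vectors as tokens, attend, then pool only the document positions."""
    attended = self_attention(context + doc_tokens)
    return mean_pool(attended[len(context):])


doc = [[1.0, 0.0], [0.0, 1.0]]
emb_a = second_stage(doc, first_stage([[[1.0, 0.0]]]))  # corpus leaning toward dim 0
emb_b = second_stage(doc, first_stage([[[0.0, 1.0]]]))  # corpus leaning toward dim 1
print(emb_a, emb_b)  # roughly [0.65, 0.35] and [0.35, 0.65]
```

The same document gets a different embedding depending on the corpus it is embedded against. This is the property the context tokens are meant to provide.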
-
### System Information
Linux x86-64
Python 3.10.5
`sentence_transformers` 3.0.1
`transformers` 4.41.2
`datasets` 2.19.2
### Reproduction
Running on GPU:
```py
from datasets import load_data…
```
-
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
Yes
### Source
source
### TensorFlow version
tensorflow==2.15.0.post1
### Custom code
Yes
### OS platform and dist…