Status: Open. Opened by legaltextai 2 months ago.
CC @jonsaadfalcon
Try the V1 models: https://huggingface.co/hazyresearch/M2-BERT-32K-Retrieval-Encoder-V1
Those have seen some legal data during training so hopefully they should work a bit better :)
If they still don't work, would love to hear more about the setup and see if there's a mismatch with how we trained them!
Both the query and the documents use the same embedding protocol, correct? I don't need to add anything extra when embedding the prompt, as with UAE Large, right? Is it OK to use cosine similarity and an HNSW index? I am also testing some small models with smaller context windows. I doubt they have been trained on any legal data; you can test them and see how they perform.
Yes, both queries and documents use the same protocol and model; there's no extra prompt.
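To make the "same protocol for queries and documents" point concrete, here is a minimal sketch of cosine-similarity ranking in which one embedding function is applied to both sides with no prefix. The names `rank_documents` and `toy_embed` are hypothetical; `toy_embed` is only a stand-in so the sketch runs offline, and in practice you would pass a function that calls the M2-BERT model or API instead.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query: str,
    documents: List[str],
    embed: Callable[[str], Vector],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, best match first.

    The same `embed` function is used for the query and for every document,
    with no extra prompt or prefix added to either side.
    """
    query_vec = embed(query)
    scored = [
        (index, cosine_similarity(query_vec, embed(doc)))
        for index, doc in enumerate(documents)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical placeholder embedding: word counts over a tiny fixed vocabulary.
# It exists only so the example runs without a model; it is NOT M2-BERT.
_TOY_VOCAB = ["court", "contract", "patent", "appeal"]


def toy_embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(term)) for term in _TOY_VOCAB]
```

Usage would look like `rank_documents("patent appeal", docs, toy_embed)`, swapping `toy_embed` for the real embedding call. If results from a brute-force ranking like this look as poor as the indexed search, the index is probably not the cause.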
Embedded using the API. Significantly underperforms vs. other models. In most cases, each embedding is the full text of a Supreme Court decision. Indexed with HNSW. Should I use a different index? I store the embeddings in Postgres and use pgvector for similarity search. Model: togethercomputer/m2-bert-80M-32k-retrieval. Thanks
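For the pgvector setup described here, one thing worth checking is that the HNSW index's operator class matches the distance operator used in the query (`vector_cosine_ops` with `<=>` for cosine distance), and that an exact scan returns similar results to the indexed one. The sketch below only builds SQL strings and does not connect to a database. The table and column names (`decisions`, `embedding`, `id`) are assumptions, and the `%s` placeholders assume a psycopg-style driver.

```python
from typing import List, Sequence


def to_pgvector_literal(vec: Sequence[float]) -> str:
    """Format a vector as pgvector's text input form, e.g. '[0.1,2.0,-3.5]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def hnsw_cosine_index_sql(table: str = "decisions", column: str = "embedding") -> str:
    """DDL for an HNSW index using cosine distance.

    Identifiers are interpolated directly, so only pass trusted names.
    """
    return f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops);"


def cosine_search_sql(
    table: str = "decisions", column: str = "embedding", id_column: str = "id"
) -> str:
    """Parameterized top-k query; `<=>` is pgvector's cosine distance operator.

    Parameters, in order: query vector literal, query vector literal, limit.
    """
    return (
        f"SELECT {id_column}, 1 - ({column} <=> %s::vector) AS cosine_similarity "
        f"FROM {table} ORDER BY {column} <=> %s::vector LIMIT %s;"
    )


def exact_search_statements(table: str = "decisions", column: str = "embedding") -> List[str]:
    """Statements that run the same query with index scans disabled,
    so the exact results can be compared against the HNSW results."""
    return [
        "BEGIN;",
        "SET LOCAL enable_indexscan = off;",
        cosine_search_sql(table, column),
        "COMMIT;",
    ]
```

If the exact and indexed results largely agree, the index type is unlikely to explain the poor retrieval quality; if they differ a lot, raising `hnsw.ef_search` for the session is one setting to try before switching index types.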