-
Our current `SymbolDocEmbeddingHandler` class is effective at embedding source code documentation into symbols. However, there are instances in which relevant and important documentation do…
-
Hi,
I have been running this script on the WWC vocabulary (minimum frequency 100), and it takes a very long time. So, I was wondering whether there is a difference between what was run here and my build, and whether …
-
Here are some notes from Friday's meeting with James, alongside my thoughts.
James recommended two things:
1. Scrape Amazon for the list of "products similar to this" to create a distance metric b…
-
Hi!
Could you please add a readme.md explaining how to use your code? Is this repo related to a paper or tutorial? If so, could you tell me which one? I am very interested in it. Thank you …
-
### Description
Looking through all of the available vw documentation, there doesn't seem to be any clear documentation of the valid input value types for features (at least none that I can find)…
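For context, here is a minimal sketch of vw's text input format as I understand it; the namespace and feature names below are made up for illustration. Feature values are numeric (a bare feature name defaults to a value of 1), and namespaces group features:

```
1 |user age:25 premium |doc length:0.8 topic_sports
```

Here `1` is the label, `age:25` and `length:0.8` are explicit numeric values, and `premium` / `topic_sports` are bare features with an implicit value of 1.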
-
Hi,
I have a question regarding the calculation of self-attention.
In the paper, you state that prior to pre-training, _following BERT, we randomly sample two segments (either from the same co…
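Since the full question is cut off, for reference here is a generic numpy sketch of the scaled dot-product self-attention computation (this is the standard Transformer formulation, not code from the repo; all names and shapes are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)  # (tokens, tokens)
    return softmax(scores) @ v                    # (tokens, dim)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))      # 4 tokens, hidden dim 8
out = self_attention(x, x, x)    # Q = K = V = x for simplicity
```

Each output row is a convex combination of the value rows, weighted by the softmax-normalized similarity scores.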
-
Pre-training:
1. Hyperlink-induced Pre-training for Passage Retrieval in Open-domain Question Answering (ACL 2022)
2. RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Langua…
-
I don't seem to find the relevant function in https://www.sbert.net/docs/training/overview.html#. Do I need to modify the code of the encode function in the SentenceTransformer.py file by myself…
-
Supporting Llama2-family embedding models might let us build a richer embedding space.
One possible approach is to use a Japanese continually pre-trained model and extract embeddings from, e.g., the last hidden layer.
We should make it possible to compare against other models.
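The last-hidden-layer extraction mentioned above is typically combined with mask-aware mean pooling. Here is a minimal numpy sketch of that pooling step using dummy tensors (shapes and names are assumptions, not tied to any particular model):

```python
import numpy as np

def mean_pool(last_hidden, attention_mask):
    """Mask-aware mean over the sequence axis: padded positions are ignored."""
    m = attention_mask[..., None]           # (batch, seq, 1)
    summed = (last_hidden * m).sum(axis=1)  # (batch, hidden)
    counts = m.sum(axis=1)                  # (batch, 1) number of real tokens
    return summed / np.clip(counts, 1e-9, None)

# Dummy last hidden states: batch of 2 sequences, length 4, hidden size 3.
hidden = np.arange(24, dtype=float).reshape(2, 4, 3)
# 1 = real token, 0 = padding.
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]], dtype=float)
emb = mean_pool(hidden, mask)  # (2, 3) sentence embeddings
```

With a real model, `last_hidden` would come from the final layer's output and `attention_mask` from the tokenizer.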
-
Hi,
First, I wanted to thank you for sharing this great tool; it has been amazingly reliable so far, and I have already embedded roughly a billion sentences.
There are a few things that would really be helpful (in order of im…