-
I have come across many similar issues asking how to add new tokens to a vocabulary. For reference, here are a couple of links to useful comments on doing roughly that:
- https://github.co…
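
For readers skimming these threads, the general idea of adding tokens can be sketched without any particular library. This is only an illustration under assumptions: the vocabulary is taken to be a plain token-to-id mapping, the embedding table has one row per id, and `add_tokens` here is a hypothetical helper, not the API of any specific tokenizer library.

```python
import random

def add_tokens(vocab, embeddings, new_tokens, dim, seed=0):
    """Add unseen tokens to vocab and append one embedding row per added token.

    vocab: dict mapping token -> integer id (ids assumed to be 0..len(vocab)-1)
    embeddings: list of rows, one list of floats per id
    Returns the number of tokens actually added.
    """
    rng = random.Random(seed)
    added = 0
    for token in new_tokens:
        if token in vocab:
            continue  # already known: keep its existing id and row
        vocab[token] = len(vocab)  # next free id
        # Small random initialisation (an assumption; real libraries may differ).
        embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
        added += 1
    return added

# Hypothetical toy vocabulary with 4-dimensional embeddings.
vocab = {"[PAD]": 0, "[UNK]": 1, "hello": 2}
embeddings = [[0.0] * 4 for _ in vocab]
num_added = add_tokens(vocab, embeddings, ["hello", "covid", "mrna"], dim=4)
```

The point of the sketch is that the vocabulary and the embedding table must grow together, so every new id has a row to look up.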
-
I am new to Hugging Face transformers and am running into issues with squad_convert_examples_to_features. I would appreciate any insights.
I have four calls to squad_convert_examples_to_feat…
-
https://arxiv.org/pdf/2010.06467.pdf
Document ranking is not symmetric. Semantic similarity may therefore not directly measure query-document relevance.
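
A small sketch of the symmetry point, under assumptions: bag-of-words cosine stands in for "semantic similarity", and `coverage` is a hypothetical toy relevance score (the fraction of query terms found in the document), not a method taken from the linked paper. The example strings are made up.

```python
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity; swapping a and b gives the same value."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0

def coverage(query, document):
    """Toy relevance: fraction of distinct query terms that occur in the document."""
    q = set(query.split())
    d = set(document.split())
    return len(q & d) / len(q) if q else 0.0

# Hypothetical query and document, for illustration only.
query = "ms marco dataset"
document = "ms marco is a large dataset for passage ranking"

sim_qd = cosine(query, document)
sim_dq = cosine(document, query)      # same as sim_qd
rel_qd = coverage(query, document)    # every query term is covered
rel_dq = coverage(document, query)    # roles swapped: most terms are not covered
```

Swapping the arguments leaves the cosine unchanged but changes the coverage score, which is the sense in which a symmetric similarity can miss query-to-document relevance.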
MS MARCO: dataset
Formulation of text ranking: unstructured corpus/t…
-
I ran an ablation study on whether pretraining benefits downstream tasks, by finetuning without loading the state dict from the checkpoint. All other settings were kept the same. The finetuning process stopped at…
-
Is there a way to extend sparknlp and create my own custom embedder, similar to `BertEmbeddings`? There are some interesting models on TF Hub that I would like to try.
-
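Thanks to the OpenRLHF development team, great work!
I ran into a problem while using the framework and would appreciate some help:
I trained a 30B-scale language model with train_sft on 2 nodes (8*H800), then ran inference with batch_inference on a single node / 2 nodes. Judging from the logs, it appears to hang at model.generate(), and only after waiting 12 h (nccl timeout=720 min) did it report a timeout error…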
-
Deep Learning Simplified Repository (Proposing new issue)
🔴 Project Title : Email Spam Detection
🔴 Aim : Using ML and Python to create a project which can detect spam/junk emails.
🔴 Dataset :…
-
**ISSUE TRANSFER: Optimum repository -> https://github.com/huggingface/optimum/issues/555**
This issue is about the working group specially created for this task. If you are interested in helping o…
-
Tested versions: optimum-neuron 0.0.8 and 0.0.9
I'm training bert-base-uncased as a binary text classifier on a spam/not-spam dataset (Deysi/spam-detection-dataset).
The training works …
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
0526 version
### Reproduction
[2024-07-02 06:42:39,946] [INFO] [comm.py:637:init_distributed] cdb=None
[2024…