-
Why does it change the PyTorch version and install a different CUDA on the system?
This would actually break most people's environments, because there can be only one CUDA version on Ubuntu, and it has…
Oxi84 updated
1 month ago
-
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained('m3e-base/')
model = AutoModel.from_pretrained('m3e-base/')
model.eval()
def get_sentence_embeddi…
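# --- Hedged sketch (assumption): the definition above is cut off, so this is
# one common way to finish a sentence-embedding helper: mean pooling over the
# encoder's last hidden state, weighted by the attention mask. Shown on plain
# Python lists so it is self-contained; with the model above you would feed it
# outputs.last_hidden_state[0] and the attention mask instead.
def mean_pool(hidden, mask):
    """hidden: list of seq_len vectors; mask: list of seq_len 0/1 flags.
    Returns the mean of the vectors where mask == 1."""
    dim = len(hidden[0])
    totals = [0.0] * dim
    count = 0
    for vec, flag in zip(hidden, mask):
        if flag:
            count += 1
            for j in range(dim):
                totals[j] += vec[j]
    return [t / count for t in totals]

# mean_pool([[1, 2], [3, 4], [100, 100]], [1, 1, 0]) -> [2.0, 3.0]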
-
This test fails:
```java
List sentences = new AmericanEnglish().getSentenceTokenizer().tokenize("First sentence.\u2028Second sentence.");
Assert.assertEquals(Arrays.asList("First sentence.", "S…
-
Thinking about including a tokenizer class in the project.
I'm thinking the API could look like:
```python
from iranlowo.tokenizer import Tokenizer
text = "some text"
word_tokens = Tokenizer(…
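# --- Hedged sketch (assumption): the proposed API above is cut off, so this
# is a hypothetical minimal implementation; the regex rule and the
# `tokenize_words` method name are assumptions, not the actual proposal.
import re

class Tokenizer:
    """Wraps a text and splits it into word and punctuation tokens."""
    def __init__(self, text):
        self.text = text

    def tokenize_words(self):
        # \w+ matches runs of word characters, [^\w\s] single punctuation marks
        return re.findall(r"\w+|[^\w\s]", self.text)

# Tokenizer("some text").tokenize_words() -> ['some', 'text']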
-
### Feature request
When using the tokenizer, it truncates the data to max_length, but there is no way to keep the overflowing part instead of discarding it.
### Motivation
Sometimes we want the sentence to be complete.
### Your contribution
No
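For what it's worth, Hugging Face tokenizers can already keep the cut-off part via `return_overflowing_tokens=True` (with an optional `stride` for overlap). The windowing idea behind that option can be sketched in plain Python (a self-contained sketch, not the library's actual implementation; the function name is mine):

```python
def chunk_token_ids(token_ids, max_length, stride=0):
    """Split a long token-id sequence into windows of at most max_length ids,
    overlapping by `stride`, so no tokens are discarded by truncation."""
    if stride >= max_length:
        raise ValueError("stride must be smaller than max_length")
    step = max_length - stride
    return [token_ids[i:i + max_length] for i in range(0, len(token_ids), step)]

# A 10-token input with max_length=4 keeps every token:
# chunk_token_ids(list(range(10)), 4) -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each chunk can then be fed to the model separately instead of silently dropping the tail.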
-
Nikolay:
Chinese characters should be added. In general we can use Unicode ranges to do so, but they are somewhat complicated: https://stackoverflow.com/questions/43418812/check-whether-a-string-cont…
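A minimal membership test for the main CJK Unified Ideographs block can be sketched like this (an illustration assuming the basic U+4E00–U+9FFF block is enough; the full set of Chinese characters spans several additional Extension blocks, which is the complication the linked answer discusses):

```python
def contains_cjk(text):
    """True if any character falls in the CJK Unified Ideographs block
    (U+4E00 to U+9FFF). Extension blocks are deliberately omitted here."""
    return any('\u4e00' <= ch <= '\u9fff' for ch in text)

# contains_cjk("汉语") -> True
# contains_cjk("hello") -> False
```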
-
I am testing "train_stsbenchmark.py" with the Hugging Face model "Rostlab/prot_t5_xl_uniref50" and get the following error. What am I missing? How do I fix it? Thanks.
$ python train_stsbenchmark.…
-
**LocalAI version:** 2.16.0
**Environment, CPU architecture, OS, and Version:**
mac studio M2 Ultra
**Describe the bug**
using backend transformers for glm4, trust_remote_code: true not c…
-
Dear `lambeq` developers,
I was playing around with the package and testing the parsing example in the bobcat tutorial by simply running
```python
parser = BobcatParser()
diagram = parser.senten…
-
```python
# Load the DistilBERT tokenizer
from transformers import DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained('monologg/distilkobert')
# Define a function that converts the data into DistilBERT input format
def convert_to_input(df, tokenizer, max_length=400):…
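# --- Hedged sketch (assumption): the function body above is cut off. A
# Hugging Face tokenizer performs the fixed-length conversion itself via
# padding='max_length' and truncation=True; the helper below reproduces just
# that pad/truncate step on raw token-id lists, so it is self-contained.
def to_fixed_length(token_ids, max_length=400, pad_id=0):
    """Truncate or pad token ids to max_length and build the attention mask."""
    ids = token_ids[:max_length]
    mask = [1] * len(ids) + [0] * (max_length - len(ids))
    ids = ids + [pad_id] * (max_length - len(ids))
    return ids, mask

# to_fixed_length([5, 6, 7], max_length=5) -> ([5, 6, 7, 0, 0], [1, 1, 1, 0, 0])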