Open danaekdml opened 1 year ago
I ran into the same error and solved it by switching to the Hugging Face approach. I used the code provided on Hugging Face, then hit one more error in BERTSentenceTransform, which I resolved by copying the class out of the .py file and modifying the code.
blog: The code is long, so I wrote it up on my blog; feel free to use it as a reference.
I reran it following the Hugging Face instructions and the blog, but the accuracy comes out far too low, at 0.17. What could be the problem? Before switching to Hugging Face the accuracy was around 0.56, so this is strange.
!pip install mxnet
!pip install gluonnlp==0.8.0
!pip install tqdm pandas
!pip install sentencepiece
!pip install transformers
!pip install torch
!pip install 'git+https://github.com/SKTBrain/KoBERT.git#egg=kobert_tokenizer&subdirectory=kobert_hf'
from kobert_tokenizer import KoBERTTokenizer
from transformers import BertModel
import torch
from torch import nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import gluonnlp as nlp
import numpy as np
from tqdm.notebook import tqdm
from transformers import AdamW
from transformers.optimization import get_cosine_schedule_with_warmup
tokenizer = KoBERTTokenizer.from_pretrained('skt/kobert-base-v1')
bertmodel = BertModel.from_pretrained('skt/kobert-base-v1', return_dict=False)
vocab = nlp.vocab.BERTVocab.from_sentencepiece(tokenizer.vocab_file, padding_token='[PAD]')
tok = tokenizer.tokenize
Is anything in this code wrong?
I have the same issue!
I used Python 3.7, which solves some of the version problems, but gluonnlp still does not install properly :(
In addition, I could not use the transformers library to download KoBERT.
When declaring the BERTSentenceTransform class, around line 19,
it seems you have to change tokens_a = self._tokenizer(text_a) to tokens_a = self._tokenizer.tokenize(text_a) for the model to run properly... [Source] "No module named 'kobert' error fix" | Author: yeon
After following this comment, the accuracy went back up.
text_a = '한국어 모델을 공유합니다.'
tokens_1 = tokenizer.tokenize(text_a)
tokens_2 = tokenizer(text_a)
print(tokens_1, type(tokens_1))
print(tokens_2, type(tokens_2))
output :
tokens_1 : ['▁한국', '어', '▁모델', '을', '▁공유', '합니다', '.'] <class 'list'>
tokens_2 : {'input_ids': [2, 4958, 6855, 2046, 7088, 1050, 7843, 54, 3], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]} <class 'transformers.tokenization_utils_base.BatchEncoding'>
In the BERTSentenceTransform class,
tokens_a = self._tokenizer(text_a)
if you run the original code as is, tokens_a takes the form of tokens_2 above. As a result, token and input_ids become
token = [[CLS], 'input_ids', 'token_type_ids', 'attention_mask', [SEP]]
input_ids = [2, 0, 0, 0, 3, 1, 1, 1, 1, ...]
so input_ids ends up in the same form for every example in the dataset, which is why the accuracy drops so low.
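The effect described above can be reproduced without any of the libraries involved. The following is only a sketch: a plain dict stands in for the BatchEncoding returned by tokenizer(text_a), and the toy vocabulary assumes [UNK]=0 and [PAD]=1 (the ids 2 and 3 for [CLS] and [SEP] come from the output shown in this thread). Iterating a mapping yields its keys, so the "tokens" become the key names, which are all unknown to the vocabulary.

```python
# Toy vocabulary. [CLS]=2 and [SEP]=3 match the ids shown in this thread;
# [UNK]=0 and [PAD]=1 are assumptions made for illustration.
VOCAB = {"[UNK]": 0, "[PAD]": 1, "[CLS]": 2, "[SEP]": 3}


def to_ids(tokens, max_len=9):
    """Map tokens to ids (unknown tokens become [UNK]) and pad to max_len."""
    ids = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens]
    return ids + [VOCAB["[PAD]"]] * (max_len - len(ids))


# Plain dict standing in for the dict-like result of tokenizer(text_a).
encoding = {
    "input_ids": [2, 4958, 6855, 2046, 7088, 1050, 7843, 54, 3],
    "token_type_ids": [0] * 9,
    "attention_mask": [1] * 9,
}

tokens = ["[CLS]"]
tokens.extend(encoding)  # iterating a mapping yields its keys, not token strings
tokens.append("[SEP]")

input_ids = to_ids(tokens)
print(tokens)
print(input_ids)
```

Whatever the input sentence is, the keys are the same three strings, so every example collapses to the same id sequence.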
Also, suppose you write:
tok = tokenizer.tokenize
data_train = BERTDataset(dataset_train, 0, 1, tok , vocab, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tok , vocab, max_len, True, False)
Passing tok = tokenizer.tokenize into the BERTSentenceTransform class like this will probably raise an error saying that convert_tokens_to_ids cannot be called. Instead of converting to tokenizer.tokenize before passing it in, edit the code directly inside BERTSentenceTransform:
#tokens_a = self._tokenizer(text_a)
tokens_a = self._tokenizer.tokenize(text_a)  # modified
Editing the code directly like this should make the error go away.
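As a rough illustration of the patched step, here is a minimal sketch that uses a hypothetical ToyTokenizer (not KoBERTTokenizer) exposing only tokenize and convert_tokens_to_ids, and a hypothetical sentence_to_ids function in place of the real BERTSentenceTransform. The vocabulary ids and max_len are assumptions chosen for the example. The point is that the transform receives the whole tokenizer object, so both methods are available to it.

```python
class ToyTokenizer:
    """Hypothetical stand-in exposing the two methods the transform needs."""

    vocab = {"[UNK]": 0, "[PAD]": 1, "[CLS]": 2, "[SEP]": 3, "hello": 4, "world": 5}

    def tokenize(self, text):
        # Whitespace split instead of a real SentencePiece model.
        return text.split()

    def convert_tokens_to_ids(self, tokens):
        return [self.vocab.get(t, self.vocab["[UNK]"]) for t in tokens]


def sentence_to_ids(tokenizer, text, max_len=6):
    """Tokenize, add special tokens, convert to ids, and pad to max_len."""
    tokens_a = tokenizer.tokenize(text)  # the fix: a list of token strings
    tokens = ["[CLS]"] + tokens_a[: max_len - 2] + ["[SEP]"]
    ids = tokenizer.convert_tokens_to_ids(tokens)
    valid_length = len(ids)
    ids += [tokenizer.vocab["[PAD]"]] * (max_len - valid_length)
    return ids, valid_length


ids, valid_length = sentence_to_ids(ToyTokenizer(), "hello world")
print(ids, valid_length)
```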
Thanks for the explanation!
I installed it with transformers! Use python10 and the gluonnlp !=0.10.0
@AbdirayimovS Could you please share some library installation code?
data_train = BERTDataset(dataset_train, 0, 1, tokenizer, vocab, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tokenizer, vocab, max_len, True, False)
Try changing the code like this.
Original message from Yeongseo, sent 2023-07-31 (Mon) 18:26:25 (GMT+09:00), Re: [SKTBrain/KoBERT] [BUG] (Issue #104):
data_train = BERTDataset(dataset_train, 0, 1, tok, vocab, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tok, vocab, max_len, True, False)
When I run this code, I get the following error:
---> 80 tokens_a = self._tokenizer.tokenize(text_a)
     81 tokens_b = None
     82
AttributeError: 'function' object has no attribute 'tokenize'
Why does this happen?
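One plausible reading of this error, sketched with hypothetical stand-in classes (ToyTokenizer and ToyTransform are not part of KoBERT or gluonnlp): once the transform itself calls .tokenize, it needs the tokenizer object. Passing the bound method tokenizer.tokenize leaves nothing to call .tokenize on, while passing the tokenizer object works.

```python
class ToyTokenizer:
    """Hypothetical tokenizer with a whitespace-splitting tokenize method."""

    def tokenize(self, text):
        return text.split()


class ToyTransform:
    """Hypothetical transform that, like the patched class, calls .tokenize itself."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def __call__(self, text):
        return self._tokenizer.tokenize(text)


tokenizer = ToyTokenizer()

try:
    ToyTransform(tokenizer.tokenize)("a b")  # passing the bound method
    failed = False
except AttributeError as err:
    failed = True
    print("AttributeError:", err)

tokens = ToyTransform(tokenizer)("a b")  # passing the tokenizer object
print(tokens)
```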
I wonder whether there are conflicts among these installation dependencies.
When I use pip install git+https://git@github.com/SKTBrain/KoBERT.git@master, it shows:
ERROR: Cannot install kobert because these package versions have conflicting dependencies.
The conflict is caused by:
    onnxruntime 1.8.0 depends on numpy>=1.16.6
    gluonnlp 0.6.0 depends on numpy
    mxnet 1.4.0.post0 depends on numpy<1.15.0 and >=1.8.2
    onnxruntime 1.8.0 depends on numpy>=1.16.6
    gluonnlp 0.6.0 depends on numpy
    mxnet 1.4.0 depends on numpy<1.15.0 and >=1.8.2
To fix this you could try to:
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
Please tell me how to deal with it. Thank you!
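The pip message can be checked by hand: onnxruntime 1.8.0 wants numpy>=1.16.6, while mxnet 1.4.0 wants numpy<1.15.0 and >=1.8.2, and those two ranges do not overlap. A small sketch of that check follows, using naive tuple comparison instead of pip's real resolver; the candidate list is only a handful of illustrative versions, not a full release list.

```python
def parse(version):
    """Turn '1.16.6' into (1, 16, 6) so versions compare as tuples."""
    return tuple(int(part) for part in version.split("."))


def onnxruntime_ok(v):
    # onnxruntime 1.8.0 depends on numpy>=1.16.6
    return parse(v) >= parse("1.16.6")


def mxnet_ok(v):
    # mxnet 1.4.0 depends on numpy<1.15.0 and >=1.8.2
    return parse("1.8.2") <= parse(v) < parse("1.15.0")


# A few illustrative numpy versions (not an exhaustive list).
candidates = ["1.8.2", "1.14.6", "1.15.0", "1.16.6", "1.21.0"]
compatible = [v for v in candidates if onnxruntime_ok(v) and mxnet_ok(v)]
print(compatible)
```

Because no numpy version can satisfy both constraints, at least one of the pinned packages has to change version before the install can succeed.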
🐛 Bug
No module named 'kobert'
## To Reproduce
from kobert.utils import get_tokenizer
from kobert.pytorch_kobert import get_pytorch_kobert_model
When I try to run this:
ModuleNotFoundError Traceback (most recent call last) in <cell line: 2>()
1 #kobert
----> 2 from kobert.utils import get_tokenizer
3 from kobert.pytorch_kobert import get_pytorch_kobert_model
ModuleNotFoundError: No module named 'kobert'
NOTE: If your import is failing due to a missing package, you can manually install dependencies using either !pip or !apt.
To view examples of installing some common dependencies, click the "Open Examples" button below.
This is the error I get. After I resolved the gluonnlp version problem, this kobert problem appeared.
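Before digging into version pins, it can help to confirm which packages the current environment can actually see. This is a minimal sketch using the standard library's importlib.util.find_spec; the helper name is_installed is made up for this example, and the module names are simply the ones mentioned in this thread.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ("kobert", "gluonnlp", "mxnet", "transformers"):
    print(name, "installed" if is_installed(name) else "MISSING")
```

If kobert shows as missing here, the pip install step did not complete in this environment, regardless of what the notebook cell printed earlier.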
Please write the steps needed to reproduce the bug.
Expected behavior
Environment
Additional context