SKTBrain / KoBERT

Korean BERT pre-trained cased (KoBERT)
Apache License 2.0
1.31k stars, 368 forks

[BUG] #104

Open danaekdml opened 1 year ago

danaekdml commented 1 year ago

🐛 Bug

No module named 'kobert'

To Reproduce

from kobert.utils import get_tokenizer
from kobert.pytorch_kobert import get_pytorch_kobert_model

When I try to run this, I get:

ModuleNotFoundError Traceback (most recent call last)
in <cell line: 2>()
      1 #kobert
----> 2 from kobert.utils import get_tokenizer
      3 from kobert.pytorch_kobert import get_pytorch_kobert_model

ModuleNotFoundError: No module named 'kobert'

NOTE: If your import is failing due to a missing package, you can manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the "Open Examples" button below.

This is the error I get. After I resolved the gluonnlp version problem, this kobert problem appeared.

Please write the steps needed to reproduce the bug.

  1. -
  2. -
  3. -

Expected behavior

Environment

Additional context

hwangsaeyeon commented 1 year ago

I ran into the same error and resolved it by using Hugging Face instead. I used the code published on Hugging Face. A second error then occurs in BERTSentenceTransform, which I resolved by copying the class out of the .py file and modifying the code.

blog: The code is long, so I wrote it up on my blog. Feel free to refer to it.

kibeomi commented 1 year ago

I re-ran everything following Hugging Face and the blog, but the accuracy comes out far too low, at 0.17. What could be the problem? Before switching to Hugging Face the accuracy was about 0.56, so this is strange.

!pip install mxnet
!pip install gluonnlp==0.8.0
!pip install tqdm pandas
!pip install sentencepiece
!pip install transformers
!pip install torch
!pip install 'git+https://github.com/SKTBrain/KoBERT.git#egg=kobert_tokenizer&subdirectory=kobert_hf'

from kobert_tokenizer import KoBERTTokenizer
from transformers import BertModel

import torch
from torch import nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import gluonnlp as nlp
import numpy as np
from tqdm.notebook import tqdm
from transformers import AdamW
from transformers.optimization import get_cosine_schedule_with_warmup

tokenizer = KoBERTTokenizer.from_pretrained('skt/kobert-base-v1')
bertmodel = BertModel.from_pretrained('skt/kobert-base-v1', return_dict=False)
vocab = nlp.vocab.BERTVocab.from_sentencepiece(tokenizer.vocab_file, padding_token='[PAD]')

tok = tokenizer.tokenize

Is anything in this wrong?

AbdirayimovS commented 1 year ago

I have the same issue! I used Python 3.7, which solves some version problems, but gluonnlp still does not install properly :( In addition, I could not use the transformers library to download KoBERT.

kibeomi commented 1 year ago

"When declaring the BERTSentenceTransform class, around line 19, it seems you have to change

tokens_a = self._tokenizer(text_a)

to

tokens_a = self._tokenizer.tokenize(text_a)

for the model to run properly..."
[Source] Resolving the No module named 'kobert' error | Author: yeon

After following this comment, the accuracy went back up.

siyeol97 commented 1 year ago

text_a = '한국어 모델을 공유합니다.'
tokens_1 = tokenizer.tokenize(text_a)
tokens_2 = tokenizer(text_a)
print(tokens_1, type(tokens_1))
print(tokens_2, type(tokens_2))

output :
tokens_1 : ['▁한국', '어', '▁모델', '을', '▁공유', '합니다', '.'] <class 'list'>
tokens_2 : {'input_ids': [2, 4958, 6855, 2046, 7088, 1050, 7843, 54, 3], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]} <class 'transformers.tokenization_utils_base.BatchEncoding'>

In the BERTSentenceTransform class, if you run the existing line tokens_a = self._tokenizer(text_a) unchanged, tokens_a takes the form of tokens_2 above. As a result, token and input_ids end up as:

token = [[CLS], 'input_ids', 'token_type_ids', 'attention_mask', [SEP]]
input_ids = [2, 0, 0, 0, 3, 1, 1, 1, 1, ...]

Every item in the dataset gets the same input_ids, which is why the accuracy drops so sharply.
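The failure described above can be reproduced without downloading any model. The sketch below is illustrative only: `ToyTokenizer`, `VOCAB`, and `to_ids` are hypothetical stand-ins, not KoBERT or gluonnlp code. It assumes, as the thread describes, that the tokenizer call returns a dict-like object and that tokens missing from the vocabulary map to id 0. Extending a token list with a dict adds its keys, so every sentence collapses to the same ids.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a Hugging Face style tokenizer."""

    def tokenize(self, text):
        # Returns a plain list of string tokens.
        return text.split()

    def __call__(self, text):
        # Returns a dict-like encoding, like tokens_2 in the thread.
        n = len(text.split()) + 2
        return {
            "input_ids": [2] + [5] * (n - 2) + [3],
            "token_type_ids": [0] * n,
            "attention_mask": [1] * n,
        }


# Toy vocabulary (assumption): unknown tokens fall back to id 0, padding is id 1.
VOCAB = {"[UNK]": 0, "[PAD]": 1, "[CLS]": 2, "[SEP]": 3,
         "hello": 10, "world": 11, "bye": 12}


def to_ids(tokens_a, max_len=8):
    # list(...) over a dict yields its keys, which is the root of the bug.
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    ids = [VOCAB.get(t, VOCAB["[UNK]"]) for t in tokens]
    return ids + [VOCAB["[PAD]"]] * (max_len - len(ids))


tok = ToyTokenizer()
broken_a = to_ids(tok("hello world"))        # keys treated as tokens
broken_b = to_ids(tok("bye"))                # identical to broken_a
fixed_a = to_ids(tok.tokenize("hello world"))
fixed_b = to_ids(tok.tokenize("bye"))

print(broken_a, broken_b)
print(fixed_a, fixed_b)
```

With the call form, both sentences produce the same ids, so the classifier has nothing to learn from; with `.tokenize()` the ids differ per sentence.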

And if you do this:

tok = tokenizer.tokenize
data_train = BERTDataset(dataset_train, 0, 1, tok , vocab, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tok , vocab, max_len, True, False)

that is, if you convert with tok = tokenizer.tokenize and pass that into the BERTSentenceTransform class, you will probably get an error saying convert_tokens_to_ids cannot be called. Instead of passing in tokenizer.tokenize, edit the code inside BERTSentenceTransform directly:

#tokens_a = self._tokenizer(text_a)
tokens_a = self._tokenizer.tokenize(text_a)  # modified

With that change there should be no error.
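To show where the modified line sits, here is a minimal sketch of a single-sentence transform. It is a simplified stand-in written for this thread, not the gluonnlp BERTSentenceTransform implementation: `SentenceTransformSketch`, `WhitespaceTokenizer`, the dict vocabulary, and the `unk_id`/`pad_id` defaults are all assumptions, and it returns plain lists rather than arrays.

```python
class WhitespaceTokenizer:
    """Hypothetical stand-in; the thread uses KoBERTTokenizer here."""

    def tokenize(self, text):
        return text.split()


class SentenceTransformSketch:
    """Simplified single-sentence transform (assumed shape, for illustration)."""

    def __init__(self, tokenizer, vocab, max_seq_length, unk_id=0, pad_id=1):
        self._tokenizer = tokenizer
        self._vocab = vocab
        self._max_seq_length = max_seq_length
        self._unk_id = unk_id
        self._pad_id = pad_id

    def __call__(self, text_a):
        # The line changed in the thread: call .tokenize(), not the tokenizer itself.
        tokens_a = self._tokenizer.tokenize(text_a)
        # Leave room for [CLS] and [SEP].
        tokens_a = tokens_a[: self._max_seq_length - 2]
        tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
        input_ids = [self._vocab.get(t, self._unk_id) for t in tokens]
        valid_length = len(input_ids)
        padding = self._max_seq_length - valid_length
        input_ids = input_ids + [self._pad_id] * padding
        segment_ids = [0] * self._max_seq_length
        return input_ids, valid_length, segment_ids


vocab = {"[CLS]": 2, "[SEP]": 3, "a": 7, "b": 8}
transform = SentenceTransformSketch(WhitespaceTokenizer(), vocab, max_seq_length=6)
ids, valid_length, segment_ids = transform("a b")
print(ids, valid_length, segment_ids)
```

Note that the transform receives the tokenizer object itself, which is why it can call `.tokenize()` on it.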

kibeomi commented 1 year ago

Thanks for the explanation!

AbdirayimovS commented 1 year ago

> I have the same issue! I used Python 3.7, which solves some version problems, but gluonnlp still does not install properly :( In addition, I could not use the transformers library to download KoBERT.

I installed it with transformers! Use Python 3.10 and gluonnlp != 0.10.0.

cwoonb commented 1 year ago

@AbdirayimovS Could you please share some library installation code?

kibeomi commented 1 year ago

data_train = BERTDataset(dataset_train, 0, 1, tokenizer, vocab, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tokenizer, vocab, max_len, True, False)

Try changing the code like this.

-----Original Message-----
From: Yeongseo
Sent: 2023-07-31 (Mon) 18:26:25 (GMT+09:00)
Subject: Re: [SKTBrain/KoBERT] [BUG] (Issue #104)

data_train = BERTDataset(dataset_train, 0, 1, tok , vocab, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tok , vocab, max_len, True, False)

When I run this code, I get the following error. Why is that?

---> 80 tokens_a = self._tokenizer.tokenize(text_a)
     81 tokens_b = None
     82
AttributeError: 'function' object has no attribute 'tokenize'
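The AttributeError in the quoted message comes from combining both fixes: once the transform calls `self._tokenizer.tokenize(...)`, it must receive the tokenizer object, not the `tok = tokenizer.tokenize` method. The sketch below reproduces that with a hypothetical `ToyTokenizer` and adds an optional helper, `tokenize_with`, that accepts either form. The helper is an illustration only and is not part of KoBERT or gluonnlp.

```python
class ToyTokenizer:
    """Hypothetical stand-in for the tokenizer object used in the thread."""

    def tokenize(self, text):
        return text.split()


def tokenize_with(tokenizer_or_fn, text):
    """Accept either a tokenizer object or a bare tokenize function."""
    # Use .tokenize when it exists, otherwise treat the argument as the function.
    tokenize = getattr(tokenizer_or_fn, "tokenize", tokenizer_or_fn)
    return tokenize(text)


tokenizer = ToyTokenizer()
tok = tokenizer.tokenize  # a bound method, as in the thread

# Calling .tokenize on the method itself fails, as in the reported traceback.
try:
    tok.tokenize("a b")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
print(tokenize_with(tokenizer, "a b"))  # tokenizer object works
print(tokenize_with(tok, "a b"))        # bare function also works
```

Passing `tokenizer` rather than `tok` to BERTDataset, as suggested in the reply above, is the simpler route; the helper only shows one way to tolerate both.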

JWWPXX commented 10 months ago

I wonder whether there are conflicts among the installation dependencies.

JWWPXX commented 10 months ago

When I use pip install git+https://git@github.com/SKTBrain/KoBERT.git@master it shows:

ERROR: Cannot install kobert because these package versions have conflicting dependencies.

The conflict is caused by:
    onnxruntime 1.8.0 depends on numpy>=1.16.6
    gluonnlp 0.6.0 depends on numpy
    mxnet 1.4.0.post0 depends on numpy<1.15.0 and >=1.8.2
    onnxruntime 1.8.0 depends on numpy>=1.16.6
    gluonnlp 0.6.0 depends on numpy
    mxnet 1.4.0 depends on numpy<1.15.0 and >=1.8.2

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Please tell me how to deal with it. Thank you!
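The numpy constraints in the pip output can be checked by hand: onnxruntime 1.8.0 needs numpy>=1.16.6, while mxnet 1.4.0 needs numpy<1.15.0, so no numpy version satisfies both. The sketch below only restates that arithmetic with version numbers copied from the error message; `parse` and `satisfies_all` are hypothetical helpers, and the dotted-integer parsing is a simplification that ignores suffixes such as `.post0`.

```python
def parse(version):
    """Turn a plain dotted version like '1.16.6' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


# numpy constraints copied from the pip error message above.
onnxruntime_min = parse("1.16.6")      # onnxruntime 1.8.0: numpy>=1.16.6
mxnet_min = parse("1.8.2")             # mxnet 1.4.0: numpy>=1.8.2
mxnet_max_exclusive = parse("1.15.0")  # mxnet 1.4.0: numpy<1.15.0


def satisfies_all(version):
    v = parse(version)
    return v >= onnxruntime_min and mxnet_min <= v < mxnet_max_exclusive


# A solution exists only if the highest lower bound is below the upper bound.
lower_bound = max(onnxruntime_min, mxnet_min)
has_solution = lower_bound < mxnet_max_exclusive

print(has_solution)
print(satisfies_all("1.14.6"), satisfies_all("1.16.6"))
```

Because the ranges do not overlap, pip cannot resolve these pins as written. Earlier comments in this thread instead install only the tokenizer from the kobert_hf subdirectory together with transformers.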