[BUG] - GitHub Issues

danaekdml commented 1 year ago

🐛 Bug

No module named 'kobert'

To Reproduce

from kobert.utils import get_tokenizer
from kobert.pytorch_kobert import get_pytorch_kobert_model

When I try to run this, I get:

ModuleNotFoundError                       Traceback (most recent call last)
in <cell line: 2>()
      1 #kobert
----> 2 from kobert.utils import get_tokenizer
      3 from kobert.pytorch_kobert import get_pytorch_kobert_model

ModuleNotFoundError: No module named 'kobert'
NOTE: If your import is failing due to a missing package, you can manually install dependencies using either !pip or !apt.


This is the error I get. After I fixed the gluonnlp version problem, this kobert problem appeared.
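A quick way to see which of the packages discussed in this thread are actually importable in the current runtime (for example a Colab session) is to probe them with `importlib.util.find_spec`. This is a minimal sketch: the list of module names is an assumption based on the packages mentioned in this issue, and `check_modules` is a hypothetical helper.

```python
import importlib.util

# Module names to probe (assumed from this thread): "kobert" is what the
# original imports expect, "kobert_tokenizer" comes from the kobert_hf
# subdirectory, and the rest are dependencies discussed below.
CANDIDATES = ["kobert", "kobert_tokenizer", "gluonnlp", "mxnet", "transformers"]


def check_modules(names):
    """Return {module_name: True/False} depending on whether a spec is found.

    For top-level names, find_spec locates the module without executing it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(CANDIDATES).items():
        print(f"{name:18s} {'installed' if found else 'MISSING'}")
```

If `kobert` shows as missing while `kobert_tokenizer` is present, the old `from kobert.utils import ...` imports will keep failing even though the Hugging Face route is installed.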

Please write the steps to reproduce the bug.

-
-
-

Expected behavior

Environment

Additional context

hwangsaeyeon commented 1 year ago

I got the same error and solved it by using Hugging Face instead. I used the code from Hugging Face; a second error then occurs in BERTSentenceTransform, which I fixed by copying the class out of the .py file and modifying the code.

The code is long, so I wrote it up on my blog; you may find it useful as a reference.

kibeomi commented 1 year ago

I reran it following Hugging Face and the blog, but the accuracy comes out far too low, at 0.17. What could be the problem? Before switching to Hugging Face the accuracy was around 0.56, so this is strange.

!pip install mxnet
!pip install gluonnlp==0.8.0
!pip install tqdm pandas
!pip install sentencepiece
!pip install transformers
!pip install torch
!pip install 'git+https://github.com/SKTBrain/KoBERT.git#egg=kobert_tokenizer&subdirectory=kobert_hf'

from kobert_tokenizer import KoBERTTokenizer
from transformers import BertModel

import torch
from torch import nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import gluonnlp as nlp
import numpy as np
from tqdm.notebook import tqdm
from transformers import AdamW
from transformers.optimization import get_cosine_schedule_with_warmup

tokenizer = KoBERTTokenizer.from_pretrained('skt/kobert-base-v1')
bertmodel = BertModel.from_pretrained('skt/kobert-base-v1', return_dict=False)
vocab = nlp.vocab.BERTVocab.from_sentencepiece(tokenizer.vocab_file, padding_token='[PAD]')

tok = tokenizer.tokenize

Is there anything wrong in this code?

AbdirayimovS commented 1 year ago

I have the same issue! I used Python 3.7, which solves some version problems, but gluonnlp still does not install properly :( In addition, I could not use the transformers library to download KoBERT.

kibeomi commented 1 year ago

In the BERTSentenceTransform class declaration, at line 19,

tokens_a = self._tokenizer(text_a)

needs to be changed to

tokens_a = self._tokenizer.tokenize(text_a)

for the model to run properly... [Source] Fixing the "No module named 'kobert'" error | author: yeon

Following that comment brought my accuracy back up.
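The change matters because `.tokenize()` returns a list of subword strings that the rest of the transform can look up in the vocabulary, whereas calling the tokenizer object returns a dict of encodings. Below is a minimal, hypothetical sketch of a single-sentence transform built around that call. `ToyTokenizer` and `SimpleSentenceTransform` are illustrative stand-ins (whitespace splitting, a tiny vocabulary), not the actual KoBERT, gluonnlp, or blog code.

```python
class ToyTokenizer:
    """Hypothetical stand-in for an HF tokenizer such as KoBERTTokenizer.

    Mimics only the behaviours that matter here: .tokenize() returns a list
    of token strings, while calling the object returns a dict of encodings.
    """

    def __init__(self, vocab):
        self.vocab = vocab

    def tokenize(self, text):
        return text.split()  # real tokenizers produce subwords instead

    def convert_tokens_to_ids(self, tokens):
        unk = self.vocab["[UNK]"]
        return [self.vocab.get(t, unk) for t in tokens]

    def __call__(self, text):
        ids = ([self.vocab["[CLS]"]]
               + self.convert_tokens_to_ids(self.tokenize(text))
               + [self.vocab["[SEP]"]])
        return {"input_ids": ids,
                "token_type_ids": [0] * len(ids),
                "attention_mask": [1] * len(ids)}


class SimpleSentenceTransform:
    """Hypothetical, simplified single-sentence transform.

    Returns (input_ids, valid_length, segment_ids), padded to max_seq_length.
    """

    def __init__(self, tokenizer, max_seq_length, pad=True):
        self._tokenizer = tokenizer
        self._max_seq_length = max_seq_length
        self._pad = pad

    def __call__(self, text_a):
        # The fix from the thread: .tokenize() gives a list of tokens.
        # self._tokenizer(text_a) would give a dict, whose iteration
        # yields its keys rather than tokens.
        tokens_a = self._tokenizer.tokenize(text_a)
        tokens_a = tokens_a[: self._max_seq_length - 2]  # leave room for [CLS]/[SEP]
        tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
        input_ids = self._tokenizer.convert_tokens_to_ids(tokens)
        valid_length = len(input_ids)
        segment_ids = [0] * valid_length
        if self._pad:
            n_pad = self._max_seq_length - valid_length
            input_ids += [self._tokenizer.vocab["[PAD]"]] * n_pad
            segment_ids += [0] * n_pad
        return input_ids, valid_length, segment_ids


VOCAB = {"[UNK]": 0, "[PAD]": 1, "[CLS]": 2, "[SEP]": 3, "hello": 4, "world": 5}
transform = SimpleSentenceTransform(ToyTokenizer(VOCAB), max_seq_length=8)
print(transform("hello world"))  # ([2, 4, 5, 3, 1, 1, 1, 1], 4, [0, 0, 0, 0, 0, 0, 0, 0])
```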

siyeol97 commented 1 year ago

text_a = '한국어 모델을 공유합니다.'
tokens_1 = tokenizer.tokenize(text_a)
tokens_2 = tokenizer(text_a)
print(tokens_1, type(tokens_1))
print(tokens_2, type(tokens_2))

Output:
tokens_1 : ['▁한국', '어', '▁모델', '을', '▁공유', '합니다', '.'] <class 'list'>
tokens_2 : {'input_ids': [2, 4958, 6855, 2046, 7088, 1050, 7843, 54, 3], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]} <class 'transformers.tokenization_utils_base.BatchEncoding'>

In the BERTSentenceTransform class, if you run the original code tokens_a = self._tokenizer(text_a), tokens_a ends up in the tokens_2 format above. As a result, token and input_ids become:

token = [[CLS], 'input_ids', 'token_type_ids', 'attention_mask', [SEP]]
input_ids = [2, 0, 0, 0, 3, 1, 1, 1, 1, ...]

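Iterating over a BatchEncoding yields its keys ('input_ids', 'token_type_ids', 'attention_mask'), which are not in the vocabulary and so map to the unknown-token id (0 here); the trailing 1s are padding. Every example in the dataset therefore gets the same input_ids, which is why the accuracy becomes so low.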
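The failure mode can be reproduced without loading any model. In this minimal sketch a plain dict stands in for the BatchEncoding (values copied from the output above), and a toy vocabulary holds only the special tokens, with ids chosen to match the input_ids shown in this thread. `to_ids` is a hypothetical helper, not gluonnlp's actual lookup.

```python
# Stand-in for the BatchEncoding printed above (values copied from that output).
encoding = {
    "input_ids": [2, 4958, 6855, 2046, 7088, 1050, 7843, 54, 3],
    "token_type_ids": [0] * 9,
    "attention_mask": [1] * 9,
}

# Toy vocabulary: special tokens only, ids chosen to match the thread.
vocab = {"[UNK]": 0, "[PAD]": 1, "[CLS]": 2, "[SEP]": 3}


def to_ids(tokens_a, max_len):
    """Wrap tokens in [CLS]/[SEP], map unknown tokens to [UNK], pad to max_len."""
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    return ids + [vocab["[PAD]"]] * (max_len - len(ids))


# list(encoding) iterates the dict, so it yields the keys, not tokens.
print(["[CLS]"] + list(encoding) + ["[SEP]"])
print(to_ids(encoding, 10))  # [2, 0, 0, 0, 3, 1, 1, 1, 1, 1]
```

With the real KoBERT vocabulary, the subword list from `tokenizer.tokenize` would instead map to distinct ids per sentence.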

And if you do this instead:

tok = tokenizer.tokenize
data_train = BERTDataset(dataset_train, 0, 1, tok , vocab, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tok , vocab, max_len, True, False)

If you convert it with tok = tokenizer.tokenize and pass that to the BERTSentenceTransform class, you will probably get an error saying that convert_tokens_to_ids cannot be called. Instead of passing in tokenizer.tokenize, change the code inside BERTSentenceTransform:

#tokens_a = self._tokenizer(text_a)
tokens_a = self._tokenizer.tokenize(text_a)  # modified

Editing the code directly like this should get rid of the errors.
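The difference between passing `tokenizer` and passing `tok = tokenizer.tokenize` can be shown with stand-in classes. `tok` is a bound method, not the tokenizer object, so code that calls `self._tokenizer.tokenize(...)` or `self._tokenizer.convert_tokens_to_ids(...)` on it raises `AttributeError`. `ToyTokenizer` and `Transform` are hypothetical stand-ins, not the real KoBERT or gluonnlp classes.

```python
class ToyTokenizer:
    """Hypothetical stand-in exposing only the two methods the transform uses."""

    def tokenize(self, text):
        return text.split()

    def convert_tokens_to_ids(self, tokens):
        return [len(t) for t in tokens]  # dummy ids, for illustration only


class Transform:
    """Hypothetical transform that calls methods on self._tokenizer,
    like the modified BERTSentenceTransform does."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def __call__(self, text):
        tokens = self._tokenizer.tokenize(text)
        return self._tokenizer.convert_tokens_to_ids(tokens)


tokenizer = ToyTokenizer()
tok = tokenizer.tokenize  # a bound method, not the tokenizer object

print(Transform(tokenizer)("a bb ccc"))  # [1, 2, 3]

try:
    Transform(tok)("a bb ccc")  # the method has no .tokenize attribute
except AttributeError as err:
    print("AttributeError:", err)
```

This is consistent with the advice later in the thread to pass `tokenizer` itself to BERTDataset once the class calls `.tokenize()` internally.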

kibeomi commented 1 year ago

Thanks for the explanation!

AbdirayimovS commented 1 year ago


I installed it with transformers! Use Python 3.10 and gluonnlp !=0.10.0.

cwoonb commented 1 year ago

@AbdirayimovS Could you please share some library installation code?

kibeomi commented 1 year ago

Try changing the code like this:

data_train = BERTDataset(dataset_train, 0, 1, tokenizer, vocab, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tokenizer, vocab, max_len, True, False)

In reply to an email from Yeongseo, sent 2023-07-31 (Mon) 18:26:25 (GMT+09:00), subject "Re: [SKTBrain/KoBERT] [BUG] (Issue #104)":

When I run this code:

data_train = BERTDataset(dataset_train, 0, 1, tok, vocab, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tok, vocab, max_len, True, False)

I get this error:

---> 80 tokens_a = self._tokenizer.tokenize(text_a)
     81 tokens_b = None
     82
AttributeError: 'function' object has no attribute 'tokenize'

Why does this happen?

JWWPXX commented 10 months ago

I wonder whether these installation dependencies contradict each other.

JWWPXX commented 10 months ago

When I use pip install git+https://git@github.com/SKTBrain/KoBERT.git@master, it shows:

ERROR: Cannot install kobert because these package versions have conflicting dependencies.

The conflict is caused by:
    onnxruntime 1.8.0 depends on numpy>=1.16.6
    gluonnlp 0.6.0 depends on numpy
    mxnet 1.4.0.post0 depends on numpy<1.15.0 and >=1.8.2
    onnxruntime 1.8.0 depends on numpy>=1.16.6
    gluonnlp 0.6.0 depends on numpy
    mxnet 1.4.0 depends on numpy<1.15.0 and >=1.8.2

To fix this you could try to:

loosen the range of package versions you've specified
remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Please tell me how to deal with it. Thank you!
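Why pip reports ResolutionImpossible can be checked by hand: onnxruntime 1.8.0 requires numpy>=1.16.6, while mxnet 1.4.0 requires numpy>=1.8.2 and <1.15.0. A minimal sketch that intersects those ranges, assuming only `>=` and `<` constraints as in the error above (`parse` and `has_common_version` are hypothetical helpers, not pip internals):

```python
def parse(version):
    """Turn '1.16.6' into (1, 16, 6) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def has_common_version(lower_bounds, upper_bound_exclusive):
    """True if some version v satisfies v >= every lower bound and v < the upper bound."""
    return max(lower_bounds) < upper_bound_exclusive


# numpy constraints copied from the pip output above:
#   onnxruntime 1.8.0 -> numpy>=1.16.6
#   mxnet 1.4.0       -> numpy>=1.8.2, <1.15.0
lower = [parse("1.16.6"), parse("1.8.2")]
upper = parse("1.15.0")
print(has_common_version(lower, upper))  # False
```

The highest lower bound (1.16.6) already sits above the upper bound (1.15.0), so no numpy release can satisfy both packages at once while these exact versions are pinned.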

SKTBrain / KoBERT

[BUG] #104
