cl-tohoku/bert-japanese
BERT models for Japanese text.
Apache License 2.0 · 514 stars · 55 forks
Issues (newest first)
#38  Bump transformers from 4.30.0 to 4.36.0 (dependabot[bot], opened 11 months ago, 0 comments)
#37  Could not download jawiki-20230102 (zhutixiaojie0120, closed 1 year ago, 4 comments)
#36  Past pretrained model license change from CC-BY-SA 3.0 to Apache 2.0 (icoxfog417, closed 1 year ago, 2 comments)
#35  Bump transformers from 4.26.0 to 4.30.0 (dependabot[bot], closed 1 year ago, 0 comments)
#34  Bump tensorflow from 2.11.0 to 2.11.1 (dependabot[bot], closed 1 year ago, 0 comments)
#33  How did you translate tensorflow2 pretrained model to pytorch model? (nagailong, opened 1 year ago, 2 comments)
#32  strange tokenizer results with self-pretrained model (lightercs, opened 2 years ago, 12 comments)
#31  Is tokenization.py needs to be uploaded to GCP? (lightercs, closed 2 years ago, 2 comments)
#30  'Can't convert ['test.txt'] to Trainer' when training a BertWordPieceTokenizer (suchunxie, opened 2 years ago, 2 comments)
#29  SSL error (leoxu1007, opened 2 years ago, 0 comments)
#28  [Question] About the Char model (AprilSongRits, closed 3 years ago, 2 comments)
#27  Please tell us how to quote your model for paper (nakamolinto, closed 3 years ago, 2 comments)
#26  The results seems different from hugging face... (leoxu1007, opened 3 years ago, 0 comments)
#25  Error when initializing from the transformers pipeline (EtienneGagnon1, opened 3 years ago, 7 comments)
#24  AutoTokenizer.from_pretrained doesn't work on newer models (KoichiYasuoka, closed 3 years ago, 3 comments)
#23  How to add new vocabulary to vocab.txt (kaoruoshita, opened 3 years ago, 2 comments)
#22  Bump tensorflow from 2.3.0 to 2.4.0 (dependabot[bot], closed 3 years ago, 1 comment)
#21  What is the size (GB) of the pretraining corpus? (ciwang, closed 3 years ago, 3 comments)
#20  [Question] How to mask token (cidrugHug8, opened 4 years ago, 0 comments)
#19  Getting some weights not used warning (wailoktam, opened 4 years ago, 1 comment)
#18  BertJapaneseTokenizer can find 'cl-tohoku/bert-base-japanese-whole-word-masking' but BertModel cannot ('cl-tohoku/bert-base-japanese-whole-word-masking') (wailoktam, closed 3 years ago, 3 comments)
#17  AttributeError: 'MecabBertTokenizer' object has no attribute 'vocab' (shuxinjin, opened 4 years ago, 6 comments)
#16  About Pre-Training times (sezai-rdc, closed 3 years ago, 5 comments)
#15  Cannot run the example masked_lm_example.ipynb (dangne, opened 4 years ago, 1 comment)
#14  Swap Mecab tokenizer with Sentencepiece : possible ? (sachaarbonel, closed 3 years ago, 2 comments)
#13  Help on using the model for finetuning (wailoktam, closed 3 years ago, 3 comments)
#12  Can you detail the preprocessing needed to be done with text during finetuning when using your pretrained model? (wailoktam, opened 4 years ago, 0 comments)
#11  Will tokenizer remove stopwords? (HeroadZ, closed 4 years ago, 4 comments)
#10  Get the last output of the model 'cl-tohoku/bert-base-japanese-char-whole-word-masking' (demdecuong, closed 4 years ago, 4 comments)
#9   Unable to find .ckpt. file (nishithbenhur, opened 4 years ago, 6 comments)
#8   Fine tune (nuwanq, closed 4 years ago, 9 comments)
#7   License for models (MobiusLooper, closed 4 years ago, 2 comments)
#6   Is it unnecessary to use neologd in segmentation? (freshell, closed 4 years ago, 2 comments)
#5   How to use my own pretrained model? (YosukeHiguchi, closed 4 years ago, 2 comments)
#4   Add new vocabs (nuwanq, closed 4 years ago, 6 comments)
#3   transoformers' japanese vocab don't have "ゑ", but have "ヱ" (knok, closed 4 years ago, 2 comments)
#2   Fix link to files (reiyw, closed 4 years ago, 0 comments)
#1   tensorflow version (xzdong-2019, closed 5 years ago, 2 comments)