A pretrained Japanese DistilBERT model, trained on Wikipedia.
See here for a quickstart guide in Japanese.
DistilBERT is a small, fast, cheap, and light Transformer model based on the BERT architecture. It has 40% fewer parameters than BERT-base and runs 60% faster, while preserving 97% of BERT's performance as measured on the GLUE language understanding benchmark.
This model was trained with the official Hugging Face implementation (available here) for two weeks on an AWS p3dn.24xlarge instance.
More details about distillation can be found in the following paper: "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" by Sanh et al. (2019).
The teacher model is the pretrained Japanese BERT model from TOHOKU NLP LAB.
Currently, only PyTorch-compatible weights are available. TensorFlow checkpoints can be generated by following the official guide.
torch>=1.3.1
torchvision>=0.4.2
transformers>=2.5.0
tensorboard>=1.14.0
tensorboardX==1.8
scikit-learn>=0.21.0
mecab-python3
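The pins above can be sanity-checked at runtime. The snippet below is a convenience sketch, not part of this repository; the `check` helper and its naive version parsing are illustrative (the exact `tensorboardX==1.8` pin is treated as a minimum for simplicity):

```python
# Sketch: report which of the dependencies pinned above are missing or too old.
# Uses only the standard library (Python 3.8+); pins copied from the list above.
from importlib.metadata import version, PackageNotFoundError

PINS = {
    "torch": "1.3.1",
    "torchvision": "0.4.2",
    "transformers": "2.5.0",
    "tensorboard": "1.14.0",
    "tensorboardX": "1.8",
    "scikit-learn": "0.21.0",
}

def as_tuple(v):
    # Naive numeric parse ("1.3.1" -> (1, 3, 1)); good enough for the pins above.
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

def check(pins):
    problems = []
    for name, minimum in pins.items():
        try:
            if as_tuple(version(name)) < as_tuple(minimum):
                problems.append(f"{name} older than {minimum}")
        except PackageNotFoundError:
            problems.append(f"{name} not installed")
    return problems

print(check(PINS))  # empty list means the environment satisfies the pins
```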
Please download and unzip DistilBERT-base-jp.zip.
# Read from local path
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-japanese-whole-word-masking")
model = AutoModel.from_pretrained("LOCAL_PATH")
LOCAL_PATH is the path of the directory where the archive above was unzipped. It should contain three files: config.json, pytorch_model.bin, and vocab.txt.
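Before loading, the unzipped directory can be checked quickly. This is a sketch: the helper name is ours, and the three file names assume the typical layout of a PyTorch transformers model:

```python
import os

# Sketch: return the expected model files that are missing from local_path.
# config.json / pytorch_model.bin / vocab.txt is the usual layout for a
# PyTorch transformers checkpoint; adjust if your archive differs.
def missing_model_files(local_path):
    expected = {"config.json", "pytorch_model.bin", "vocab.txt"}
    try:
        return expected - set(os.listdir(local_path))
    except FileNotFoundError:
        return expected  # directory itself is missing
```

If the returned set is non-empty, the unzip location probably does not match LOCAL_PATH.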
or
# Download from the model hub on huggingface.co
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-japanese-whole-word-masking")
model = AutoModel.from_pretrained("bandainamco-mirai/distilbert-base-japanese")
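Once loaded, the model can be used like any transformers encoder. The sketch below is illustrative: the sample sentence is ours, and note that DistilBERT does not accept the `token_type_ids` the BERT tokenizer produces, so they are dropped before the forward pass:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-japanese-whole-word-masking")
model = AutoModel.from_pretrained("bandainamco-mirai/distilbert-base-japanese")
model.eval()

# Encode a sample sentence ("Hello, world." — an illustrative input).
inputs = tokenizer("こんにちは、世界。", return_tensors="pt")
# DistilBERT has no token type embeddings, so drop this field if present.
inputs.pop("token_type_ids", None)

with torch.no_grad():
    last_hidden_state = model(**inputs)[0]

# Shape: (batch_size, sequence_length, hidden_size)
print(last_hidden_state.shape)
```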
Copyright (c) 2020 BANDAI NAMCO Research Inc.
Released under the MIT license