A pretrained Japanese DistilBERT model, trained on Wikipedia.
See here for a quickstart guide in Japanese.
DistilBERT is a small, fast, cheap, and light Transformer model based on the BERT architecture. It has 40% fewer parameters than BERT-base and runs 60% faster, while preserving 97% of BERT's performance as measured on the GLUE language understanding benchmark.
This model was trained with the official Hugging Face implementation (available here) for two weeks on an AWS p3dn.24xlarge instance.
More details about distillation can be found in the following paper: "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" by Sanh et al. (2019).
The teacher model is the pretrained Japanese BERT model from TOHOKU NLP LAB.
Currently, only PyTorch-compatible weights are available. TensorFlow checkpoints can be generated by following the official guide.
torch>=1.3.1
torchvision>=0.4.2
transformers>=2.5.0
tensorboard>=1.14.0
tensorboardX==1.8
scikit-learn>=0.21.0
mecab-python3
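The pins above can be sanity-checked at runtime. The snippet below is a convenience sketch, not part of this repository; the `check` helper and its naive version parsing are illustrative (the exact `tensorboardX==1.8` pin is treated as a minimum for simplicity):

```python
# Sketch: report which of the dependencies pinned above are missing or too old.
# Uses only the standard library (Python 3.8+); pins copied from the list above.
from importlib.metadata import version, PackageNotFoundError

PINS = {
    "torch": "1.3.1",
    "torchvision": "0.4.2",
    "transformers": "2.5.0",
    "tensorboard": "1.14.0",
    "tensorboardX": "1.8",
    "scikit-learn": "0.21.0",
}

def as_tuple(v):
    # Naive numeric parse ("1.3.1" -> (1, 3, 1)); good enough for the pins above.
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

def check(pins):
    problems = []
    for name, minimum in pins.items():
        try:
            if as_tuple(version(name)) < as_tuple(minimum):
                problems.append(f"{name} older than {minimum}")
        except PackageNotFoundError:
            problems.append(f"{name} not installed")
    return problems

print(check(PINS))  # empty list means the environment satisfies the pins
```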
Please download and unzip DistilBERT-base-jp.zip.
# Read from local path
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-japanese-whole-word-masking")
model = AutoModel.from_pretrained("LOCAL_PATH")
LOCAL_PATH is the path of the directory where the archive above was unzipped. It should contain three files: config.json, pytorch_model.bin, and vocab.txt.
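Before loading, the unzipped directory can be checked quickly. This is a sketch: the helper name is ours, and the three file names assume the typical layout of a PyTorch transformers model:

```python
import os

# Sketch: return the expected model files that are missing from local_path.
# config.json / pytorch_model.bin / vocab.txt is the usual layout for a
# PyTorch transformers checkpoint; adjust if your archive differs.
def missing_model_files(local_path):
    expected = {"config.json", "pytorch_model.bin", "vocab.txt"}
    try:
        return expected - set(os.listdir(local_path))
    except FileNotFoundError:
        return expected  # directory itself is missing
```

If the returned set is non-empty, the unzip location probably does not match LOCAL_PATH.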
or
# Download from the model hub on huggingface.co
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-japanese-whole-word-masking")
model = AutoModel.from_pretrained("bandainamco-mirai/distilbert-base-japanese")
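Once loaded, the model can be used like any transformers encoder. The sketch below is illustrative: the sample sentence is ours, and note that DistilBERT does not accept the `token_type_ids` the BERT tokenizer produces, so they are dropped before the forward pass:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-japanese-whole-word-masking")
model = AutoModel.from_pretrained("bandainamco-mirai/distilbert-base-japanese")
model.eval()

# Encode a sample sentence ("Hello, world." — an illustrative input).
inputs = tokenizer("こんにちは、世界。", return_tensors="pt")
# DistilBERT has no token type embeddings, so drop this field if present.
inputs.pop("token_type_ids", None)

with torch.no_grad():
    last_hidden_state = model(**inputs)[0]

# Shape: (batch_size, sequence_length, hidden_size)
print(last_hidden_state.shape)
```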
Copyright (c) 2020 BANDAI NAMCO Research Inc.
Released under the MIT license