weekly useful materials - 04/13 -

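Speeding up Transformer models

An article that measures how much Transformer inference can be sped up with OpenVINO (CPU) and TensorRT (GPU).

In both cases, inference runs on a model that is first exported to ONNX and then converted to the target format.

The OpenVINO model ended up slower than plain ONNX inference on CPU, whereas TensorRT was faster than ONNX inference on GPU.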
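Speed-up numbers like these depend heavily on how latency is measured. Below is a minimal, framework-agnostic timing harness, sketched as an assumption rather than the article's setup: infer is a hypothetical stand-in for any backend's predict call (ONNX Runtime, OpenVINO, TensorRT), and the function and parameter names are made up.

```python
import statistics
import time
from typing import Any, Callable, Dict, Sequence


def benchmark(infer: Callable[[Any], Any], inputs: Sequence[Any],
              warmup: int = 3, repeats: int = 20) -> Dict[str, float]:
    """Time infer(x) for every input and return latency statistics in milliseconds.

    `infer` is a hypothetical stand-in for a backend's inference call.
    """
    # Warm-up runs are not timed: the first calls often pay for lazy
    # initialisation, graph optimisation or cold caches.
    for _ in range(warmup):
        for x in inputs:
            infer(x)

    latencies_ms = []
    for _ in range(repeats):
        for x in inputs:
            start = time.perf_counter()
            infer(x)
            latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latencies_ms.sort()
    p90_index = int(0.9 * (len(latencies_ms) - 1))
    return {
        "median_ms": statistics.median(latencies_ms),
        "p90_ms": latencies_ms[p90_index],
        "runs": float(len(latencies_ms)),
    }


if __name__ == "__main__":
    # Two dummy "backends" standing in for e.g. ONNX Runtime vs. TensorRT.
    def slow_backend(x):
        time.sleep(0.002)
        return x

    def fast_backend(x):
        time.sleep(0.001)
        return x

    inputs = list(range(5))
    slow = benchmark(slow_backend, inputs, warmup=1, repeats=3)
    fast = benchmark(fast_backend, inputs, warmup=1, repeats=3)
    print(f"speed-up: {slow['median_ms'] / fast['median_ms']:.2f}x")
```

For GPU backends the timed call must block until the result is actually ready (for PyTorch code, e.g. torch.cuda.synchronize()); otherwise asynchronous kernel launches make the GPU look faster than it is.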


Notes

Source

Transformerモデルの高速化 (Speeding up Transformer models)

Make your ML models lighter! Model optimization with TensorFlow Lite

An article that examines how accuracy and model size change when applying the quantization and pruning available with TensorFlow Lite.
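Pruning in its simplest form (magnitude pruning) zeroes out the weights with the smallest absolute values. The TensorFlow Model Optimization Toolkit applies this gradually during training (tfmot.sparsity.keras.prune_low_magnitude); the sketch below only shows the one-shot core idea on a flat list of weights, and magnitude_prune / gzipped_size are hypothetical helper names. Zeros alone do not shrink a dense float32 file; the size reduction shows up once the weights are compressed (gzip here) or stored in a sparse format.

```python
import gzip
import random
import struct
from typing import List, Sequence


def magnitude_prune(weights: Sequence[float], sparsity: float) -> List[float]:
    """Copy of `weights` with the `sparsity` fraction of smallest-|w| entries set to 0."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # how many weights to zero out
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def gzipped_size(weights: Sequence[float]) -> int:
    """Bytes needed for the weights packed as float32 and gzip-compressed."""
    raw = struct.pack(f"{len(weights)}f", *weights)
    return len(gzip.compress(raw))


if __name__ == "__main__":
    print(magnitude_prune([0.1, -2.0, 0.05, 1.5], 0.5))  # [0.0, -2.0, 0.0, 1.5]
    rng = random.Random(0)
    dense = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    sparse = magnitude_prune(dense, 0.9)
    # The 90%-sparse weights compress far better than the dense ones.
    print(gzipped_size(dense), gzipped_size(sparse))
```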

It finds that converting float32 to int8 costs surprisingly little accuracy.
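Part of why int8 costs so little: with affine quantization every float32 value is mapped to one of 256 levels, so the round-trip error is about half a quantization step, while storage drops from 4 bytes to 1 byte per value. The sketch uses a simple per-tensor scheme (scale and zero point from the min/max); TFLite's actual int8 scheme differs in details (e.g. symmetric per-channel quantization for weights), and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float, int]:
    """Affine per-tensor quantization: real value ~= (q - zero_point) * scale."""
    # Include 0.0 in the range so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # 256 int8 levels; avoid division by zero
    zero_point = int(round(-128 - lo / scale))
    q = [max(-128, min(127, int(round(v / scale)) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_int8(q: Sequence[int], scale: float, zero_point: int) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
    q, scale, zp = quantize_int8(weights)
    restored = dequantize_int8(q, scale, zp)
    print(q, zp)
    print(max(abs(a - b) for a, b in zip(weights, restored)), scale)
```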


Notes

Source

機械学習モデルを軽量化せよ！Tensorflow Liteのモデル最適化について (Make your ML models lighter! Model optimization with TensorFlow Lite)

informatix-inc / bert

Informatix Inc. has released a RoBERTa model pre-trained on a Japanese corpus under the Apache-2.0 license.

This repository provides snippets to use RoBERTa pre-trained on Japanese corpus. Our dataset consists of Japanese Wikipedia and web-scrolled articles, 25GB in total.

We trained our model in the same way as RoBERTa. We optimized our model using the masked language modeling (MLM) objective. The accuracy of the MLM is 72.0%.
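For reference, the MLM objective: a fraction of tokens is selected and the model has to recover the originals. The sketch follows the BERT/RoBERTa recipe (select 15% of tokens; replace 80% of those with the mask token, 10% with a random token, keep 10%) and computes accuracy over the selected positions only. That this is how the 72.0% above is measured is an assumption, and the token ids and function names are hypothetical, not the repository's code.

```python
import random
from typing import AbstractSet, List, Sequence, Tuple

IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def mask_tokens(token_ids: Sequence[int], mask_id: int, vocab_size: int,
                rng: random.Random, mask_prob: float = 0.15,
                special_ids: AbstractSet[int] = frozenset()) -> Tuple[List[int], List[int]]:
    """BERT/RoBERTa-style masking. Returns (model inputs, labels)."""
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)  # only selected positions get a label
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id                    # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels


def mlm_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Accuracy over the selected (labelled) positions only."""
    scored = [(p, l) for p, l in zip(predictions, labels) if l != IGNORE_INDEX]
    return sum(p == l for p, l in scored) / len(scored) if scored else 0.0
```

Because RoBERTa re-draws the mask every time a sequence is seen (dynamic masking), a function like this would be called per batch rather than once during preprocessing.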

Usage seems a bit quirky.

    # Adapted from https://github.com/informatix-inc/bert
    # TextProcessor and IFXRoberta are defined in that repository; import them from there.
    # RobertaConfig / RobertaModel are assumed to be the Hugging Face transformers classes.
    import json

    import torch
    from transformers import RobertaConfig, RobertaModel

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Paths to each file (placeholders: replace with your local paths)
    bpe_file = "<path to bpe.txt, which defines the word pieces>"
    count_file = "<path to bpe_dict.csv, which defines the ids of the word pieces>"
    roberta_config_path = "<path to roberta_config.json, which defines the RobertaModel configuration>"
    juman_config_path = "<path to the config file for Juman>"

    roberta_weight_path = "<path to roberta.pth, the weights of RobertaModel>"
    linear_weight_path = "<path to linear_word.pth, the weights of the final linear layer for MLM>"

    # Load the tokenizer
    processor = TextProcessor(bpe_file=bpe_file, count_file=count_file)

    # Load the pretrained RoBERTa model
    with open(roberta_config_path, "r") as f:
        config_dict = json.load(f)
    config_roberta = RobertaConfig.from_dict(config_dict)
    roberta = RobertaModel(config=config_roberta)
    roberta.load_state_dict(torch.load(roberta_weight_path, map_location=device))

    # Load the pretrained MLM decoder (final linear layer)
    ifxroberta = IFXRoberta(roberta)
    ifxroberta.linear_word.load_state_dict(torch.load(linear_weight_path, map_location=device))

Notes

Source

informatix-inc / bert

Announcing AWS Lambda Function URLs: Built-in HTTPS Endpoints for Single-Function Microservices

AWS Lambda functions can now be called directly as an HTTPS API, without going through API Gateway.

Today, I’m happy to announce the general availability of Lambda Function URLs, a new feature that lets you add HTTPS endpoints to any Lambda function and optionally configure Cross-Origin Resource Sharing (CORS) headers.
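With a Function URL, the HTTP request reaches the function as an event in the same shape as API Gateway's HTTP API payload format version 2.0, and the function answers with statusCode / headers / body. A minimal handler sketch; the name parameter and the greeting are made up for illustration.

```python
import base64
import json


def _response(status, payload):
    """Build a Function URL response with a JSON body."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def lambda_handler(event, context):
    method = event.get("requestContext", {}).get("http", {}).get("method", "GET")
    if method == "POST":
        raw = event.get("body") or ""
        if event.get("isBase64Encoded"):
            raw = base64.b64decode(raw).decode("utf-8")
        try:
            params = json.loads(raw or "{}")
        except json.JSONDecodeError:
            return _response(400, {"error": "body must be JSON"})
        if not isinstance(params, dict):
            return _response(400, {"error": "body must be a JSON object"})
    else:
        # queryStringParameters is absent when the URL has no query string.
        params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return _response(200, {"message": f"hello, {name}"})
```

Access control lives on the URL itself (auth type AWS_IAM or NONE), as does the optional CORS configuration, so the handler does not need to set CORS headers by hand.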

Notes

Source

Announcing AWS Lambda Function URLs: Built-in HTTPS Endpoints for Single-Function Microservices

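Evaluating recommendation with contrastive-learning embeddings at Bandai Namco Group

An article that examines whether fine-tuning BERT with contrastive learning based on anime-series relationships can improve recommendation performance.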
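A common way to implement this kind of contrastive fine-tuning is an InfoNCE loss with in-batch negatives: each anchor embedding is pulled toward its positive (e.g. a work from the same series) and pushed away from the other positives in the batch. The article's exact loss and pairing are not given here, so this pure-Python sketch, including the temperature of 0.05 and the function names, is an assumption.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def info_nce(anchors: Sequence[Sequence[float]], positives: Sequence[Sequence[float]],
             temperature: float = 0.05) -> float:
    """Mean InfoNCE loss. anchors[i] and positives[i] form the positive pair;
    every other positives[j] in the batch acts as a negative for anchors[i]."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target class i
    return total / n
```

In practice the embeddings would come from the BERT encoder and the loss would be computed on tensors so it can be back-propagated; the arithmetic is the same.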


The results suggest that this kind of fine-tuning yields better recommendations than using BERT without fine-tuning.
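Once titles are embedded, the recommendation step itself can be as simple as a nearest-neighbour search by cosine similarity. A toy sketch with hand-written 2-D vectors standing in for fine-tuned BERT embeddings; the titles, vectors and function name are made up.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(query: str, embeddings: Dict[str, Sequence[float]], k: int = 3) -> List[str]:
    """Return the k items most similar to `query`, excluding the query itself."""
    q = embeddings[query]
    scored = [(cosine(q, vec), item) for item, vec in embeddings.items() if item != query]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest similarity first, ties by name
    return [item for _, item in scored[:k]]


if __name__ == "__main__":
    # Made-up 2-D "embeddings"; real ones would come from the fine-tuned BERT.
    catalog = {
        "Series A (season 1)": [0.9, 0.1],
        "Series A (season 2)": [0.85, 0.2],
        "Series B": [0.1, 0.95],
        "Series C": [0.6, 0.6],
    }
    print(recommend("Series A (season 1)", catalog, k=2))
```

For a real catalogue the linear scan would be replaced by an approximate nearest-neighbour index, but the ranking criterion stays the same.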


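With anime, art style and production studio probably matter as well, so embedding thumbnail images together with the text might improve the results further?

Source

バンダイナムコグループで対照学習による埋め込み表現を用いてレコメンド技術検証をした話 (Evaluating recommendation with contrastive-learning embeddings at Bandai Namco Group)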

LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL DATASETS

LAION-400M has grown roughly 14-fold into a dataset of 5 billion images.


Although it is released under Creative Commons CC-BY 4.0, which permits commercial use, the authors do not recommend using it for commercial products.

we however do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress.

LAION-400M consisted mostly of small images of 512×512 or less, whereas this release contains many more images above that size, such as Full HD.


The dataset is built by applying the following filtering steps to images collected from Common Crawl.

After downloading the WAT files from Common Crawl, we apply the following filtering conditions:

All samples with less than 5 characters alt-text length or less than 5 KB image size are dropped.

All images with the too big resolution, potentially DOS bombs, were dropped before attempting to process them.

Duplicate removal is performed with a bloom filter based on URL. Future runs would include more variate deduplication rules, such as URL + language for the multilanguage dataset.

We use CLIP respectively MCLIP to compute embeddings of the image and alt-text. Then we compute the cosine similarity of both embeddings and drop all samples with cosine similarity below 0.28 for the English language ( with CLIP B/32) and 0.26 for the multilingual dataset (MCLIP). These thresholds were selected based on human inspection of the test results.

We use the CLIP embeddings of images and texts to filter out to the possible extent the illegal content.
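The filtering steps above map onto a simple per-sample pipeline. A sketch under stated assumptions: the Sample fields are hypothetical, the CLIP embeddings are assumed to be precomputed, 5 KB is taken as 5 × 1024 bytes, and the decompression-bomb check is omitted because it needs image headers. URL deduplication uses a small Bloom filter as the quote describes; unlike a set, it can occasionally report an unseen URL as seen (dropping a unique sample) but never lets an exact duplicate through.

```python
import hashlib
import math
from dataclasses import dataclass
from typing import Iterable, List, Sequence


class BloomFilter:
    """Fixed-size Bloom filter over strings (double hashing from one SHA-256 digest)."""

    def __init__(self, n_bits: int = 1 << 20, n_hashes: int = 5) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        return [(h1 + i * h2) % self.n_bits for i in range(self.n_hashes)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


@dataclass
class Sample:  # hypothetical record layout
    url: str
    alt_text: str
    image_size_bytes: int
    image_emb: Sequence[float]  # assumed precomputed CLIP image embedding
    text_emb: Sequence[float]   # assumed precomputed CLIP text embedding


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_samples(samples: Iterable[Sample], threshold: float = 0.28) -> List[Sample]:
    """threshold: 0.28 for English (CLIP B/32), 0.26 for the multilingual set (MCLIP)."""
    seen = BloomFilter()
    kept = []
    for s in samples:
        # 1. Drop short alt texts and tiny images.
        if len(s.alt_text) < 5 or s.image_size_bytes < 5 * 1024:
            continue
        # 2. URL-based deduplication.
        if s.url in seen:
            continue
        seen.add(s.url)
        # 3. Keep only pairs whose image and text embeddings agree enough.
        if cosine(s.image_emb, s.text_emb) >= threshold:
            kept.append(s)
    return kept
```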

I never expected the dataset to grow 14-fold in less than a year...

The demo site now supports searching images by text, so it might be possible to pinpoint exactly the images a given project needs.

Source

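Trying out MLOps technologies on a personal ML project, part 1

The author makes full use of GCP, covering everything from the ETL that builds the training data to model serving and monitoring.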


Looks like a useful reference.

Source

自作MLプロジェクトでMLOps界隈の技術を試してみるその1 (Trying out MLOps technologies on a personal ML project, part 1)

Introduction to optimal transport

Slides summarizing the benefits of using optimal transport distances and how to actually compute them.
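For reference, the standard entropy-regularised formulation that the Sinkhorn algorithm solves, written in common notation (not necessarily the slides'): source weights a (length n), target weights b (length m), cost matrix C, regularisation ε > 0, and ⊘ for element-wise division. The optimal plan has the scaling form diag(u) K diag(v), and Sinkhorn alternately updates u and v.

```latex
\begin{aligned}
\mathrm{OT}_\varepsilon(a, b) &= \min_{P \in U(a,b)} \sum_{i,j} C_{ij} P_{ij} - \varepsilon H(P),
  \qquad H(P) = -\sum_{i,j} P_{ij}\left(\log P_{ij} - 1\right), \\
U(a,b) &= \left\{ P \in \mathbb{R}_{+}^{n \times m} : P \mathbf{1}_m = a,\ P^{\top} \mathbf{1}_n = b \right\}, \\
P^{\star} &= \operatorname{diag}(u)\, K \operatorname{diag}(v), \qquad K_{ij} = e^{-C_{ij}/\varepsilon}, \\
u &\leftarrow a \oslash (K v), \qquad v \leftarrow b \oslash (K^{\top} u).
\end{aligned}
```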


The Sinkhorn algorithm had always been a mystery to me, and now I roughly understand it, so this was helpful.
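A minimal pure-Python version of the Sinkhorn algorithm: build K = exp(-C/ε), then alternately rescale rows and columns until the plan's marginals match a and b. It works in the plain (non-log) domain, so with very small eps the entries of K can underflow to zero; log-domain updates are the usual fix. Function and parameter names are my own.

```python
import math
from typing import List, Sequence, Tuple


def sinkhorn(a: Sequence[float], b: Sequence[float], cost: Sequence[Sequence[float]],
             eps: float = 0.1, max_iters: int = 1000, tol: float = 1e-9
             ) -> Tuple[List[List[float]], float]:
    """Entropic OT between histograms a (size n) and b (size m).

    Returns the transport plan P (n x m) and the transport cost sum(P * cost).
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(max_iters):
        # Rescale rows so that the row sums match a ...
        Kv = [sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [a[i] / Kv[i] for i in range(n)]
        # ... then rescale columns so that the column sums match b.
        Ktu = [sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [b[j] / Ktu[j] for j in range(m)]
        # Columns now match exactly; stop once the rows match as well.
        row_err = max(abs(u[i] * sum(K[i][j] * v[j] for j in range(m)) - a[i])
                      for i in range(n))
        if row_err < tol:
            break
    P = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(P[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return P, total
```

Smaller eps gives a plan closer to the unregularised optimum but needs more iterations and is more prone to underflow; larger eps blurs the plan but converges quickly.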

Source

最適輸送入門 (Introduction to optimal transport)

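A brief explanation of tips for speeding up PyTorch's DataLoader

An article that starts by explaining what happens inside PyTorch's DataLoader and then introduces tips for speeding it up.

The speed-up tips themselves don't go beyond what is already covered elsewhere; the part that really stands out is the explanation of the internals.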


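When the DataLoader is accessed via iter (that is, when a _MultiProcessingDataLoaderIter is created), it sets up the workers and the queues used to pass data around. The setup consists of one input queue per worker, a single shared queue from which loaded data is read, and the workers themselves. Once the workers are ready, num_workers * prefetch_factor tasks are pushed into the worker queues up front. Tasks are assigned to workers in round-robin order.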
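The start-up step described above (pushing num_workers * prefetch_factor tasks up front, round-robin) can be simulated without PyTorch. This is a simplified model for illustration, not PyTorch's code: each task is a batch of dataset indices tagged with a sequence number so results can later be put back in order, and prime_workers is a hypothetical name. After this priming, a new task is typically sent each time a result is handed to the user.

```python
from typing import Iterator, List, Sequence, Tuple

Task = Tuple[int, Sequence[int]]  # (sequence number, batch of dataset indices)


def prime_workers(batches: Iterator[Sequence[int]], num_workers: int,
                  prefetch_factor: int = 2) -> List[List[Task]]:
    """Fill per-worker input queues with up to num_workers * prefetch_factor tasks."""
    queues: List[List[Task]] = [[] for _ in range(num_workers)]
    for send_idx in range(num_workers * prefetch_factor):
        batch = next(batches, None)
        if batch is None:  # the sampler ran out early
            break
        # Round-robin assignment: task k goes to worker k % num_workers.
        queues[send_idx % num_workers].append((send_idx, batch))
    return queues


if __name__ == "__main__":
    sampler_batches = iter([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])
    for worker_id, queue in enumerate(prime_workers(sampler_batches, num_workers=2)):
        print(worker_id, queue)
    print("still waiting in the sampler:", list(sampler_batches))
```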


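When data is taken from the _MultiProcessingDataLoaderIter with next, it waits until the data for the next index arrives from the worker handling it, and returns that data once it is available. The order of indices produced by the sampler is preserved: if data for a later index arrives before the next expected one, it waits until the earlier index arrives. A timeout can also be set for this wait (the DataLoader's timeout argument), but it is rarely needed.

Waiting to keep the sampler's index order might become a bottleneck in some cases: a single slow sample holds back every result queued behind it.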
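The in-order delivery is essentially a reorder buffer keyed by sequence number. The toy simulation below (names are my own, not PyTorch's) takes results in the order workers finish them, returns them in sampler order, and reports how many results were held back at peak. When task 0 is the slow one, everything behind it piles up in the buffer, which is the possible bottleneck noted above.

```python
from typing import Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def reorder(arrivals: Iterable[Tuple[int, T]]) -> Tuple[List[T], int]:
    """Re-emit results in sequence-number order.

    `arrivals` yields (sequence number, result) in the order workers finish.
    Returns the ordered results and the peak number of results held back.
    """
    buffer: Dict[int, T] = {}
    ordered: List[T] = []
    next_idx = 0
    peak = 0
    for seq, result in arrivals:
        buffer[seq] = result
        peak = max(peak, len(buffer))
        # Release everything that is now contiguous with what was already emitted.
        while next_idx in buffer:
            ordered.append(buffer.pop(next_idx))
            next_idx += 1
    return ordered, peak


if __name__ == "__main__":
    # Task 0 finishes last, so tasks 1-3 have to wait for it.
    print(reorder([(1, "b"), (2, "c"), (3, "d"), (0, "a")]))
```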

Notes

Source

PytorchのDataLoaderの高速化のコツについてすこし解説 (A brief explanation of tips for speeding up PyTorch's DataLoader)

GENZITSU / UsefulMaterials