AkihikoWatanabe commented 3 years ago

Overview

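Predicts CTR with a content-based approach for news recommendation. Entities appearing in a news title are linked to a knowledge graph to enrich the available information. A CNN encodes not only word embeddings but also entity embeddings and contextual entity embeddings (embeddings of entities related to each entity), producing a knowledge-aware news representation that is used for prediction. Note: a contextual entity is an entity in the neighborhood of a given entity on the knowledge graph (using neighborhood information makes entities more distinguishable and provides richer information).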

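The CNN input is built as multiple channels, [[word embedding], [entity embedding], [contextual entity embedding]] (analogous to the RGB channels of an image), and convolved with 3-dimensional filters. This keeps the spaces that represent words, entities, and contextual entities separate (representing them in a single space would not be appropriate) while still producing a representation in which words and entities stay aligned.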
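The multi-channel convolution described above can be sketched in plain Python. This is only an illustration, not the authors' implementation: the function names (`kcnn_feature`, `encode_title`, `random_filter`), the ReLU activation, the zero vectors for words with no linked entity, and the toy sizes are all assumptions, and any transformation applied to entity embeddings before stacking is omitted.

```python
import random


def kcnn_feature(words, entities, contexts, filt, bias=0.0):
    """Slide one filter over the title, then max-pool over positions.

    words / entities / contexts: one vector per title position (aligned).
    filt: weights indexed as filt[offset][channel][dim].
    Assumes the title has at least as many positions as the filter window.
    """
    channels = (words, entities, contexts)
    window = len(filt)
    activations = []
    for start in range(len(words) - window + 1):
        total = bias
        for offset in range(window):
            for c, channel in enumerate(channels):
                weights = filt[offset][c]
                vector = channel[start + offset]
                total += sum(w * x for w, x in zip(weights, vector))
        activations.append(max(0.0, total))  # ReLU (assumed activation)
    return max(activations)  # max-over-time pooling


def encode_title(words, entities, contexts, filters):
    """One pooled feature per filter; together they form the title vector."""
    return [kcnn_feature(words, entities, contexts, f) for f in filters]


def random_filter(window, dim, rng):
    """Hypothetical initializer: window x 3 channels x dim weights."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(dim)]
             for _ in range(3)]
            for _ in range(window)]


rng = random.Random(0)
n_words, dim = 6, 4  # toy sizes; the notes report 100-dimensional embeddings
words = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_words)]

# Assumption: only position 2 is linked to an entity; others get zero vectors.
entities = [[0.0] * dim for _ in range(n_words)]
contexts = [[0.0] * dim for _ in range(n_words)]
entities[2] = [rng.uniform(-1, 1) for _ in range(dim)]
contexts[2] = [rng.uniform(-1, 1) for _ in range(dim)]

filters = [random_filter(w, dim, rng) for w in (1, 2, 3, 4)]
title_vector = encode_title(words, entities, contexts, filters)
print(len(title_vector))  # one feature per filter
```

Because each filter holds a separate weight slice per channel, the three embedding spaces are never mixed before the convolution, yet every window still sees the word, entity, and contextual entity at the same position together.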

Experiments

Evaluated on server log data from Bing News. Each record consists of (timestamp, userid, news url, news title, click count (0 = no click, 1 = click)). The training dataset was randomly sampled from data between November 16, 2016 and June 11, 2017, and the data from June 12, 2017 to August 11, 2017 was used as the test dataset.
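As a rough illustration of the record layout and the date-based split, here is a minimal sketch. The `ClickRecord` class, the field names, and the sample rows are hypothetical; only the five fields and the date ranges come from the notes above, and the random sampling applied to the training period is left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ClickRecord:
    timestamp: date
    user_id: str
    news_url: str
    news_title: str
    clicked: int  # 1 = click, 0 = no click


TRAIN_START, TRAIN_END = date(2016, 11, 16), date(2017, 6, 11)
TEST_START, TEST_END = date(2017, 6, 12), date(2017, 8, 11)


def split_by_date(records):
    """Partition records into train/test by timestamp (no sampling here)."""
    train = [r for r in records if TRAIN_START <= r.timestamp <= TRAIN_END]
    test = [r for r in records if TEST_START <= r.timestamp <= TEST_END]
    return train, test


# Made-up rows purely for illustration; not taken from the real dataset.
records = [
    ClickRecord(date(2016, 12, 1), "u1", "https://example.com/a", "Title A", 1),
    ClickRecord(date(2017, 6, 11), "u1", "https://example.com/b", "Title B", 0),
    ClickRecord(date(2017, 7, 4), "u1", "https://example.com/c", "Title C", 1),
]
train, test = split_by_date(records)
print(len(train), len(test))
```

Splitting on time rather than at random means the test clicks always come after the training clicks, so the model is evaluated on behavior it could not have seen.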

The word/entity embedding dimension was 100, and the filter sizes were 1, 2, 3, and 4. Log loss was used as the loss function, and the model was trained with Adam.
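For reference, log loss on click labels can be written as follows. This is a generic sketch of binary cross-entropy, not code from the paper; the function name and the `eps` clipping constant are assumptions.

```python
import math


def log_loss(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 click labels and predicted CTRs."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Confident, correct predictions give a lower loss than uninformative ones.
print(log_loss([1, 0], [0.9, 0.1]) < log_loss([1, 0], [0.5, 0.5]))
```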

Outperforms DeepFM. Ablating the entity embeddings and contextual entity embeddings lowers AUC by about 2 points, but performance is still higher than DeepFM. Removing attention lowers AUC by about 1 point.
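Since the comparison is reported in AUC, a small sketch of how AUC can be computed from click labels and predicted scores may help. It uses the pairwise definition (the fraction of positive/negative pairs ranked correctly, ties counted as half); the function name and the toy inputs are assumptions, and this is not the evaluation code used in the paper.

```python
def auc(labels, scores):
    """Probability that a clicked item is scored above a non-clicked one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ordered correctly.
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3]))  # 0.75
```

The quadratic pairwise loop is fine for small examples; a rank-based computation would be preferable on a full log.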

Figure: sample of the training/test set for a single user

AkihikoWatanabe commented 3 years ago

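According to 365, CNNs have been shown empirically to outperform RNNs, Recursive Neural Networks, and similar models at obtaining sentence representations, which appears to be why a CNN was chosen to obtain the representation (from footnote 7).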

AkihikoWatanabe commented 3 years ago

When using the Factorization Machines based methods (LibFM, DeepFM), the features appear to be built from TF-IDF features and averaged entity embeddings, with the features of the user's news and the candidate news concatenated and fed in as input.
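One way to picture this feature construction is the sketch below. It assumes the TF-IDF vectors are already computed and passed in as plain lists; the function names, the zero vector used when a news item has no linked entities, and the toy values are all assumptions rather than details confirmed by the paper.

```python
def average_embedding(vectors, dim):
    """Element-wise mean of entity embeddings; zeros if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def news_features(tfidf, entity_vectors, dim):
    """TF-IDF features followed by the averaged entity embedding."""
    return list(tfidf) + average_embedding(entity_vectors, dim)


def fm_input(user_tfidf, user_entities, cand_tfidf, cand_entities, dim):
    """Concatenate user-news features with candidate-news features."""
    return (news_features(user_tfidf, user_entities, dim)
            + news_features(cand_tfidf, cand_entities, dim))


# Toy values: 2 TF-IDF terms and 2-dimensional entity embeddings.
features = fm_input(
    user_tfidf=[0.5, 0.0],
    user_entities=[[1.0, 3.0], [3.0, 5.0]],
    cand_tfidf=[0.0, 0.2],
    cand_entities=[],
    dim=2,
)
print(features)  # [0.5, 0.0, 2.0, 4.0, 0.0, 0.2, 0.0, 0.0]
```

Averaging collapses all entities of a news item into one vector, so unlike the multi-channel CNN input, the position-wise alignment between words and entities is lost in this baseline representation.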

AkihikoWatanabe commented 3 years ago

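It is also interesting that DMF (Deep Matrix Factorization), which uses no content information at all and relies only on users' implicit feedback data (news clicks), performs quite poorly. Using content information is clearly stronger than relying on user-item implicit feedback data alone.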

AkihikoWatanabe commented 3 years ago

TensorFlow implementation, (probably) by the authors: https://github.com/hwwang55/DKN

AkihikoWatanabe commented 3 years ago

Explanation in Japanese: https://qiita.com/agatan/items/24c6d8e00f2fc861bb04


DKN: Deep Knowledge-Aware Network for News Recommendation, Wang+, WWW'18 #363
