
TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank #163

Open chullhwan-song opened 5 years ago

chullhwan-song commented 5 years ago

https://arxiv.org/abs/1812.00073 https://github.com/tensorflow/ranking

chullhwan-song commented 4 years ago

## LEARNING-TO-RANK
* Setup
* Training Data
* Utility and Ranking Metrics
* Loss Functions
* Item Weighting

## PLATFORM OVERVIEW


## COMPONENTS

1. data reader
2. transform function
3. scoring function
4. ranking loss functions
5. evaluation metrics
6. ranking head
7. a model_fn builder

### Reading data using input_fn

the following set of pairwise constraints is generated (examples are referred to by the info-string after the # character):

1A>1B, 1A>1C, 1A>1D, 1B>1C, 1B>1D, 2B>2A, 2B>2C, 2B>2D, 3C>3A, 3C>3B, 3C>3D, 3B>3A, 3B>3D, 3A>3D
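The constraints above follow from comparing labels within each query: an example with a higher relevance label is preferred over any lower-labeled example of the same query, and ties produce no pair. A minimal sketch of that generation rule (function and tuple layout are my own, not part of the library):

```python
from itertools import combinations

def pairwise_constraints(examples):
    """Generate 'a>b' constraints from (label, qid, name) tuples.

    Only examples sharing a qid are comparable; the higher label wins,
    and equal labels yield no constraint.
    """
    constraints = []
    for (la, qa, na), (lb, qb, nb) in combinations(examples, 2):
        if qa != qb:
            continue  # never compare across queries
        if la > lb:
            constraints.append(f"{na}>{nb}")
        elif lb > la:
            constraints.append(f"{nb}>{na}")
    return constraints

# Query 1 with labels 3, 2, 1, 1 reproduces the 1A/1B/1C/1D pairs above.
examples = [(3, 1, "1A"), (2, 1, "1B"), (1, 1, "1C"), (1, 1, "1D")]
```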



* 2nd field: the query ID
   * A single query can have multiple examples; their ranking is read off the labels, and as described above, each line represents a single document.
* 3rd field onward: the features
   * Each is index:value; zero-valued entries can be omitted.
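The field layout above (label, qid, index:value features, and the `#` info-string) can be parsed with a short sketch (the helper name is mine; real readers also handle sparse defaults and batching):

```python
def parse_libsvm_line(line):
    """Parse 'label qid:<q> idx:val ... # comment' into (label, qid, features).

    Features not present on the line are implicitly zero, so they are
    simply absent from the returned dict.
    """
    line = line.split("#", 1)[0].strip()  # drop the info-string comment
    tokens = line.split()
    label = int(tokens[0])
    qid = int(tokens[1].split(":", 1)[1])
    features = {int(k): float(v) for k, v in (t.split(":", 1) for t in tokens[2:])}
    return label, qid, features

label, qid, feats = parse_libsvm_line("3 qid:1 1:1 6:1 # 1A")
```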

### Feature Transformation with transform_fn
* Fig 1: transform_fn
* Converts sparse features (meaning words or n-grams) into dense features (embedding features, like word2vec)
* dense 2-D tensors: context features
* 3-D tensors: per-item features
![image](https://user-images.githubusercontent.com/40360823/60855833-27273b80-a240-11e9-9067-28db31472624.png)
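The shape convention above can be sketched with plain NumPy (the embedding table and function are illustrative assumptions, not TF-Ranking's actual `transform_fn` API): sparse ids become dense embeddings, with context features staying 2-D `[batch, dim]` and per-item features 3-D `[batch, list_size, dim]`.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 100, 8
embedding_table = rng.normal(size=(vocab_size, embed_dim))

def transform(context_ids, item_ids):
    """Sketch of a transform_fn: sparse ids -> dense embedding lookups.

    context_ids: [batch], one id per query      -> 2-D [batch, dim]
    item_ids:    [batch, list_size], per item   -> 3-D [batch, list_size, dim]
    """
    context = embedding_table[np.asarray(context_ids)]
    per_item = embedding_table[np.asarray(item_ids)]
    return context, per_item

ctx, items = transform([5, 7], [[1, 2, 3], [4, 5, 6]])
```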

### Feature Interactions using scoring_fn
* The actual network part.
* The example here is a 3-layer feedforward neural network with ReLUs.
![image](https://user-images.githubusercontent.com/40360823/60933668-45537100-a2fe-11e9-9b48-1a90992519ee.png)
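A minimal NumPy sketch of such a scoring function, assuming each item is scored independently from its transformed features (weight shapes and names are mine, not the library's):

```python
import numpy as np

def scoring_fn(per_item_features, weights):
    """Sketch of a scoring_fn: a 3-layer feedforward net with ReLUs.

    per_item_features: [batch, list_size, dim]; each item is scored
    independently, producing one score per item: [batch, list_size].
    """
    h = per_item_features
    for i, (w, b) in enumerate(weights):
        h = h @ w + b
        if i < len(weights) - 1:  # ReLU on hidden layers, linear output
            h = np.maximum(h, 0.0)
    return h[..., 0]

rng = np.random.default_rng(0)
dims = [8, 16, 8, 1]  # input dim, two hidden layers, scalar score
weights = [(rng.normal(size=(a, b)), np.zeros(b)) for a, b in zip(dims, dims[1:])]
scores = scoring_fn(rng.normal(size=(2, 5, 8)), weights)
```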

### Ranking Losses
* The loss key is an **enum** over supported loss functions
![image](https://user-images.githubusercontent.com/40360823/60935605-64a1cc80-a305-11e9-9c24-397f22ea9fba.png)
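As one concrete instance of a supported loss, here is a sketch of a pairwise logistic loss, computed over all in-list pairs where one label exceeds the other (a simplified reference version, not the library's vectorized implementation):

```python
import numpy as np

def pairwise_logistic_loss(labels, scores):
    """Mean log(1 + exp(-(s_i - s_j))) over pairs with labels[i] > labels[j]."""
    labels, scores = np.asarray(labels, float), np.asarray(scores, float)
    loss, pairs = 0.0, 0
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] > labels[j]:
                loss += np.log1p(np.exp(-(scores[i] - scores[j])))
                pairs += 1
    return loss / max(pairs, 1)
```

A correctly ordered pair incurs a much smaller loss than an inverted one, and lists with all-equal labels contribute nothing.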

### Ranking Metrics
* Evaluation example: NDCG
![image](https://user-images.githubusercontent.com/40360823/60942380-64153000-a31d-11e9-8c94-ac9a66e44b21.png)
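A small reference sketch of NDCG with exponential gains, `(2^label - 1) / log2(rank + 1)`, normalized by the ideal ordering's DCG (a simplified standalone version, not the library's metric code):

```python
import numpy as np

def ndcg(labels, scores, k=None):
    """DCG of the score-induced ranking divided by the ideal DCG."""
    labels = np.asarray(labels, float)
    order = np.argsort(scores)[::-1]   # items ranked by predicted score
    ideal = np.sort(labels)[::-1]      # best achievable ordering of labels
    if k is not None:
        order, ideal = order[:k], ideal[:k]
    discounts = 1.0 / np.log2(np.arange(len(ideal)) + 2)
    dcg = ((2.0 ** labels[order] - 1) * discounts).sum()
    idcg = ((2.0 ** ideal - 1) * discounts).sum()
    return dcg / idcg if idcg > 0 else 0.0
```

A perfect ranking scores 1.0; pushing the only relevant item to the bottom drops the value below 1.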

### Ranking Head
* Appears to be a wrapper around the losses & metrics described above.
![image](https://user-images.githubusercontent.com/40360823/60942456-a3dc1780-a31d-11e9-9dcc-1a7144c7f590.png)
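The "wrapper" idea can be illustrated with a toy sketch (names and interface are mine, not TF-Ranking's actual head API): the head bundles one loss and a set of metrics so the model builder deals with a single object.

```python
def make_ranking_head(loss_fn, metric_fns):
    """Sketch of a ranking head: one object exposing loss + eval metrics."""
    class Head:
        def loss(self, labels, scores):
            return loss_fn(labels, scores)
        def metrics(self, labels, scores):
            return {name: fn(labels, scores) for name, fn in metric_fns.items()}
    return Head()

# Toy usage with placeholder loss/metric functions.
head = make_ranking_head(
    loss_fn=lambda labels, scores: sum((l - s) ** 2 for l, s in zip(labels, scores)),
    metric_fns={"num_items": lambda labels, scores: len(labels)},
)
```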

### Model Builder
* Roughly the main entry point, tying the components above together.
![image](https://user-images.githubusercontent.com/40360823/60942487-c5d59a00-a31d-11e9-9dfa-1c78389a8d48.png)

## USE CASES
* Google services currently using TF-Ranking:
   * Gmail search
   * document recommendation in Google Drive
* These services are trained on massive click-log data.
* Performs better than RankLib.
* Moreover, for the Gmail service the model was built to handle "sparse textual features", which such tools normally cannot exploit well.
   * My personal guess about "sparse textual features": word features over a very large, sparse vocabulary.

### Gmail Search
* Gmail is trained on search logs, i.e. clicks.
* The data is anonymized.
* Both dense and sparse features are used:
    * dense
    * sparse: word- and character-level n-grams
* 250M queries
* Losses & metrics are weighted by Inverse Propensity Weighting.
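The idea behind that weighting can be sketched as follows (the function and the propensity model are illustrative assumptions, not the paper's implementation): clicks from low positions are up-weighted by the inverse of the probability that the position was examined, correcting position bias in click logs.

```python
def ipw_weighted_mean(values, positions, propensity):
    """Sketch of Inverse Propensity Weighting.

    values:     per-item quantities (e.g. per-item loss or metric terms)
    positions:  the display position each item was shown at
    propensity: maps a position to its examination probability
                (assumed known, e.g. estimated offline)
    """
    weights = [1.0 / propensity(p) for p in positions]
    total = sum(w * v for w, v in zip(weights, values))
    return total / sum(weights)
```

With a `1/position` examination model, an item shown at position 2 counts twice as much as one at position 1, since it was half as likely to be seen.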

### Document Recommendation in Drive
* Trained on user click data.