Why did I choose this paper? Because clarification can be easily mapped to the query expansion task.
Main problem:
The problem is to predict the user engagement level with clarification questions in web search, in order to decide when and how a clarification question should be shown so that user satisfaction increases. A sample clarification question: a user submits the query "how to set up a list in outlook" and a clarification pane pops up asking "which version of outlook?". The question is: is it important to ask this? Do the different versions of Outlook differ in how a list is set up? Does it matter? This paper addresses the problem by analyzing user engagement with these clarification questions.
Existing work:
The authors survey the literature by dividing the related work into two areas:
1. Conversational and web search clarification (multi-turn interaction)
Methods: reinforcement learning and Transformers
Gap: whether it is necessary to ask a clarification question at all is an unexplored topic
2. Engagement level prediction
Ways the engagement level has been estimated:
self-reported questionnaires
facial expressions
web analytics
user interactions with the clarification pane
Gap: lack of work based on the retrieved documents (search results).
Inputs:
initial query q,
clarification question c,
list of candidate answers A,
retrieved results R (SERP elements such as result page titles)
Outputs:
The user engagement level (y) in [0,10] for each input tuple.
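To make this input/output contract concrete, here is a minimal sketch of one example as a data structure, flattened into a single text sequence for a BERT-style encoder. The dataclass, field names, and [SEP]-separated serialization are my own assumptions for illustration, not the paper's exact format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ClarificationExample:
    """One example: (q, c, A, R) with engagement label y in [0, 10]."""
    query: str               # initial query q
    question: str            # clarification question c
    answers: List[str]       # candidate answers A shown in the clarification pane
    serp_titles: List[str]   # retrieved results R (here: result page titles only)
    engagement: int          # label y in [0, 10]

def serialize(ex: ClarificationExample) -> str:
    """Flatten the tuple into one text sequence; the separators are an assumption."""
    parts = [ex.query, ex.question, " ".join(ex.answers), " ".join(ex.serp_titles)]
    return " [SEP] ".join(parts)

example = ClarificationExample(
    query="how to set up a list in outlook",
    question="which version of outlook?",
    answers=["outlook 2016", "outlook 2013", "outlook on the web"],
    serp_titles=["Create a contact group or distribution list in Outlook"],
    engagement=4,  # made-up label, for illustration only
)
print(serialize(example))
```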
Method:
Given an input tuple (q, c, A, R), their model, called ELBERT, encodes the concatenated inputs with ALBERT and outputs a joint representation. Regression is then performed on this representation by appending two hidden layers (a small feed-forward head) to the end of the model.
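A minimal sketch of this architecture, assuming the Hugging Face albert-base-v2 checkpoint, the pooled output as the joint representation, and a head size of 256; these choices are assumptions, not necessarily the paper's exact configuration.

```python
import torch
import torch.nn as nn
from transformers import AlbertModel, AlbertTokenizerFast

class EngagementRegressor(nn.Module):
    """ALBERT encoder followed by a two-hidden-layer regression head (ELBERT-style sketch)."""
    def __init__(self, model_name: str = "albert-base-v2", hidden: int = 256):
        super().__init__()
        self.encoder = AlbertModel.from_pretrained(model_name)
        dim = self.encoder.config.hidden_size
        self.head = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar engagement prediction
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.pooler_output  # joint representation of the whole input sequence
        return self.head(pooled).squeeze(-1)

tokenizer = AlbertTokenizerFast.from_pretrained("albert-base-v2")
model = EngagementRegressor()
text = ("how to set up a list in outlook [SEP] which version of outlook? "
        "[SEP] outlook 2016 outlook 2013 [SEP] Create a contact group in Outlook")
batch = tokenizer([text], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    pred = model(batch["input_ids"], batch["attention_mask"])
print(pred)  # untrained output; training would minimize MSE against the label y
```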
Experiments:
Dataset: MIMICS, a large-scale collection of datasets for search clarification
Metrics: MSE (lower is better), MAE (lower is better), and the coefficient of determination R² (higher is better); a small sketch of computing these metrics over the trivial baselines follows the baseline list below.
Baselines:
Mean, Median, Normal sampling of the engagement levels
Linear Regression: least squares
SVR: linear and RBF kernels
Random Forests
LSTM: bidirectional
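To make the trivial baselines and the three metrics concrete, here is a small sketch using scikit-learn. The file name MIMICS-Click.tsv and the engagement_level column are assumptions about the MIMICS release and should be checked against the actual files; a real evaluation would also use a held-out test split rather than the same data the baselines are computed on.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Assumed file/column names; check the MIMICS repository for the real layout.
df = pd.read_csv("MIMICS-Click.tsv", sep="\t")
y = df["engagement_level"].to_numpy(dtype=float)

rng = np.random.default_rng(0)
baselines = {
    "Mean": np.full_like(y, y.mean()),
    "Median": np.full_like(y, np.median(y)),
    "Normal sampling": rng.normal(y.mean(), y.std(), size=len(y)).clip(0, 10),
}

for name, pred in baselines.items():
    print(f"{name:16s} MSE={mean_squared_error(y, pred):.3f} "
          f"MAE={mean_absolute_error(y, pred):.3f} "
          f"R2={r2_score(y, pred):.3f}")
```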
Results:
Performance comparison
Comparisons are done on the full dataset and on the subset where the engagement level is greater than zero. On the full dataset, the proposed method outperforms the other baselines on MSE and R², but the Median baseline wins on MAE: a large portion of the records have engagement level 0, and the Median baseline simply predicts 0, which keeps its absolute error small. On the engagement > 0 subset, however, the proposed method outperforms all baselines on every metric.
Effect of SERP elements
An ablation is run to measure the effect of adding SERP elements as inputs to the proposed model, and the results show that adding these elements consistently improves performance. The best setting combines the query with the titles of the result web pages; a rough sketch of how such an ablation loop could be organized is given below.
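This is only an illustration of the experimental loop, not the paper's code: the SERP element names, the build_input helper, and the sample data are all hypothetical, and the actual experiment would retrain and evaluate the regressor for each configuration.

```python
from itertools import chain, combinations
from typing import Dict, Iterable, List, Tuple

# Hypothetical SERP element names for illustration; the paper's exact set may differ.
SERP_ELEMENTS = ["titles", "snippets", "related_searches"]

def build_input(q: str, c: str, answers: List[str],
                serp: Dict[str, List[str]], use: Iterable[str]) -> str:
    """Append only the selected SERP elements to the clarification inputs."""
    parts = [q, c, " ".join(answers)]
    parts += [" ".join(serp[e]) for e in use if serp.get(e)]
    return " [SEP] ".join(parts)

def powerset(items: List[str]) -> Iterable[Tuple[str, ...]]:
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

serp = {"titles": ["Create a contact group in Outlook"],
        "snippets": ["A contact group, formerly a distribution list, lets you..."],
        "related_searches": ["outlook distribution list"]}

for subset in powerset(SERP_ELEMENTS):
    text = build_input("how to set up a list in outlook", "which version of outlook?",
                       ["outlook 2016", "outlook on the web"], serp, subset)
    # In the real experiment, the model would be retrained and scored
    # (MSE / MAE / R2) for each configuration; here we only show the inputs.
    print(subset or ("no SERP elements",), "->", text[:80], "...")
```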
Code:
The code of this paper is available here
Presentation:
There is no available presentation for this paper.