2013 SIGIR Time-aware structured query suggestion

ZahraTaherikhonakdar commented 2 years ago

Main problem: This paper organizes suggested queries along a timeline and lets the user focus on a particular time range without specifying an explicit time limit. The authors introduce an algorithm, Time-aware Structured Query Suggestion (TaSQS), that helps the user reach relevant web pages by presenting query suggestions on a timeline.

Input-Output: Phase 1: Generating Query Suggestions: Input: query + URLs + time period (one day in this case), from which a graph is constructed where queries and URLs are nodes and the weight of each edge is the click count. Output: a list of query suggestions, namely the top L queries ranked by the number of steps it takes to get from the input query to another query on the graph (the closest queries are chosen).
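To make Phase 1 concrete, here is a minimal sketch of a query-URL click graph and a "closest queries" ranking. It assumes plain breadth-first hop count as the step measure, which may differ from the step measure the paper actually uses; the function names and the toy click log are hypothetical.

```python
from collections import defaultdict, deque

def build_click_graph(click_log):
    """Build an undirected query-URL graph; edge weight = click count.

    click_log: iterable of (query, url, click_count) for one time period.
    """
    graph = defaultdict(dict)
    for query, url, clicks in click_log:
        q_node, u_node = ("q", query), ("u", url)
        graph[q_node][u_node] = graph[q_node].get(u_node, 0) + clicks
        graph[u_node][q_node] = graph[u_node].get(q_node, 0) + clicks
    return graph

def suggest(graph, query, top_l):
    """Return the top_l queries closest to `query` by hop count (assumption)."""
    start = ("q", query)
    if start not in graph:
        return []
    dist = {start: 0}
    queue = deque([start])
    while queue:  # standard breadth-first search
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    candidates = sorted(
        (d, name) for (kind, name), d in dist.items()
        if kind == "q" and name != query
    )
    return [name for _, name in candidates[:top_l]]

# Hypothetical one-day click log: (query, url, click_count)
toy_log = [
    ("olympics", "a.com", 5),
    ("london 2012", "a.com", 3),
    ("london 2012", "b.com", 2),
    ("usain bolt", "b.com", 4),
    ("weather", "c.com", 1),
]
suggestions = suggest(build_click_graph(toy_log), "olympics", top_l=2)
print(suggestions)
```

Note that this sketch stores the click counts as edge weights but does not use them in the ranking; a walk-based step measure would use them as transition weights.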

Phase 2: Time-aware Query Clustering: Input: a set of relevance score vectors R = {x_t1, x_t2, ..., x_tn}, where n is the number of time periods (days in this study) considered by TaSQS. Output: a cluster set C that contains no more than M clusters (M = 5, as showing more than five clusters on a web search interface may not be realistic).
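The summary above does not say which clustering algorithm Phase 2 uses, so the following is only an illustrative sketch. It assumes that clusters are contiguous time ranges and greedily merges the most similar pair of adjacent ranges (by cosine similarity of their mean relevance vectors) until at most M remain. The function names and the toy vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cluster_periods(vectors, max_clusters):
    """Merge adjacent time periods until at most max_clusters remain.

    vectors[t] is the relevance score vector of time period t.
    Returns a list of clusters, each a list of period indices.
    """
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    clusters = [[t] for t in range(len(vectors))]

    def similarity(i):
        left = mean_vector([vectors[t] for t in clusters[i]])
        right = mean_vector([vectors[t] for t in clusters[i + 1]])
        return cosine(left, right)

    while len(clusters) > max_clusters:
        best = max(range(len(clusters) - 1), key=similarity)
        clusters[best:best + 2] = [clusters[best] + clusters[best + 1]]
    return clusters

# Hypothetical relevance vectors for 4 days and 2 candidate queries
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
time_clusters = cluster_periods(toy_vectors, max_clusters=2)
print(time_clusters)
```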

Phase 3: Time-aware Query Selection: Input: query clusters. Output: query suggestions selected from the clusters for presentation. The TA Score of a query suggestion with respect to a cluster is high when the suggestion's average relevance score over that cluster is high while its average relevance score over the complement of that cluster is low.
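The exact TA Score formula is not given here, so the sketch below assumes the simplest form consistent with the description: the average relevance inside the cluster minus the average relevance over its complement. The function names, the query strings, and the relevance values are hypothetical.

```python
def ta_score(relevance, cluster, query_index):
    """Assumed TA Score: mean relevance inside the cluster minus mean outside.

    relevance[t][q] is the relevance score of query q in time period t.
    cluster is a list of time period indices.
    """
    members = set(cluster)
    inside = [relevance[t][query_index] for t in members]
    outside = [relevance[t][query_index]
               for t in range(len(relevance)) if t not in members]
    avg_in = sum(inside) / len(inside)
    avg_out = sum(outside) / len(outside) if outside else 0.0
    return avg_in - avg_out

def select_suggestions(relevance, clusters, queries, per_cluster=1):
    """For each cluster, pick the queries with the highest TA Score."""
    selected = []
    for cluster in clusters:
        ranked = sorted(
            range(len(queries)),
            key=lambda q: ta_score(relevance, cluster, q),
            reverse=True,
        )
        selected.append([queries[q] for q in ranked[:per_cluster]])
    return selected

# Hypothetical data: 4 days, 3 candidate queries
toy_queries = ["olympics schedule", "olympics results", "olympics tickets"]
toy_relevance = [
    [0.8, 0.1, 0.5],
    [0.6, 0.3, 0.5],
    [0.2, 0.9, 0.5],
    [0.0, 0.7, 0.5],
]
picked = select_suggestions(toy_relevance, [[0, 1], [2, 3]], toy_queries)
print(picked)
```

In this toy data, "olympics tickets" is equally relevant on every day, so its assumed score is zero for both clusters and it is never chosen as a time-specific suggestion.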

Phase 4: Time-aware Document Ranking: Input: the user’s click on a suggested query. Output: a ranked list of web pages.

Previous Works and their Gaps:
1- Solution: an algorithm that clusters query suggestions based on click-through and session data.
2- Solution: a method that provides a label for each query suggestion cluster based on social annotation data.
3- Solution: a query suggestion algorithm that presents labeled query suggestion clusters so that the user can make comparisons across multiple entities (e.g. company names).
Gap: none of these structured query suggestion methods considers a temporal point of view.

Results: This work used Microsoft Bing’s query log as the dataset. The IR performance of TaSQS is compared with the baselines below, and it outperforms all of them:
POP: does not involve query clustering, but ranks retrieved documents based on popularity (i.e. click count). nDCG: 0.624, RR: 0.809
GOOGLE: also does not involve query clustering, but simply relies on the Google Custom Search API for the time constraint. nDCG: 0.652, RR: 0.809
EqualSplit: divides each month equally instead of applying time-aware clustering. nDCG: 0.709, RR: 0.809
TaSQS: nDCG: 0.780, RR: 0.936
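For readers unfamiliar with the two reported metrics, here is a minimal sketch of nDCG and reciprocal rank (RR) for a single ranked list. It assumes the common log2-discount form of DCG with raw graded gains and normalizes against the ideal ordering of the same list; the paper's exact variant and cutoff are not stated above, so treat this as illustrative only.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains, k=None):
    """nDCG at cutoff k, normalized by the ideal ordering of `gains`."""
    if k is None:
        k = len(gains)
    ideal_dcg = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

def reciprocal_rank(gains):
    """1 / rank of the first relevant (gain > 0) result, or 0.0 if none."""
    for rank, g in enumerate(gains, start=1):
        if g > 0:
            return 1.0 / rank
    return 0.0

# Hypothetical graded relevance labels of one ranked list, top result first
toy_gains = [0, 2, 1]
print(ndcg(toy_gains), reciprocal_rank(toy_gains))
```

Scores like those listed above are typically averages of such per-query values over all test queries.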

Gap: the proposed algorithm does not consider the user's personal information.

Code: Not available

hosseinfani commented 2 years ago

@ZahraTaherikhonakdar please complete this issue

ZahraTaherikhonakdar commented 2 years ago

Dr. @hosseinfani

Done.

hosseinfani commented 2 years ago

@ZahraTaherikhonakdar What was the dataset?

ZahraTaherikhonakdar commented 2 years ago

The dataset was Microsoft Bing’s query log.

hosseinfani commented 2 years ago

@ZahraTaherikhonakdar is it accessible?

ZahraTaherikhonakdar commented 2 years ago

I searched for the dataset and it is not available. The paper says, "This work was done while the author was at Microsoft Research Asia," and maybe that is the reason. They did not provide a reference for the dataset.
