2019 IP&M Query expansion techniques for information retrieval: A survey

ZahraTaherikhonakdar commented 3 years ago

Main problem: This paper surveys recent progress in QE, covering research on automatic, manual, and interactive QE techniques.

Query Expansion Approaches:

1- Global analysis: QE techniques implicitly select expansion terms from hand-built knowledge resources or from large corpora for reformulating the initial query.

Linguistic approaches: analyze expansion features such as lexical, morphological, semantic, and syntactic term relationships to reformulate or expand the initial query terms.
- stemming analysis: reduces words to their root form
- semantic analysis: finds synonyms of words
- syntactic analysis: uses the enhanced relational features of the query terms (such as term co-occurrence) to expand the initial query
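
A minimal sketch of the stemming and synonym steps above, assuming a toy suffix list and a tiny hand-built synonym dictionary. `SYNONYMS`, `SUFFIXES`, `stem`, and `expand_linguistic` are hypothetical names chosen for illustration and are not taken from the survey. A real system would more likely rely on an established stemmer and a lexical resource such as WordNet.

```python
# Toy hand-built resources (assumptions for illustration only).
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "buy": ["purchase"],
}
SUFFIXES = ("ing", "es", "s")  # checked in this order


def stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_linguistic(query):
    """Return stemmed query terms, each followed by its synonyms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        root = stem(term)
        for candidate in [root] + SYNONYMS.get(root, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_linguistic("buying cars"))
# ['buy', 'purchase', 'car', 'automobile', 'vehicle']
```

The expansion here only finds terms that the dictionary explicitly links to a query term, which is the "concrete linguistic relation" requirement noted in the Results.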
Corpus-based approaches: examine the contents of the whole text corpus to identify expansion features to be used for QE.
- term clustering: groups document terms into clusters based on their co-occurrences
- concept-based terms: represent each word by an embedding vector, analyze the corpus using word embeddings, then select the expansion terms
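
A rough sketch of corpus-based expansion driven by co-occurrence, assuming a tiny in-memory corpus and document-level co-occurrence counts. `CORPUS`, `cooccurrence_counts`, and `expand_by_cooccurrence` are hypothetical names, and the scoring (raw counts) is a simplification chosen for illustration, not the survey's method. An embedding-based variant would replace the counts with vector similarity.

```python
from collections import Counter
from itertools import combinations

# Toy corpus (assumption); each document is a whitespace-tokenized string.
CORPUS = [
    "java coffee beans",
    "java programming language",
    "python programming language",
    "coffee beans roast",
]


def cooccurrence_counts(corpus):
    """Count in how many documents each unordered term pair appears together."""
    counts = Counter()
    for doc in corpus:
        terms = sorted(set(doc.split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts


def expand_by_cooccurrence(query, corpus, k=2):
    """Append the k non-query terms that co-occur most often with query terms."""
    query_terms = query.split()
    scores = Counter()
    for (a, b), n in cooccurrence_counts(corpus).items():
        if a in query_terms and b not in query_terms:
            scores[b] += n
        elif b in query_terms and a not in query_terms:
            scores[a] += n
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return query_terms + ranked[:k]


print(expand_by_cooccurrence("programming", CORPUS))
# ['programming', 'language', 'java']
```

No dictionary is needed: "language" is selected purely because it appears alongside "programming" in the corpus.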
Search log-based approaches: analyze the search logs
- user query log:
- query-document relationships: features are extracted from the relational behavior of queries.
Web-based approaches: These approaches use Wikipedia and anchor texts from websites to expand the user’s original query.

2- Local analysis: QE techniques select expansion terms from the collection of documents retrieved in response to the user’s initial (unmodified) query

Relevance feedback: the user’s feedback on whether or not the retrieved documents are relevant to the query is collected.
Pseudo-relevance feedback: the feedback collection process is automated by directly using the top-ranked documents, retrieved in response to the initial query, for QE.
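
A minimal sketch of pseudo-relevance feedback, assuming a toy document collection and a deliberately simple ranking function (number of distinct query terms matched). `DOCS`, `retrieve`, and `prf_expand` are hypothetical names for illustration. A real IR system would use a proper ranking model and term-weighting scheme.

```python
from collections import Counter

# Toy collection (assumption); each document is a whitespace-tokenized string.
DOCS = [
    "jaguar speed big cat habitat",
    "jaguar car engine speed",
    "big cat habitat rainforest",
    "stock market news",
]


def retrieve(query_terms, docs, top_n):
    """Rank documents by how many distinct query terms they contain."""
    scored = [(len(set(query_terms) & set(d.split())), i) for i, d in enumerate(docs)]
    scored = [(s, i) for s, i in scored if s > 0]  # drop non-matching documents
    scored.sort(key=lambda p: (-p[0], p[1]))       # best score first, then original order
    return [docs[i] for _, i in scored[:top_n]]


def prf_expand(query, docs, top_n=2, k=2):
    """Treat the top_n retrieved documents as relevant and add their k most frequent new terms."""
    query_terms = query.split()
    feedback_docs = retrieve(query_terms, docs, top_n)
    counts = Counter(t for d in feedback_docs for t in d.split() if t not in query_terms)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return query_terms + ranked[:k]


print(prf_expand("jaguar", DOCS))
# ['jaguar', 'speed', 'big']
```

With explicit relevance feedback, `feedback_docs` would instead be the documents the user marked as relevant. For the ambiguous query "jaguar", the top-ranked documents mix the animal and the car senses, so the expansion terms inherit that ambiguity.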

Previous Works and their Gaps:

1- Reviewed ontology-based QE techniques, which are domain specific.

2- Reviewed the major QE techniques, data sources, and features in an IR system. Gap: covers only automatic query expansion (AQE) techniques and does not include recent research on personalized social documents, term weighting and ranking methods, and categorization of several data sources, annotation data.

3- Solution: proposed a query suggestion algorithm that presents labeled query suggestion clusters so that the user can make comparisons across multiple entities (e.g. company names). Gap: a temporal point of view is not considered in these structured query suggestion methods.

Gap: Did not discuss temporal QE and its challenges.

Results: For global analysis, corpus-based approaches are more effective than linguistic-based approaches. The reason is that linguistic-based approaches require a concrete linguistic relation (based on sense, meaning, concept, etc.) between a query term and a relevant term for the latter to be discovered, while corpus-based approaches can discover the same relevant term simply from its co-occurrences with the query term. For local analysis, relevance feedback performed better than pseudo-relevance feedback. The primary reason is that pseudo-relevance feedback depends on the execution of the user’s initial query; if the initial query is poorly formulated or ambiguous, the expansion terms extracted from the retrieved documents may not be relevant.

ZahraTaherikhonakdar commented 2 years ago

@hosseinfani Please read

hosseinfani commented 2 years ago

@ZahraTaherikhonakdar This is a very good survey. Now, you understand the ReQue methods better, right? Also, you see how a survey comes up with categorization and comparisons.

fani-lab / ReQue