fani-lab / LADy

LADy 💃: A Benchmark Toolkit for Latent Aspect Detection Enriched with Backtranslation Augmentation

2022, NAACL, Generative Cross-Domain Data Augmentation for Aspect and Opinion Co-Extraction #88

Open Sepideh-Ahmadian opened 2 months ago


Paper Generative Cross-Domain Data Augmentation for Aspect and Opinion Co-Extraction

Introduction Some domains of opinion analysis lack fine-grained labeled data, which restricts the development of supervised models in those domains. In recent efforts, researchers have proposed models that transfer labeled data (knowledge) from one domain to other domains.

Main Problem To alleviate this lack of labeled data, unsupervised domain adaptation methods have been proposed to produce reviews that are more compatible with the target domain.

Illustrative Example Given sentence: "I like the spicy tuna roll". Output: "lightweight and long battery life".

Input A sentence in one domain.

Output A sentence in the other domain.
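To make the input and output concrete, here is a minimal sketch of how a labeled source sentence and a generated target sentence could be represented for aspect and opinion co-extraction. The BIO-style tag names (`B-ASP`, `I-ASP`, `B-OPN`), the `LabeledSentence` class, the `extract_spans` helper, the domain names, and the specific tags assigned to the two example sentences are all assumptions made for illustration; they are not taken from the paper or its implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LabeledSentence:
    """A tokenized review sentence with one tag per token (hypothetical format)."""
    domain: str        # assumed domain name, e.g. "restaurant" or "laptop"
    tokens: List[str]
    tags: List[str]    # assumed tag set: B-ASP, I-ASP, B-OPN, I-OPN, O


def extract_spans(sentence: LabeledSentence) -> List[Tuple[str, str]]:
    """Collect (kind, text) spans from BIO tags, where kind is 'ASP' or 'OPN'."""
    spans: List[Tuple[str, str]] = []
    current_kind, current_tokens = None, []
    for token, tag in zip(sentence.tokens, sentence.tags):
        prefix, _, kind = tag.partition("-")
        if prefix == "I" and kind == current_kind:
            # Continue the span that is currently open.
            current_tokens.append(token)
            continue
        if current_kind is not None:
            # Any other tag closes the open span.
            spans.append((current_kind, " ".join(current_tokens)))
        if prefix in ("B", "I"):
            # Start a new span (a stray I- tag is treated like B-).
            current_kind, current_tokens = kind, [token]
        else:
            current_kind, current_tokens = None, []
    if current_kind is not None:
        spans.append((current_kind, " ".join(current_tokens)))
    return spans


# Source-domain input from the illustrative example (tags are assumed).
source = LabeledSentence(
    domain="restaurant",
    tokens=["I", "like", "the", "spicy", "tuna", "roll"],
    tags=["O", "B-OPN", "O", "B-ASP", "I-ASP", "I-ASP"],
)

# Target-domain output from the illustrative example (tags are assumed).
generated = LabeledSentence(
    domain="laptop",
    tokens=["lightweight", "and", "long", "battery", "life"],
    tags=["B-OPN", "O", "B-OPN", "B-ASP", "I-ASP"],
)

print(extract_spans(source))
print(extract_spans(generated))
```

Under these assumed tags, the source sentence yields one opinion span and one aspect span, while the generated sentence yields two opinion spans and one aspect span. The point is only that the generated sentence is meant to carry labels usable for training in the target domain.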

Related works and their gaps
- Rule-based adaptation (Li et al., 2012; Ding et al., 2017): it is hard to design high-quality manual rules and opinion sets.
- Feature-based adaptation (Wang and Pan, 2018; Li et al., 2019; Pereg et al., 2020; Chen and Qian, 2021): the main task is trained on labeled source data, so it fails to capture important information in the target domain.
- Data augmentation-based adaptation (Yu et al., 2021): the quality and diversity of the generated data are limited, since the generated reviews stay close to the source domain's reviews.

Contribution of this paper The authors propose a Generative Cross-Domain Data Augmentation framework for unsupervised domain adaptation. Their method shows promising results, generating more fluent and diverse reviews than previous methods.

Proposed methods Not included

Experiments Model: the pre-trained sequence-to-sequence model BART. Datasets: the Restaurant and Laptop datasets from SemEval 2014 and 2015 (Pontiki et al., 2014, 2015), and the Device dataset, which consists of reviews of digital devices collected by Hu and Liu (2004).

Implementation https://github.com/NUSTM/GCDDA

Gaps of this work The proposed method considers only a single source domain. I believe including multiple source domains would be more effective, since it could yield more comprehensive patterns for data generation in the target domain. Additionally, the method is limited to English. Extending it to other languages would increase its applicability.