The following report provides a summary and comparison of three SemEval datasets—2014 (Task 4), 2015 (Task 12), and 2016 (Task 5)—that have been used in LADy. These tasks focus on aspect-based sentiment analysis. In addition to detecting sentiment, it is essential to identify the entity and the feature to which the sentiment is directed. This report details the various versions of these tasks and their respective characteristics.
SemEval 14 Task 4:
Datasets' domains: Laptop, Restaurant
Subtask descriptions (4 subtasks):
1. SB1
Datasets: Restaurants, Laptops.
Description: The task is to identify all the aspect terms in each sentence of a review, even those toward which no sentiment is expressed. This subtask helps in understanding which aspects are mentioned most frequently.
2. SB2
Datasets: Restaurants, Laptops
Description: In this subtask the aspect terms are assumed to be given, and the task is to identify the polarity of each mentioned aspect.
The tags are positive, negative, neutral, and conflict. (Conflict covers cases where both positive and negative sentiment are expressed toward the same aspect, e.g., "Certainly not the best sushi in New York, however, it is always fresh".)
3. SB3
Datasets: Restaurants
Description: Given a set of predefined aspect categories and a set of review sentences (without aspect term or polarity annotations), the aim is to identify the aspect categories discussed in each sentence. Categories may be inferred even when they are not mentioned explicitly, e.g., "Delicious but expensive" => {food, price}.
4. SB4
Datasets: Restaurants
Description: In this task the aspect categories are provided, and the goal is to determine the polarity of each.
Tasks summary
SB1 and SB2 are beneficial when no predefined inventory of aspects is available; they yield a set of the most frequently mentioned aspects and their corresponding sentiments.
SB3 and SB4 are useful when a predefined inventory of aspect categories exists and the reviews need to be summarized by the dominant sentiment per category.
Data Collection
Restaurant: 3,041 English sentences, a subset of the dataset of Ganu et al. (2009).
That dataset already included the aspect categories used in SB3 and an overall sentence polarity; the remaining annotations were added later. Additional restaurant reviews were gathered and annotated from scratch for the test datasets.
Laptop: Contains 3,845 sentences in total (3,045 train + 800 test), drawn from laptop reviews.
They were tagged by human annotators for SB1 and SB2.
Data Annotation
The annotators used BRAT, a web-based annotation tool configured for the task. Using BRAT, they provided the aspect term (SB1), aspect term polarity (SB2), aspect category (SB3), and aspect category polarity (SB4).
Stage 1: Aspect Terms and Polarity
Annotators tagged the explicit aspects mentioned in the sentence and determined their polarity. Example: "I hated their fajitas, but their salads were great" → {‘fajitas’: negative, ‘salads’: positive}.
Stage 2: Aspect Category and Polarity
In this stage, annotators assigned a predefined set of aspect categories to the sentences and determined their polarity. Example: "The restaurant was expensive, but the menu was great" → {PRICE: negative, FOOD: positive}.
Format: XML
Metrics to measure the results: F1 (for aspect term and aspect category extraction, SB1/SB3) and accuracy (for polarity classification, SB2/SB4).
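For orientation, the released SemEval-2014 files are sentence-oriented XML: each sentence element carries a text child plus nested aspectTerms/aspectTerm (term, polarity, from, to) and aspectCategories/aspectCategory (category, polarity) elements. The snippet below is a minimal parsing sketch using Python's standard xml.etree.ElementTree; the file name Restaurants_Train.xml is only a placeholder.

import xml.etree.ElementTree as ET

# Minimal sketch: read a SemEval-2014 Task 4 file and collect the SB1-SB4 annotations.
# The file name "Restaurants_Train.xml" is only an example path.
tree = ET.parse("Restaurants_Train.xml")
for sentence in tree.getroot().iter("sentence"):
    text = sentence.findtext("text")
    # SB1/SB2: aspect terms with their polarity and character offsets
    terms = [(t.get("term"), t.get("polarity"), t.get("from"), t.get("to"))
             for t in sentence.iter("aspectTerm")]
    # SB3/SB4: aspect categories with their polarity
    categories = [(c.get("category"), c.get("polarity"))
                  for c in sentence.iter("aspectCategory")]
    print(text, terms, categories)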
SemEval 15 Task 12
Dataset Categories: Laptops, Restaurants, Hotels.
In contrast to SemEval 14, aspect terms (opinion targets) correspond to explicit mentions of the entities or attributes being evaluated, and the dataset consists of entire reviews rather than isolated sentences.
Subtask descriptions:
Subtask 1: Given a review about a laptop or a restaurant, provide the following information for each slot:
Slot 1: Aspect Category: Identify all the entity-attribute pairs (E#A) toward which an opinion has been expressed in the review (predefined E#A pairs).
Slot 2: Opinion Target Expression (OTE): Find the linguistic expression related to the E#A pair. If none exists, mark it as NULL.
Slot 3: Sentiment Polarity: For each E#A pair, assign one of the following polarities: positive, negative, or neutral.
Example: "The food was delicious but do not come here on an empty stomach." → {category: "FOOD#QUALITY", target: "food", from: "4", to: "8", polarity: "positive"}
Subtask 2: Out-of-Domain ABSA
Participants tested their system on an unseen domain (hotel reviews) with no training data.
Data Collection:
Laptops
Train: [review texts: 277, sentences: 1,739],
Test: [review texts: 173, sentences: 760]
Restaurants
Train: [review texts: 254, sentences: 1,315],
Test: [review texts: 96, sentences: 685]
Data Annotation Methods:
Similar to SemEval 14, with the following differences:
There is no OTE in the laptop section, as features are expressed through a limited number of expressions.
The "conflict" tag has been removed.
The "neutral" tag does not indicate objectivity.
Format: XML
Metrics to Measure Results: F1
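In the SemEval-2015 (and 2016) files, sentences are grouped under Review elements, and each sentence carries Opinion elements with the category (E#A), target (OTE), character offsets, and polarity. A minimal parsing sketch in the same spirit as above; the file name ABSA15_Restaurants_Train.xml is only a placeholder.

import xml.etree.ElementTree as ET

# Minimal sketch for the review-oriented SemEval-2015/2016 layout.
root = ET.parse("ABSA15_Restaurants_Train.xml").getroot()
for review in root.iter("Review"):
    for sentence in review.iter("sentence"):
        text = sentence.findtext("text")
        opinions = [(o.get("category"),           # Slot 1: E#A pair
                     o.get("target"),             # Slot 2: OTE (or "NULL")
                     o.get("from"), o.get("to"),  # character offsets of the OTE
                     o.get("polarity"))           # Slot 3: sentiment
                    for o in sentence.iter("Opinion")]
        print(text, opinions)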
SemEval 16 Task 5
Dataset Categories: Restaurants, laptops, hotels, mobile phones, digital cameras, museums, telecommunications.
In the third year of this task’s evolution, 19 training datasets and 20 testing datasets were provided across 8 languages (English, Arabic, Chinese, Dutch, French, Russian, Spanish, and Turkish) and 7 domains. Of these datasets, 25 are designed for sentence-level ABSA and 14 for text-level ABSA.
Subtasks Description
Subtask 1: Sentence-Level ABSA. Given a sentence containing an opinion about a targeted entity, the goal is to identify opinion tuples with the following types of information:
Slot 1: Aspect Category – Identifying the pair of entity (E) and attribute (A) toward which a sentiment is expressed.
Slot 2: Opinion Target Expression (OTE) – Extracting the linguistic expression and its character offsets in the sentence that refer to the entity-attribute pair. If no explicit mention is made, mark it as NULL. Slot 2 identification is required only in the restaurants, hotels, museums, and telecommunications (tweet) domains.
Slot 3: Sentiment Polarity – Assign a sentiment label (positive, negative, or neutral) to each identified E#A pair.
Example: “Their sake list was extensive, but we were looking for Purple Haze, which wasn’t listed but made for us upon request!” → {cat: “drinks#style_options”, trg: “sake list”, fr: “6”, to: “15”, pol: “positive”}, {cat: “service#general”, trg: “NULL”, fr: “0”, to: “0”, pol: “positive”}
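Since the extraction slots in all three editions are scored with F1, the sketch below shows the usual micro-averaged computation for Slot 1, assuming gold and predicted annotations have been reduced to sets of (sentence id, E#A category) pairs; the ids and categories in the example are illustrative.

def micro_f1(gold, pred):
    # gold, pred: sets of (sentence_id, "ENTITY#ATTRIBUTE") pairs
    tp = len(gold & pred)                           # correctly predicted pairs
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Illustrative example: one category found, one missed, one spurious.
gold = {("s1", "DRINKS#STYLE_OPTIONS"), ("s1", "SERVICE#GENERAL")}
pred = {("s1", "DRINKS#STYLE_OPTIONS"), ("s1", "FOOD#QUALITY")}
print(micro_f1(gold, pred))  # 0.5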
Subtask 2: Text-Level ABSA
Given a review, the goal is to identify a set of categories and their corresponding polarities for summarizing the expressed opinion.
Example: “The so-called laptop runs too slow and I hate it! Do not buy it! It is the worst laptop ever.”
→ {cat: “laptop#general”, pol: “negative”}, {cat: “laptop#operation_performance”, pol: “negative”}
Subtask 3: Out-of-Domain ABSA
Participants can test their models on domains where no prior information is available.
Dataset Overview
Total: 70,790 entries for training and testing.
Sentence-Level Annotations (SB1): 47,654 entries in 8 languages across 7 domains.
Text-Level Annotations (SB2): 23,136 entries in 6 languages across 3 domains.
Data Annotation Methods
The annotated data in each language was prepared by researchers in the field who are native speakers of that language.