fani-lab / LADy

LADy 💃: A Benchmark Toolkit for Latent Aspect Detection Enriched with Backtranslation Augmentation

Dataset for LADy (Story Board) #72

Open hosseinfani opened 1 month ago

hosseinfani commented 1 month ago

Literature Review:

-- SIGIR -- ECIR -- CIKM -- …

-- SemEval -- Amazon -- …

Our Dataset:

-- To quantify the grammar errors, we use LLMs or grammar-check tools to report the % of errors -- Show methods of aspect detection that rely on the syntax tree

-- Double-check that no errors remain, using the same method as above (LLMs or grammar-check tools)

-- The human annotator should label the latent aspect(s). Double-check through crowdsourcing to students, with voting on what the latent aspect(s) are
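The voting step above could be sketched as a simple majority vote over the annotators' labels. This is only an illustration under assumptions: the function name `vote_latent_aspects`, the `min_agreement` threshold, and the sample labels are hypothetical and not part of LADy.

```python
from collections import Counter


def vote_latent_aspects(annotations, min_agreement=0.5):
    """Keep the aspects chosen by more than `min_agreement` of annotators.

    `annotations` holds one list of aspect labels per annotator.
    Both the name and the threshold are hypothetical choices.
    """
    n_annotators = len(annotations)
    if n_annotators == 0:
        return []
    # Count each aspect at most once per annotator.
    counts = Counter(a for labels in annotations for a in set(labels))
    return sorted(a for a, c in counts.items() if c / n_annotators > min_agreement)


# Made-up labels from four annotators for a single review.
votes = [["price"], ["price", "ambience"], ["ambience"], ["price"]]
print(vote_latent_aspects(votes))  # ['price']
```

A strict majority is only one possible rule; an agreement statistic over all reviews may be a better fit once real annotations exist.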

Review Domains

-- Electronics, -- Clothes, -- …

Dataset File Structure

Datasheet (data card) for datasets

Licensing

hosseinfani commented 1 month ago

Hi @Parmidakhashayar, I added the content of the gdoc here to encapsulate everything in GitHub. You can log your progress here or create new issues and link them here.

Parmidakhashayar commented 1 month ago

Thank you @hosseinfani, I will add my comments/ progress here.

Parmidakhashayar commented 1 month ago

Here are my findings from exploring the SemEval workshop tasks related to reviews and aspects. I studied 3 tasks: SemEval 2014 Task 4: Aspect Based Sentiment Analysis, SemEval 2015 Task 12: Aspect Based Sentiment Analysis, and SemEval 2016 Task 5: Aspect Based Sentiment Analysis.

  1. SemEval 2014 Task 4: Aspect Based Sentiment Analysis

Identifying aspects of target entities and determining the sentiment expressed towards each aspect.

Subtask 1: Aspect term extraction. An aspect term names a particular aspect of the target entity (e.g., “I liked the service and the staff, but not the food”, “The food was nothing much, but I loved the staff”).

Subtask 2: Aspect term polarity. “I loved their fajitas” → {fajitas: positive}; “I hated their fajitas, but their salads were great” → {fajitas: negative, salads: positive}

Subtask 3: Aspect category detection. “The restaurant was too expensive” → {price}

Subtask 4: Aspect category polarity. “The restaurant was expensive, but the menu was great” → {price: negative, food: positive}

Datasets: Two domain-specific datasets, for laptops and restaurants, consisting of over 6K sentences with fine-grained aspect-level human annotations, have been provided for training.

ACL Link: https://www.aclweb.org/portal/content/semeval-2014-task-4-aspect-based-sentiment-analysis

  2. SemEval 2015 Task 12: Aspect Based Sentiment Analysis

Building on the 2014 task, this task analyzes entire reviews rather than isolated sentences. It focuses primarily on the same domains as ABSA14 (restaurants and laptops). However, unlike ABSA14, the input datasets of ABSA15 contain entire reviews, not isolated sentences.

Subtask 1: In-domain ABSA

Slot 1: Identify every entity (E) and attribute (A) pair (E#A). It is extremely portable and easily connects to WIFI at the library and elsewhere. → {LAPTOP#PORTABILITY}, {LAPTOP#CONNECTIVITY}

Slot 2: Opinion Target Expression (OTE), the reviewed entity E of a pair E#A. Takes the value “NULL” when there is no (explicit) mention of the entity E.

Great for a romantic evening, but over-priced. → {AMBIENCE#GENERAL, “NULL”}, {RESTAURANT#PRICES, “NULL”}
The fajitas were delicious, but expensive. → {FOOD#QUALITY, “fajitas”}, {FOOD#PRICES, “fajitas”}

Slot 3: Sentiment Polarity: Each identified E#A pair of the given text has to be assigned a polarity (positive, negative, or neutral).

The applications are also very easy to find and maneuver. → {SOFTWARE#USABILITY, positive}
The fajitas are nothing out of the ordinary. → {FOOD#GENERAL, “fajitas”, neutral}

Subtask 2: Out-of-domain ABSA. Participants were asked to test their systems on an unseen domain.

DATASETS: Two datasets of ~550 reviews of laptops and restaurants

ACL Link: https://www.aclweb.org/portal/content/semeval-2015-task-12-aspect-based-sentiment-analysis
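To make the three slots concrete, here is a minimal sketch of how one {E#A, OTE, polarity} tuple might be represented in Python. The `Opinion` class and the `make_opinion` and `is_latent` helpers are hypothetical names introduced for illustration; only the category strings, targets, and the “NULL” convention come from the examples above.

```python
from typing import NamedTuple, Optional


class Opinion(NamedTuple):
    """One ABSA15-style tuple; the class itself is a hypothetical helper."""
    entity: str                      # E of the E#A pair (slot 1)
    attribute: str                   # A of the E#A pair (slot 1)
    target: Optional[str]            # OTE (slot 2); None stands for "NULL"
    polarity: Optional[str] = None   # slot 3, if annotated


def make_opinion(category, target, polarity=None):
    """Split 'ENTITY#ATTRIBUTE' and map the literal 'NULL' target to None."""
    entity, attribute = category.split("#", 1)
    return Opinion(entity, attribute, None if target == "NULL" else target, polarity)


def is_latent(opinion):
    """An opinion whose target is never mentioned explicitly in the text."""
    return opinion.target is None


explicit = make_opinion("FOOD#GENERAL", "fajitas", "neutral")
implicit = make_opinion("AMBIENCE#GENERAL", "NULL")
print(is_latent(explicit), is_latent(implicit))  # False True
```

Treating “NULL” targets as the latent case is an assumption about how these annotations would be used for latent aspect detection, not something the task description states.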

  3. SemEval 2016 Task 5: Aspect Based Sentiment Analysis

Covers sentence-level ABSA, text-level ABSA, and an out-of-domain ABSA subtask, with testing datasets for several domains in 8 languages.

Subtask 1: Sentence-Level ABSA

Slot 1: Aspect Category: Identify every entity (E) and attribute (A) pair (E#A). It is extremely portable and easily connects to WIFI at the library and elsewhere. → {LAPTOP#PORTABILITY}, {LAPTOP#CONNECTIVITY}

Slot 2: Opinion Target Expression (OTE)

Slot 3: Sentiment Polarity

Subtask 2: Text-Level ABSA

Identify a set of {aspect, polarity} tuples that summarize the opinions expressed in each review.

Subtask 3: Out-of-domain ABSA. Participants have the opportunity to test their systems in a previously unseen domain for which no training data will be made available.

DATASETS:
• Restaurants (Customer Reviews): English, Dutch, French, Russian, Spanish, Turkish
• Hotels (Customer Reviews): English, Arabic
• Consumer Electronics (Customer Reviews):
  o Laptops: English
  o Mobile Phones: Chinese, Dutch
  o Digital Cameras: Chinese
• Telecommunications (Twitter): Turkish

ACL Link: https://www.aclweb.org/portal/content/call-participation-semeval-2016-task-5-aspect-based-sentiment-analysis

hosseinfani commented 1 month ago

Hi @Parmidakhashayar, very nice summary of the SemEval tasks. Thank you. I would appreciate it if you could give a brief presentation at our next weekly meeting.

Parmidakhashayar commented 1 month ago

Hello @hosseinfani,

I was working on the "Find reviews with no aspect annotations in our existing datasets" task. I was able to run the attached Python file and get results from this XML file: https://github.com/fani-lab/LADy/blob/main/data/raw/semeval/2015SB12/ABSA15_RestaurantsTrain/ABSA-15_Restaurants_Train_Final.xml

Can you please go over some of the sentences in the result file and let me know if this is what we are looking for?

Thanks,

results.docx https://github.com/Parmidakhashayar/LADy-unofficial/blob/main/NoAspect.py
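For reference, one way the "reviews with no aspect annotations" check could look is sketched below. It is not the linked NoAspect.py and not LADy's loader. It assumes the ABSA15 layout of `Review` > `sentences` > `sentence` > `text` plus an optional `Opinions` block of `Opinion` elements with `target`, `category`, and `polarity` attributes, and it runs on a small made-up sample (the third sentence and all polarity and offset values are invented) rather than on the real file. The function name `split_by_annotation` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed ABSA15 layout; not copied from the dataset.
SAMPLE = """<Reviews>
  <Review rid="r1">
    <sentences>
      <sentence id="r1:0">
        <text>The fajitas were delicious, but expensive.</text>
        <Opinions>
          <Opinion target="fajitas" category="FOOD#QUALITY" polarity="positive" from="4" to="11"/>
          <Opinion target="fajitas" category="FOOD#PRICES" polarity="negative" from="4" to="11"/>
        </Opinions>
      </sentence>
      <sentence id="r1:1">
        <text>Great for a romantic evening, but over-priced.</text>
        <Opinions>
          <Opinion target="NULL" category="AMBIENCE#GENERAL" polarity="positive" from="0" to="0"/>
          <Opinion target="NULL" category="RESTAURANT#PRICES" polarity="negative" from="0" to="0"/>
        </Opinions>
      </sentence>
      <sentence id="r1:2">
        <text>We went there last Friday.</text>
      </sentence>
    </sentences>
  </Review>
</Reviews>"""


def split_by_annotation(root):
    """Return (unannotated, implicit_only) lists of (sentence id, text).

    unannotated:   sentences with no Opinion element at all.
    implicit_only: sentences whose opinions all have target="NULL".
    """
    unannotated, implicit_only = [], []
    for sentence in root.iter("sentence"):
        text = sentence.findtext("text", default="")
        opinions = sentence.findall("Opinions/Opinion")
        if not opinions:
            unannotated.append((sentence.get("id"), text))
        elif all(op.get("target") == "NULL" for op in opinions):
            implicit_only.append((sentence.get("id"), text))
    return unannotated, implicit_only


unannotated, implicit_only = split_by_annotation(ET.fromstring(SAMPLE))
print(unannotated)
print(implicit_only)
```

To try it on the real file, `ET.parse(path).getroot()` could replace `ET.fromstring(SAMPLE)`. Whether "no aspect annotations" should mean sentences without any opinion or sentences whose targets are all “NULL” is a question for the task owner, so the sketch returns both.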

hosseinfani commented 1 month ago

Hi @Parmidakhashayar, thank you for the code and results. A few notes on the coding part:

Let me know on Tuesday so I can explain more on them.

Parmidakhashayar commented 1 month ago

Hello @hosseinfani,

Thank you for reviewing the code and for your feedback. I will be in the lab at 5 today (Tuesday), and I would appreciate your hands-on explanation. I can also stop by your lab if that's more convenient.

Parmidakhashayar commented 1 month ago

Hello @hosseinfani, both semeval.py and review.py run on my local computer now. However, the program doesn't return anything, and I'm not sure whether that's the desired output of the code.

hosseinfani commented 1 month ago

@Parmidakhashayar

Thank you.

Parmidakhashayar commented 1 month ago

@hosseinfani Yes, the import works with no errors. I will work on using the _xmlloader() function as well.

Parmidakhashayar commented 3 weeks ago

Hello @hosseinfani,

I would like to update you on the task of returning the reviews with no aspects. I was able to create a virtual environment in PyCharm and move my code and packages there. However, PyCharm doesn't recognize the numpy and pandas packages.

I was able to debug my main.py through the command line. It generates the output, but the output is empty! I believe there might be a problem with either reading the input or the logic.

I plan on fixing the problem with PyCharm first, as it would then be a lot easier to debug and update the logic.

I would appreciate your insight on how I can fix the problem with PyCharm.

Parmidakhashayar commented 3 weeks ago

Here is the error I keep getting in PyCharm:

Traceback (most recent call last):
  File "/Users/parmidakhashayar/PycharmProjects/pythonProject1/main.py", line 1, in <module>
    import pandas as pd
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pandas/__init__.py", line 19, in <module>
    raise ImportError(
ImportError: Unable to import required dependencies:
numpy: Error importing numpy: you should not try to import numpy from its source directory; please exit the numpy source tree, and relaunch your python interpreter from there.
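A small diagnostic like the one below may help narrow this kind of error down. The traceback shows pandas being loaded from the system-wide Python 3.12 site-packages rather than from a project virtual environment, so one thing worth checking is which interpreter PyCharm actually runs and whether anything in the working directory is named `numpy`. The function `numpy_import_report` is a hypothetical helper, not part of LADy or numpy, and it only reports facts without importing numpy itself.

```python
import importlib.util
import os
import sys


def numpy_import_report(cwd=None):
    """Collect facts that often explain a failing `import numpy`."""
    cwd = cwd or os.getcwd()
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec("numpy")
    return {
        # Which Python binary is running this script.
        "interpreter": sys.executable,
        "python_version": sys.version.split()[0],
        # Where numpy would be loaded from (None if it is not installed).
        "numpy_location": spec.origin if spec else None,
        # A local numpy/ folder or numpy.py can shadow the installed package.
        "local_numpy_shadow": (
            os.path.isdir(os.path.join(cwd, "numpy"))
            or os.path.isfile(os.path.join(cwd, "numpy.py"))
        ),
    }


for key, value in numpy_import_report().items():
    print(f"{key}: {value}")
```

If `interpreter` does not point inside the project's virtual environment, the PyCharm interpreter setting is a likely place to look; this is a guess based on the paths in the traceback, not a confirmed diagnosis.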

Parmidakhashayar commented 3 weeks ago

@Lillliant and I had a meeting yesterday about the process. As the output was not what we were looking for, we went over the logic and found key parts to change in the code. The previous error I mentioned seems to be a problem with the Mac architecture, and I need to address it.

hosseinfani commented 3 weeks ago

Hi @Parmidakhashayar I don't think the error is related to computer architecture. Let me know when you're in lab and we'll go through it.

Parmidakhashayar commented 3 weeks ago

Hello @hosseinfani, I will be in the lab on Thursday. Thank you =)

Parmidakhashayar commented 3 weeks ago

Hello @hosseinfani, I will be in the lab today (Thursday) at 5. Would you be available at that time to discuss the error I mentioned earlier?

hosseinfani commented 2 weeks ago

Hi @Parmidakhashayar I cannot be on campus today but we can work remotely at 5pm

Parmidakhashayar commented 2 weeks ago

@hosseinfani

Sure, thank you.

Parmidakhashayar commented 2 weeks ago

@hosseinfani Please let me know when is a good time to call. Can you please also give access to this email address, parmidak@uwindsor.ca? I can't access any of the files on Teams anymore!

Parmidakhashayar commented 1 week ago

Hello @hosseinfani,

I have some good news and some bad news!

The good news is that I was able to create a virtual Python environment for the LADy codebase using Conda. I finally have all the LADy GitHub files and folders in my virtual env. I have also come up with a new way, rather than cloning or forking, that I will mention in another comment.

The bad news is that I still get configuration and requirements errors and can't debug the no_aspect.py file that I'm working on.

Here are the errors I get. I would appreciate your help with this.

Screenshot 2024-08-15 at 5 42 55 PM Screenshot 2024-08-15 at 5 43 08 PM

When I install the requirements, I get an "installation failed" error. Here are the details:

Screenshot 2024-08-15 at 5 52 17 PM

Thanks for your help!

Parmidakhashayar commented 1 week ago

Hello @hosseinfani, I hope you're doing well.

I wanted to discuss my availability for the fall semester. Would our current weekly meetings be convenient, or do you think another time might work better?

Thanks, Parmida

hosseinfani commented 1 week ago

Hi @Parmidakhashayar Please start the discussion in the LADy channel and see if we can have a better timing, since @Lillliant and @Sepideh-Ahmadian also may have classes during that time.