Classification baseline for aspect term extraction

farinamhz commented 1 year ago

Hi @Lillliant,

To update you first on what we are doing; We are adding new baselines to LADy for checking the results of back-translation on different aspect term models.

As you may know already, we have two kinds of models for aspect extraction;

The first one is aspect category detection models, which are the ones that find the aspect categories, like food, service, anecdotes/miscellaneous, etc.
The second types of model are aspect term detection which tries to find the aspect terms used in the reviews. For example, they may find sushi, price, menu, etc.

Our focus is on the second ones that try to find the aspect terms. However, not all the aspect term models can be helpful in this task as we are trying to find the aspects, whether they are latent or explicit. In fact, aspect term models which try to find the index of the words as the aspect in the review (span-based ones), or the models that tag the words and say whether each word is aspect or not, are not helpful because if the word is removed due to different reasons like the social background and it is latent, these models can not find them anymore and they fall short in case of latent aspect detection.

So, we were using topic models at first that provided us with some top words in the same topic of the review based on the model's vocabulary.

Now we are searching for a model which is initially for the task of aspect term extraction, not the topic modeling, that can somehow classify the reviews and find the aspect term label even if it is not presented explicitly in the review.

Let's say we have a review describing the menu of the restaurant. So, in this case, the model should be able to do sth like classification and say the aspect term is the menu. Actually, we don't want it to say which word of the review is an aspect term and, in fact, give us a word that may be the aspect term.

We would appreciate it if you help us with this task and see if any of the recent models (preferably after 2019) can do this and if their code or library is available, or we can send them an email to ask for the code even if it is not available.

I have made an Excel sheet in teams files in the LADy channel and put some recent works there so that you can check them, and we can complete the file together and find the suitable one.

farinamhz commented 1 year ago

@hosseinfani I have defined this task, and I appreciate your comments.

farinamhz commented 1 year ago

Hey @Lillliant, have you found any classification one out of the baselines you found and we have in the baseline excel for aspect detection?

What we're going to do is the same as what we have done so far;

find a baseline
create a wrapper that can load and use the review.pkl file and split.json to create suitable data for the baseline,
run and experiment

By passing reviews and corresponding aspect labels, we want the classification model to label the new reviews in the test dataset.

First, we go over the baselines we found or any new ones that we can add to use a baseline that has already focused on aspect detection. However, besides that, we want to add a text classification (even if not specifically designed for aspect detection, a simple architecture can also work) to the pipeline to test the augmentation method.

Lillliant commented 1 year ago

Hi @farinamhz,

Sorry for replying this late: I haven't been able to find a suitable list of baselines with public code for classifications yet. However, I will try to expedite the process and post an update on Friday.

farinamhz commented 1 year ago

@Lillliant, Any update on this? If you have found anything which uses its own vocabulary to predict the aspect terms and does not give us the aspect term which is explicitly mentioned in the review by giving the word, index, or span, that would also work.

Lillliant commented 1 year ago

Hello @farinamhz,

Sorry for the late update, I've attached a summary of my findings here:

I've been looking through models that have focused on latent aspect detection, and so far I have only managed to find one such paper with public code. However, there were two methods (including the one with public GitHub code) that, according to other survey papers, have one of the better performance among other similar methods, especially in the restaurant domain (SemEval-15):

NMF detects latent aspect using non-negative matrix factorization on opinion-aspect co-occurence and opinion-aspect intra-relation matrices. This method did not perform as well as the next method and might depend on the training set's vocabulary, but is more recent and has public code.
Hybrid approach to extract adjectives for implicit aspect identification in opinion mining finds latent aspect using the adjectives present in sentences and which aspects they best represent as. Although had the best performance when comparing to other methods and is WordNet dictionary-based, this method is older (2016) and has no public code.

As for text classification models, I've found a few that may be useful (although none are optimized for aspect detection and are more for topic classification):

FastText is a linear classifier that treats sentences as bag of words. It appears to be consistently maintained until now even though the paper was published in 2017, and is well-rounded in the datasets of different domains when compared to other methods.
LEAM classifies text by combining word and label embeddings in the attention framework. It is also maintained until relatively recently (2019) and is also relatively well-rounded.

There are also some other aspect detection methods that mentioned focus on latent aspect detection, although many uses WordNet to some extend or are variants of topic modelling methods. Although I'm not sure if using external dictionary like WordNet can effectively address OOV issues, I can also look deeper and see if there are other methods that can address the issues while having ability to detect latent aspects.

farinamhz commented 1 year ago

Thank you very much @Lillliant!

Regarding aspect detection, we should go with the ones with a public code so that the second one will be removed from our options. The first one is also a little old, and it would be better to find a new one. What about the ones we found and put in the team's baseline sheet? Have you checked them?

Regarding text classification, we can employ FastText, as I also know it has been maintained until now and is famous. Can you add it to our codeline? Just like the topic modeling methods, you can put it as an aspect detection model and use its method for overriding the aspect detection methods; for example, you can train it and use the aspects as the labels and again for the testing, labels that are found by the classification will be the aspects of the reviews. Does that make sense?

Feel free to ask me if there is any question, and also we can have a meeting if you need one.

farinamhz commented 1 year ago

@Lillliant, Any updates on this? If you have not started on this classification, it is better to first start with another baseline and then proceed with FastText. Let me know at which stage you are for this task now, please, then we can decide.

Lillliant commented 1 year ago

Hi @farinamhz,

I have only started working on the task recently, and have so far installed the dependencies and try to start FastText with their example. I apologize for this, and will put this as my priority for the next weeks.

As well, would it be possible to clarify what you mean by starting with another baseline?

farinamhz commented 1 year ago

Hey @Lillliant,

I'd like to discuss another baseline with you at a later time. For now, let's prioritize completing the work you've already begun. I'm eagerly awaiting the integration of FastText into our LADy codebase, particularly within the object related to the aspect detection model and its associated methods for training, testing, and so on.

Please inform me once this task is completed or if you have any questions regarding it.

Lillliant commented 1 year ago

Last week's progress:

found out that pip install fasttext-wheel installs fasttext without issue (update for this: this installs the 2020 version release, so if we may have to dockerize the later versions once issue with local install is figured out)
testing preliminary training and inferring method on toy .pkl file
added preprocessing method (add label tag to reviews)

Some of this week's progress:

partially integrate it to lady baseline using mdl.py inheritance
for now, the testing is done in fast_sample_run.py. Code snippets are selected from main.py to see if the result is expected for ease of full integration

Lillliant commented 1 year ago

I know we mentioned about having pair programming session for the fasttext integration for this week, but is it alright if we can push it to a later time? My laptop screen has been intermittently going out and showing a power rail failure signal since Tuesday night, and I'm hoping to back up my assignments, which is due in-person via laptop demo on Friday. I'm very sorry for the late notice, and I will try to complete the remaining experiments and any bug fix through an external monitor.

hosseinfani commented 1 year ago

@Lillliant sorry to hear that. no rush. take your time.

Lillliant commented 1 year ago

This week's progress:

fixed the bug that causes duplicate "label" to be assigned
added fasttext to main.py
currently trying to debug the running of fasttext in main using SemEval 2016 dataset

Lillliant commented 1 year ago

This week's progress:

added code so that not finding a model file raises the FileNotFoundError for triggering training process

looking through code to debug the next issues: error code:

File "C:\Users\cw_\Documents\GitHub\LADy\src\main.py", line 230, in main
pairs = test(am, np.array(reviews)[splits['test']].tolist(), f, output)
File "C:\Users\cw_\Documents\GitHub\LADy\src\main.py", line 132, in test
pairs = am.infer_batch(reviews_test=test, h_ratio=params.settings['test']['h_ratio'], doctype=params.settings['prep']['doctype'], output=output)
File "C:\Users\cw_\Documents\GitHub\LADy\src\aml\mdl.py", line 68, in infer_batch
pairs.extend(list(zip(r_aspect_ids, self.merge_aspects_words(r_pred_aspects, self.nwords))))
File "C:\Users\cw_\Documents\GitHub\LADy\src\aml\mdl.py", line 79, in merge_aspects_words
subr_pred_aspects_words = [[(w, a_p * w_p) for w, w_p in self.get_aspect_words(a, nwords)] for a, a_p in subr_pred_aspects]
File "C:\Users\cw_\Documents\GitHub\LADy\src\aml\mdl.py", line 79, in <listcomp>
subr_pred_aspects_words = [[(w, a_p * w_p) for w, w_p in self.get_aspect_words(a, nwords)] for a, a_p in subr_pred_aspects]
ValueError: too many values to unpack (expected 2)

added preliminary merge_aspect_words method for fasttext. Because fasttext doesn't have its own method to get the words associated with the probability of a particular label, our own method may have to be implemented. This will be further investigated and tested in this and upcoming week.
any attempt at retrieving the model's labels and words return: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 13: invalid start byte

Lillliant commented 1 year ago

Also, this is a sample of what fasttext's infer currently return without hyperparameter optimization in the main baseline:

(without ova)

downtown dinner 2002 - prixe fix: ##### were ok, ##### gave me poor suggestion..try the ##### ##### ##### best one.
(('__label__service', '__label__food', '__label__indian', '__label__restaurant', '__label__meal', '__label__atmosphere', '__label__vibe', '__label__ambience', '__label__taiwanese', '__label__cuisine', '__label__management', '__label__hot', '__label__ravioli', '__label__owner', '__label__dogs', '__label__sushi', '__label__delivery', '__label__shabu', '__label__decor', '__label__waiters', '__label__shabu-shabu', '__label__bagel', '__label__setting', '__label__gyros', '__label__thai'), array([9.53289747e-01, 3.47211286e-02, 2.55457335e-03, 2.01263861e-03,
       1.74835557e-03, 9.88340122e-04, 6.84599916e-04, 4.75223758e-04,
       4.60051262e-04, 3.82050261e-04, 3.68559267e-04, 2.64495029e-04,
       2.28658755e-04, 1.79115188e-04, 1.53390574e-04, 1.29657288e-04,
       1.27313295e-04, 1.20565273e-04, 9.59045428e-05, 9.23039697e-05,
       8.81278102e-05, 7.97121902e-05, 7.73224310e-05, 7.70277402e-05,
       7.46061705e-05]))
i am not a vegetarian but, almost all the ##### were great.
(('__label__pizza', '__label__place', '__label__food', '__label__atmosphere', '__label__ambience', '__label__waitress', '__label__staff', '__label__menu', '__label__service', '__label__shabu', '__label__of', '__label__sake', '__label__waiter', '__label__survice', '__label__yuka', '__label__crust', '__label__restaurant', '__label__33', '__label__slice', '__label__brasserie', '__label__indian', '__label__lassi', '__label__dim', '__label__dishes', '__label__drinks'), array([0.42055467, 0.27365553, 0.10663503, 0.08352713, 0.02370679,
       0.01631287, 0.01510514, 0.0095979 , 0.00401992, 0.00302907,
       0.00216385, 0.00187277, 0.00154353, 0.00128763, 0.0012316 ,
       0.00122716, 0.00108614, 0.00095609, 0.00094403, 0.00093263,
       0.00084575, 0.00080234, 0.00079421, 0.00079276, 0.00076186]))

(with ova)

what a great #####
(('__label__food', '__label__hot', '__label__dogs', '__label__staff', '__label__indian', '__label__survice', '__label__sake', '__label__atmosphere', '__label__shabu', '__label__place', '__label__pizza', '__label__of', '__label__service', '__label__cucumber', '__label__sushimi', '__label__rose', '__label__teriyaki', '__label__bass', '__label__decor', '__label__roll', '__label__lobster', '__label__sassy', '__label__fish', '__label__waitress', '__label__corona'), array([4.53271836e-01, 1.00888625e-01, 1.00888625e-01, 6.75566867e-02,
       5.03406264e-02, 4.88677844e-02, 4.88677844e-02, 4.74358723e-02,
       4.74358723e-02, 4.08557132e-02, 3.41104269e-02, 1.17951049e-03,
       9.49943671e-04, 8.65900831e-04, 7.89366022e-04, 6.76702184e-04,
       6.17075537e-04, 5.62778732e-04, 4.97857109e-04, 4.68312966e-04,
       4.27315681e-04, 3.45350214e-04, 1.00000034e-05, 1.00000034e-05,
       1.00000034e-05]))
i was speechless by the horrible #####
(('__label__staff', '__label__drinks', '__label__ambience', '__label__bread', '__label__food', '__label__of', '__label__olives', '__label__waiter', '__label__sushi', '__label__atmosphere', '__label__and', '__label__roll', '__label__restaurant', '__label__pizza', '__label__fish', '__label__meal', '__label__waitress', '__label__thai', '__label__dessert', '__label__the', '__label__appetizer', '__label__corona', '__label__tartar', '__label__lounge', '__label__downstairs'), array([9.99402940e-01, 2.81092734e-03, 1.51118217e-03, 1.33502227e-03,
       6.97851181e-04, 5.62778732e-04, 4.27315681e-04, 1.00000034e-05,
       1.00000034e-05, 1.00000034e-05, 1.00000034e-05, 1.00000034e-05,
       1.00000034e-05, 1.00000034e-05, 1.00000034e-05, 1.00000034e-05,
       1.00000034e-05, 1.00000034e-05, 1.00000034e-05, 1.00000034e-05,
       1.00000034e-05, 1.00000034e-05, 1.00000034e-05, 1.00000034e-05,
       1.00000034e-05]))

Lillliant commented 1 year ago

This week's update:

resolved the issue with unicode error: it seems that when forcing the training file to be in utf-8, the resultant model can be in utf-8 as well.
debugged the merge_aspect_word methods (overriden the methods in mdl in association with it and added helper methods).

The code can be found in the Pull Request area. Later, I will also upload the result files I got into Teams to see if it looks reasonable.

Lillliant commented 12 months ago

Update:

The sentiment analysis methods for FastText has been added.
- General idea of the method:
  1. Embed "labelSENTIMENT" into the review text and train the model based on it
  2. Make prediction using test review text that outputs ("labelSENTIMENT", probability) that has the highest probability for the sentiment value
  3. Format the output in such a way that mdl.py's infer_batch_sentiment can be used in the rest of the main.py process
Based on observation of BERT's sentiment analysis, it seems that one sentiment is predicted for each sentence in a review (e.g., "The server is friendly" -> POS). FastText also follows this pattern: it outputs (SENTIMENT, probability) where SENTIMENT is the most likely prediction. However, please let me know if I should extend this to find the probability of other sentiment values too.

farinamhz commented 12 months ago

Thank you very much @Lillliant! I appreciate your nice updates. We'll double-check the output of the inference tomorrow in our meeting, and then I'll accept the PR you sent.

fani-lab / LADy

Classification baseline for aspect term extraction #42