fani-lab / LADy

LADy 💃: A Benchmark Toolkit for Latent Aspect Detection Enriched with Backtranslation Augmentation

Adding HAST as a supervised baseline #37

Open farinamhz opened 1 year ago

farinamhz commented 1 year ago

We are going to add HAST as a supervised baseline for our data augmentation experiments on the aspect detection task. It was published at IJCAI 2018 as "Aspect Term Extraction with History Attention and Selective Transformation".

farinamhz commented 1 year ago

So far, we have found that running the HAST code requires a wrapper that converts our dataset format into the one HAST expects, because HAST does not document how to convert the original XML version of the SemEval datasets into its input format. The expected format is "S####x0=O, x1=O, x2=T, x3=O", where S is the full review sentence, x0, x1, x2, ... are its words, T marks an aspect (target) word, and O marks every other word.
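To make the target format concrete, here is a minimal sketch of the conversion step, not the actual wrapper. It assumes the review has already been tokenized and that the aspect positions are known as token indices. The function name `to_hast_line` and the `tag_sep` parameter are hypothetical, and the ", " separator simply follows the format string quoted above; adjust it if the HAST reader expects something else.

```python
HAST_SEP = "####"  # separates the raw sentence from the token tags

def to_hast_line(tokens, aspect_indices, tag_sep=", "):
    """Build one HAST-style line: sentence####tok0=O, tok1=T, ...

    tokens: list of words in the review sentence
    aspect_indices: positions of the aspect (target) words
    tag_sep: separator between token=tag pairs (assumed from the format above)
    """
    aspects = set(aspect_indices)
    tags = [
        f"{tok}={'T' if i in aspects else 'O'}"
        for i, tok in enumerate(tokens)
    ]
    return " ".join(tokens) + HAST_SEP + tag_sep.join(tags)

line = to_hast_line(["the", "pizza", "was", "great"], [1])
print(line)  # the pizza was great####the=O, pizza=T, was=O, great=O
```

Reading the SemEval XML and mapping character offsets to token indices is left out here, since that part depends on the tokenizer used in the pipeline.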

farinamhz commented 1 year ago

(To be updated)

farinamhz commented 1 year ago

The wrapper is now complete for the original dataset. All dataset versions have been created for the different percentages of hidden aspects, using our train split as HAST's train set and our valid split as its test set.

The only problem was that HAST's dataset format uses the special character sequence #### to separate the sentence from its tags, so for this baseline I changed the hidden-aspect marker from #### to *****.

I have uploaded the code together with one of these datasets, semeval-14-rest (train and test with 100% hidden aspects), to Compute Canada to test the HAST code on our data.

As soon as I get the results, I will post an update here and start testing on all versions of SemEval with different percentages of hidden aspects.
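The clash can be illustrated with a small sketch. It assumes a reader that splits each line on the first occurrence of "####"; the real HAST loader may differ. The helper names `mask_aspects` and `split_hast_line` are hypothetical, and "TAGS" is only a stand-in for the tag part of the line.

```python
HAST_SEP = "####"   # sentence/tag separator in the HAST format
MASK = "*****"      # hidden-aspect marker used for this baseline

def mask_aspects(tokens, aspect_indices, mask=MASK):
    """Replace the aspect tokens with a mask string."""
    hidden = set(aspect_indices)
    return [mask if i in hidden else tok for i, tok in enumerate(tokens)]

def split_hast_line(line):
    """Split a line into (sentence, tags) on the first separator."""
    sentence, sep, tags = line.partition(HAST_SEP)
    if not sep:
        raise ValueError("line has no '####' separator")
    return sentence, tags

tokens = ["the", "pizza", "was", "great"]

# Masking with '*****' keeps the separator unambiguous.
safe = " ".join(mask_aspects(tokens, [1])) + HAST_SEP + "TAGS"
safe_sentence, safe_tags = split_hast_line(safe)

# Masking with '####' makes the split land inside the sentence.
clash = " ".join(mask_aspects(tokens, [1], mask="####")) + HAST_SEP + "TAGS"
clash_sentence, clash_tags = split_hast_line(clash)

print(repr(safe_sentence))   # 'the ***** was great'
print(repr(clash_sentence))  # 'the '
```

Any mask string that never contains "####" would work equally well; "*****" is just the one chosen in this thread.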

farinamhz commented 1 year ago

Hi @hosseinfani, I checked the HAST code and results, as well as the architecture described in the paper, and found that for each sentence (that is, each review) it works like this:

[Image: HAST-how]

hosseinfani commented 1 year ago

@farinamhz thank you.

What do you think?

farinamhz commented 1 year ago

Both suggestions seem reasonable. I suggest trying both and comparing the pytrec results. I can add a function to the HAST evaluation and share the results here so that we can decide between the two options. @hosseinfani

farinamhz commented 1 year ago

I have some questions @hosseinfani

Do you have any idea about these?

hosseinfani commented 1 year ago

@farinamhz

farinamhz commented 1 year ago

Hi @hosseinfani,

Update on the evaluation of HAST:

This is an example results file from HAST, for which I changed the evaluation to match the one in our pipeline. If it looks OK, I will move on to the new baselines.

pred.eval.mean.csv

hosseinfani commented 1 year ago

@farinamhz perfect. btw, we only need 1,5,10,100