hosseinfani closed this issue 1 year ago
Any updates @Lillliant?
Hi @farinamhz,
Sorry for the late reply. I've finished the preliminary code and plots for the distribution of aspects and words (tokens), and I've opened a PR (#39) so the code can be reviewed for accuracy.
Additionally, the plots can be seen in this folder: semeval data.
Currently, only the original SemEval datasets are used to generate the stats because, judging from the code and the generated stats, the augmented reviews.pkl does not appear to contain information on the new aspects. Please let me know if I should use the labelled backtranslation datasets from the data folder instead.
An example for semeval-16/15/14 looks like this (naspects_nreviews)
The gists of the stats in the folders are as follows:
Hi @Lillliant @farinamhz
Thank you very much. A few questions:
*.pkl file or *.xml? It would be great if it is *.pkl.
review.{languages used for augmentation}.pkl. Also, the augmented review is stored in the augs dictionary. I think this class diagram will help you: https://github.com/fani-lab/LADy/blob/main/src/cmn/LADy.png
Let me know if you need more help.
Hi @hosseinfani @farinamhz,
The input for the methods is indeed *.pkl. I will update the code so it is clearer what the input is.
Also, thank you for telling me about the augs dictionary storing the augmented reviews. It's taking my computer a while to generate the augmented datasets, but I'll update the branch with distribution results as soon as they are generated.
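For reference, a rough sketch of how the aspect counts could cover both the original and the augmented reviews. It assumes, hypothetically, that each review exposes its aspects as a list and its augmentations as a dictionary keyed by language code. The `Review` dataclass and the `aspect_counts` function below are stand-ins for illustration only, not LADy's actual class or API.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Review:
    """Hypothetical stand-in for a review; not LADy's real class."""
    aspects: list
    # Assumed shape: language code -> augmented Review
    augs: dict = field(default_factory=dict)


def aspect_counts(reviews, include_augs=True):
    """Count aspect occurrences, optionally including augmented reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(review.aspects)
        if include_augs:
            for augmented in review.augs.values():
                counts.update(augmented.aspects)
    return counts


# Toy data: one review with a single (hypothetical) augmentation.
toy_reviews = [Review(["food"], {"fra": Review(["food", "price"])})]
with_augs = aspect_counts(toy_reviews)
without_augs = aspect_counts(toy_reviews, include_augs=False)
```

Comparing `with_augs` and `without_augs` would show whether the augmented reviews contribute any new aspects.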
@Lillliant I have all the files on my PC. I'll upload them to the LADy channel now. You don't have to generate the translations.
@Lillliant @farinamhz I think we can safely close this issue. Let me know otherwise.
@farinamhz @Lillliant As part of the stats on datasets, we need to show the distribution of aspects in each dataset, which is probably long-tailed or imbalanced, and the distribution of words in the SemEval datasets.
For an example codebase, you can look at this function, which produces stats on a dataset of teams:
https://github.com/fani-lab/OpeNTF/blob/45aa32b1e32edc906d926c7f841a4ec089f34d18/src/cmn/team.py#L210
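A minimal sketch of the kind of stats described above, assuming each review has already been reduced to a list of tokens and a list of aspect terms. The `reviews` structure and the `dataset_stats` function are hypothetical and are not taken from LADy or OpeNTF.

```python
from collections import Counter


def dataset_stats(reviews):
    """Compute simple distributions over a list of reviews.

    Each review is assumed (hypothetically) to be a dict with
    'tokens' (list of str) and 'aspects' (list of str).
    """
    aspect_freq = Counter()        # aspect term -> number of occurrences
    token_freq = Counter()         # token -> number of occurrences
    naspects_nreviews = Counter()  # aspects per review -> number of such reviews
    for review in reviews:
        aspect_freq.update(review["aspects"])
        token_freq.update(review["tokens"])
        naspects_nreviews[len(review["aspects"])] += 1
    return {
        "aspect_freq": aspect_freq,
        "token_freq": token_freq,
        "naspects_nreviews": naspects_nreviews,
    }


# Toy data for illustration only.
reviews = [
    {"tokens": ["the", "food", "was", "great"], "aspects": ["food"]},
    {"tokens": ["great", "food", "slow", "service"], "aspects": ["food", "service"]},
    {"tokens": ["nice", "place"], "aspects": []},
]
stats = dataset_stats(reviews)
```

Sorting `aspect_freq` by count (for example with `most_common()`) and plotting the result is one way to check whether the aspect distribution is long-tailed.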