fani-lab / LADy

LADy 💃: A Benchmark Toolkit for Latent Aspect Detection Enriched with Backtranslation Augmentation

Web Application for LADy #32

Open farinamhz opened 1 year ago

farinamhz commented 1 year ago

Hi @impedaka,

Welcome to LADy :)

This is an issue page to log your progress in developing the web app.

Please let us know if you have any concerns or questions.

@hosseinfani

impedaka commented 1 year ago

@farinamhz @hosseinfani

[screenshot]

I'm getting this error when running `pip install -r requirements.txt`. I think requirements.txt should be double-checked.

I'm also not sure how to get Colab working: [screenshot]

impedaka commented 1 year ago

While I was struggling to run the program, I made a sketch of the layout, and I'm working on the website template: [screenshot]

[screenshot]

hosseinfani commented 1 year ago

@impedaka Awesome. Thank you. Some questions:

impedaka commented 1 year ago

@hosseinfani

> Does "Random" on the left side generate a random review?

Yes. I can remove it if it's not needed.

> Will the right panel show (aspect, probability) pairs sorted in descending order? We need to show that the possible aspects are these items, each with a probability score.

We can use an unordered/ordered list or tags, unless we want the data displayed another way, like a table. [screenshot]

hosseinfani commented 1 year ago

@impedaka

[screenshot]

impedaka commented 1 year ago

@hosseinfani

> We can have it, but how do you generate a random review? You could have a small file containing some reviews and select from them. Or are there other ways you're thinking of?

Alternatively, we could use a review API such as those listed at https://rapidapi.com/search/review. There would be more sample reviews to work with, but there's usually a limit on how many we can request (depending on which one we use, around 15 to 500 requests per month). I can set up either option.

> Ordered list (descending by score), please. Can you make it like this one from https://github.com/bmabey/pyLDAvis?

How should I make it similar to the image? Is an ordered list in descending order of score good enough?

hosseinfani commented 1 year ago
impedaka commented 1 year ago

@hosseinfani

> I think it's a horizontal bar chart using matplotlib. Look at the GitHub link for details.

I can display the right panel like this with Chart.js: [screenshot]

hosseinfani commented 1 year ago

@impedaka that's nice. Just a reminder that the bars should be sorted.
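A minimal sketch of the sorting step, assuming the backend has already collected the model's output into a dict that maps each aspect word to its probability (the function name `sorted_bar_data` is hypothetical, not part of LADy). Sorting on the server keeps the chart code simple, since a Chart.js bar chart draws its categories in the order of its `labels` array.

```python
from typing import Dict, List, Tuple


def sorted_bar_data(aspect_probs: Dict[str, float]) -> Tuple[List[str], List[float]]:
    """Return (labels, values) ordered by probability, highest first."""
    pairs = sorted(aspect_probs.items(), key=lambda pair: pair[1], reverse=True)
    labels = [aspect for aspect, _ in pairs]
    values = [prob for _, prob in pairs]
    return labels, values


# Hypothetical model output for one review
labels, values = sorted_bar_data({"service": 0.12, "food": 0.41, "price": 0.27})
print(labels)  # ['food', 'price', 'service']
print(values)  # [0.41, 0.27, 0.12]
```

Python's `sorted` stays stable even with `reverse=True`, so aspects with equal probability keep their original relative order.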

impedaka commented 1 year ago

@hosseinfani I got the website layout done. I think it's time to add the models. Is there a specific folder I should look at? Knowing the format of its input and output would be nice; otherwise, I'll look at the source code. https://la-dy.vercel.app/

For the random review, I took a text file from our GitHub repo (the one used to train/test the models).
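One way to serve the random review, sketched under the assumption that the text file holds one review per line (the actual format of the training file may differ, and the helper names are hypothetical):

```python
import random
from pathlib import Path
from typing import List, Optional


def load_reviews(path: str) -> List[str]:
    """Read the file once and keep every non-empty line as a candidate review."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def random_review(reviews: List[str], rng: Optional[random.Random] = None) -> str:
    """Pick one review uniformly at random; pass a seeded Random for reproducible tests."""
    if not reviews:
        raise ValueError("no reviews to choose from")
    return (rng or random.Random()).choice(reviews)
```

Loading the list once at server startup avoids re-reading the file on every click, and it sidesteps the monthly request limits of an external review API.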

hosseinfani commented 1 year ago

@impedaka Perfect. I'm also currently working on this project.

@farinamhz do you have any saved model, even random, that can accept an input text and predict an aspect term? You can give them to Alice for test purposes. Thank you.

impedaka commented 1 year ago

Hey @farinamhz, any saved models I can use? A random one for testing purposes is fine. Thanks!

hosseinfani commented 1 year ago

@impedaka In this path, you can find the required files to load the LDA model:

https://github.com/fani-lab/LADy/tree/main/output/semeval/SemEval-14/Semeval-14-Restaurants_Train.xml/5/lda

f0.model f0.model.dict f0.model.expElogbeta.npy f0.model.id2word f0.model.perf.cas f0.model.perf.perplexity

You need to:

```python
from aml.lda import Lda
from cmn.review import Review

# loading required files for the model object
path = './output/semeval/SemEval-14/Semeval-14-Restaurants_Train.xml/5/lda'
naspects = 5  # number of aspects; the '5' in the model folder path
am = Lda(naspects)
am.load(f'{path}/f0.')

# creating a single object for the input review after removing stop words and split()
r = Review(id=0, sentences=[['this', 'is', 'a', 'review']], time=None, author=None, aos=None, lempos=None, parent=None, lang='eng_Latn')

# predicting the words as aspects in descending order
r_pred_aspects = am.infer('snt', r)
am.get_aspects_words(r_pred_aspects, 20)
```
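The code comment says the review is passed in after removing stop words and calling split(). A minimal sketch of that step, assuming the whole review is treated as one sentence and using a tiny, hypothetical stop-word list (a fuller list, such as `STOPWORDS` in `gensim.parsing.preprocessing`, would be a better fit in practice; `to_sentences` is a hypothetical name):

```python
from typing import List, Set

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS: Set[str] = {"a", "an", "and", "is", "it", "of", "the", "this", "was"}


def to_sentences(review_text: str, stop_words: Set[str] = STOP_WORDS) -> List[List[str]]:
    """Lowercase, split on whitespace, drop stop words; the whole review becomes one sentence."""
    tokens = review_text.lower().split()
    return [[token for token in tokens if token not in stop_words]]


print(to_sentences("The pasta was great and the service is slow"))
# [['pasta', 'great', 'service', 'slow']]
```

The result has the shape expected by the `sentences=` argument of the `Review(...)` call. Punctuation is not stripped here, so "slow." and "slow" would end up as different tokens.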
hosseinfani commented 1 year ago

@impedaka Please pull the latest code.

hosseinfani commented 1 year ago

@impedaka any update?

impedaka commented 1 year ago

@hosseinfani

[screenshot]

I hope I'm doing this right. I think I didn't give it a proper review input.

I couldn't find https://github.com/fani-lab/LADy/tree/main/output/semeval/SemEval-14/Semeval-14-Restaurants_Train.xml/5/lda, so instead I used https://github.com/fani-lab/LADy/tree/main/output/semeval%2B/toy.2016SB5/ABSA16_Restaurants_Train_SB1_v2.xml/5.arb_Arab

I made slight modifications to the code you gave me:

```python
import argparse

from aml.lda import Lda
from cmn.review import Review

# loading required files for the model object
path = './model_path'

parser = argparse.ArgumentParser(description='Latent Aspect Detection')
parser.add_argument('-naspects', dest='naspects', type=int, default=5, help='user-defined number of aspects, e.g., -naspects 25')
args = parser.parse_args()

am = Lda(args.naspects)
am.load(f'{path}/f0.', "")

# creating a single object for the input review after removing stop words and split()
r = Review(id=0, sentences=[['this', 'is', 'a', 'review']], time=None, author=None, aos=None, lempos=None, parent=None, lang='eng_Latn')

# predicting the words as aspects in descending order
r_pred_aspects = am.infer(r, "snt")

top_words = am.get_aspects_words(10)
print(top_words)
```
hosseinfani commented 1 year ago

@impedaka Also, the model is trained on a toy dataset, so the result is expected :)

Please finish the initial website so that I can deploy it to the school's server and add the models trained on real data (they are big). Can you design the website with three dropdown boxes and a text box:

Text box: the user enters the number of aspects. This corresponds to the number at the start of the folder name, e.g., 5 in 5.arb_Arab.

Dropdown: the user selects the language for augmentation from the list ['none', 'pes_Arab', 'zho_Hans', 'deu_Latn', 'arb_Arab', 'fra_Latn', 'spa_Latn']. This corresponds to the second part of the folder name, e.g., arb_Arab in 5.arb_Arab.

Dropdown: the user selects the model from the list ['random', 'lda', 'ctm', 'btm']. This corresponds to the model folder name, e.g., btm in 5.arb_Arab/btm.

For example, if the user enters 5 and selects fra_Latn and btm, we need to load ./models/5.fra_Latn/btm/f0.model.*

If the user enters 5 and selects none and btm, we need to load ./models/5/btm/f0.model.*

CTM is harder. If the user enters 5 and selects fra_Latn and ctm, we need to load ./models/5.fra_Latn/ctm/f0.model/{whatever here is not important}/epoch_{some number}.pth

You can look at https://github.com/fani-lab/LADy/blob/57d6e7d8a2beeb31c31ae7273e21169ee8c80d0a/src/aml/ctm.py#L22 for ctm

hosseinfani commented 1 year ago

As you can see, the folder structure is like {#aspects}.{language, if any}/{model}/f0.model

You have to show users the unique numbers of aspects, the unique language names, and the unique model names.

Ignore the folders that have more than one language.
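To fill the dropdowns with only the values that actually exist, the backend could scan the models folder. A sketch, assuming folder names of the form `{#aspects}` or `{#aspects}.{language}`, and assuming a multi-language folder has more than one dot-separated language part (that naming is an assumption, not taken from the repo); the function names are hypothetical:

```python
import os
from typing import Dict, List, Optional, Tuple


def parse_folder_name(name: str) -> Optional[Tuple[int, Optional[str]]]:
    """'5' -> (5, None); '5.arb_Arab' -> (5, 'arb_Arab'); anything else -> None."""
    parts = name.split(".")
    if not parts[0].isdigit() or len(parts) > 2:
        return None  # not a model folder, or a multi-language folder we ignore
    return int(parts[0]), (parts[1] if len(parts) == 2 else None)


def discover_options(models_root: str) -> Dict[str, List]:
    """Collect the unique #aspects, languages, and model names under models_root."""
    naspects, langs, models = set(), set(), set()
    for name in os.listdir(models_root):
        folder = os.path.join(models_root, name)
        parsed = parse_folder_name(name)
        if parsed is None or not os.path.isdir(folder):
            continue
        naspects.add(parsed[0])
        langs.add(parsed[1] or "none")
        models.update(m for m in os.listdir(folder) if os.path.isdir(os.path.join(folder, m)))
    return {"naspects": sorted(naspects), "langs": sorted(langs), "models": sorted(models)}
```

Returning sorted lists gives the dropdowns a stable order, and "none" only appears when a folder without a language suffix actually exists.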

The number of bar charts should match the number of aspects.

[screenshot]

impedaka commented 1 year ago

@hosseinfani [screenshot]

[screenshot]

```python
top_words = am.get_aspects_words(5)
```
hosseinfani commented 1 year ago

@impedaka you're right. I'll fix it for you.

hosseinfani commented 1 year ago

@impedaka

Here is an example of how to call the models for prediction on a new sentence:

https://github.com/fani-lab/LADy/blob/main/src/main_web.py

Right now, on GitHub, you can test with the ./output/semeval+/toy.2016SB5/ABSA16_Restaurants_Train_SB1_v2.xml/ folder and the models inside it.

Please pull the latest code!

impedaka commented 1 year ago

@hosseinfani Everything is working except CTM. [screenshot] [screenshot]

```python
if model == "ctm":
    am = Ctm(naspects, nwords, contextual_size=768, nsamples=10)
path = f"./models/{naspects}{'.' + lang if lang else ''}/{am.name()}"
am.load(f'{path}/f0.')
```

[screenshot] [screenshot]

I was thinking we could host the backend API on the school's server, like NeuCG, and use Vercel for the frontend (unless we can host both sides there, though the website layout is a bit different from NeuCG's).

hosseinfani commented 1 year ago

@impedaka Thank you very much. Looks awesome. Just update the codeline with that missing file for CTM, and please test it. I'm not familiar with Vercel, so please send a PR so I can try the school's server. Then we can decide.

impedaka commented 1 year ago

@hosseinfani CTM works now :D The error was fixed when I installed the CUDA toolkit and added this to requirements.txt: [screenshot]

```
torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
```

[screenshot]

impedaka commented 1 year ago

My website API code (src/web/backend/app.py) is a bit unusual. Hopefully it won't affect deployment:

```python
import os
import sys

# Add src/ (two levels up from src/web/backend) to the module search path
sys.path.append(os.path.abspath(os.path.join(
    os.path.dirname(__file__), '..', '..')))

from cmn.review import Review
from aml.lda import Lda
from aml.btm import Btm
from aml.ctm import Ctm
from aml.rnd import Rnd
```

Python can't find the aml folder unless I add the path going up from src/web/backend to src/.
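An equivalent, slightly more explicit version of that sys.path tweak, sketched with pathlib (the helper names are hypothetical). It relies on app.py sitting at src/web/backend/app.py, so src/ is the third parent of the file:

```python
import sys
from pathlib import Path


def src_dir_of(app_file: str) -> Path:
    """For .../src/web/backend/app.py: parents[0]=backend, [1]=web, [2]=src."""
    return Path(app_file).resolve().parents[2]


def add_to_sys_path(directory: Path) -> None:
    """Prepend the directory once so `import aml` and `import cmn` find the repo's packages."""
    entry = str(directory)
    if entry not in sys.path:
        sys.path.insert(0, entry)


# In app.py, before the aml/cmn imports:
# add_to_sys_path(src_dir_of(__file__))
```

Prepending rather than appending means the repo's own aml and cmn take precedence over any same-named package installed in the virtual environment.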

hosseinfani commented 1 year ago

@impedaka Thank you very much. No problem. Please send the PR, and I'll try to deploy it to the school's server. Also, I don't think there is a GPU available on the host, so there's no need to worry about the CUDA version.

hosseinfani commented 1 year ago

@impedaka One more request: can you add the translated and backtranslated versions of the input review?

https://github.com/fani-lab/LADy/blob/5f0353c0c7d5880a7872f981f766d8319d0ab78e/src/cmn/review.py#L53

You can call it like this:

```python
from cmn.review import Review

review = 'this is a sample'
r = Review(id=0, sentences=[review.split()], time=None, author=None, aos=[[([0], [], 0)]], lempos=None, parent=None, lang='eng_Latn', category=None)

settings = {'nllb': 'facebook/nllb-200-distilled-600M', 'max_l': 1024, 'device': 'cpu'}
res = r.translate('pes_Arab', settings)
translated_review = res[0].get_txt()
backtranslated_review = res[1].get_txt()
semantic_similarity = res[2]
```
impedaka commented 1 year ago

@hosseinfani While trying to run r.translate, I ran into an error: [screenshot] We have numpy 1.24.3, and numpy.int was removed in numpy 1.24.

Also, does the CombinedTM library used in ctm.py only work with CUDA? I'm not sure how to make it use only the CPU.
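On a host without a GPU, the usual PyTorch approach is to pick the device at runtime and, when loading a checkpoint that was saved on a GPU, pass `map_location` to `torch.load` so the tensors are remapped to the CPU. Whether ctm.py or the CombinedTM class exposes a device option is not confirmed here, so the helper names below are hypothetical and the sketch only shows the PyTorch side:

```python
def pick_device() -> str:
    """Return 'cuda' if PyTorch is installed and sees a GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_checkpoint(pth_path: str, device: str):
    """Load an epoch_<n>.pth file, remapping any CUDA tensors onto `device`."""
    import torch

    return torch.load(pth_path, map_location=device)
```

The same value could presumably replace the hard-coded `'device': 'cpu'` in the translation settings, although that depends on how `Review.translate` uses it.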

> If there is a language, show the translated and backtranslated versions. For this, the Review class has a translate function:

For the language option, should it be backtranslated or translated? If it's not backtranslated, is it translated? I can make it a selection between none, backtranslate, and translate instead. The checkbox is also disabled if the user chooses no language. [screenshot]

hosseinfani commented 1 year ago

@impedaka sorry for late reply.

impedaka commented 1 year ago

@hosseinfani I installed numpy 1.20 with Python 3.8, but when running src/web/backend/app.py, I get this error:

```
Traceback (most recent call last):
  File "app.py", line 17, in <module>
    from aml.lda import Lda
  File "C:\Users\Qin\LADy\src\aml\lda.py", line 1, in <module>
    import gensim
  File "C:\Users\Qin\LADy\venv\lib\site-packages\gensim\__init__.py", line 11, in <module>
    from gensim import parsing, corpora, matutils, interfaces, models, similarities, utils  # noqa:F401
  File "C:\Users\Qin\LADy\venv\lib\site-packages\gensim\corpora\__init__.py", line 6, in <module>
    from .indexedcorpus import IndexedCorpus  # noqa:F401 must appear before the other classes
  File "C:\Users\Qin\LADy\venv\lib\site-packages\gensim\corpora\indexedcorpus.py", line 14, in <module>
    from gensim import interfaces, utils
  File "C:\Users\Qin\LADy\venv\lib\site-packages\gensim\interfaces.py", line 19, in <module>
    from gensim import utils, matutils
  File "C:\Users\Qin\LADy\venv\lib\site-packages\gensim\matutils.py", line 1030, in <module>
    from gensim._matutils import logsumexp, mean_absolute_difference, dirichlet_expectation
  File "gensim\_matutils.pyx", line 1, in init gensim._matutils
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
```

The error goes away with a numpy version newer than 1.20, such as 1.22. However, we need an older numpy to run the translation and backtranslation function.
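When juggling version pins like these, it can help to print what is actually installed and check the `np.int` cut-off directly. A sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper names are hypothetical, and the 1.24 cut-off comes from the NumPy deprecation discussed in this thread (`np.int` was deprecated in 1.20 and removed in 1.24):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it isn't installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_minor(ver: str) -> Tuple[int, int]:
    """'1.24.3' -> (1, 24); assumes a plain numeric 'X.Y[.Z]' version string."""
    major, minor = ver.split(".")[:2]
    return int(major), int(minor)


def numpy_has_np_int(ver: str) -> bool:
    """np.int was deprecated in NumPy 1.20 and removed in 1.24."""
    return major_minor(ver) < (1, 24)


for pkg in ("numpy", "gensim", "networkx"):
    ver = installed_version(pkg)
    note = ""
    if pkg == "numpy" and ver is not None:
        note = " (np.int available)" if numpy_has_np_int(ver) else " (np.int removed)"
    print(f"{pkg}: {ver}{note}")
```

This does not resolve the conflict by itself, but it makes it easy to confirm which combination a given virtual environment really has.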

While trying to run r.translate, I ran into an error. [screenshot]

hosseinfani commented 1 year ago

@impedaka So, for NLLB (translation and backtranslation), we need numpy 1.20, but for gensim we need 1.22, right? Can you find a version of gensim that works with numpy 1.20? Then we won't have this conflict.

impedaka commented 1 year ago

@hosseinfani

Installing an older version of gensim fixed the issue, but I think I found another gensim issue. It happens when we load the model. I installed gensim 4.2.0; the latest version is 4.3.1. [screenshot] [screenshot]

The translate and backtranslate function worked, though: [screenshot]

```python
print('translated', translated_review)
print('back', backtranslated_review)
print('sem', semantic_similarity)
```

We might need to change the code, or I might be installing the wrong gensim version.

impedaka commented 1 year ago

@hosseinfani If there's anything I can do to help, let me know! I can't seem to update my fork of the project, so I think I'll have to re-fork the main project.

Also, is my name going to be included in the authors section of the README file? https://github.com/fani-lab/LADy/tree/main#authors

impedaka commented 1 year ago

@farinamhz any updates?

farinamhz commented 1 year ago

Hey @impedaka, I added your name to the authors. As you may know, we are going to deploy the web application, so before that, please get the latest version of LADy and let me know if there is any problem installing or running it. If there is, I will update it, and once it works for you, we will move on to the next steps for the web application.

impedaka commented 1 year ago

Alright, thank you!

On Thu, Jul 6, 2023 at 8:35 PM Farinam Hemmatizadeh wrote:


farinamhz commented 1 year ago

Also, Alice, please check the new README and let me know if you think any changes would improve it. @impedaka

impedaka commented 1 year ago

@farinamhz Sorry for the late reply. I ran main.py from the latest codebase with no problems. I'd like to hear the next steps :)

README.md looks good. Some of the links are broken, for example in https://github.com/impedaka/LADy-1#srccmn

Sample pickle files for a toy dataset: ./output/toy.2016SB5/ABSA16_Restaurants_Train_SB1_v2.xml,

and https://github.com/impedaka/LADy-1#results

| dataset | review files (english, chinese, farsi, arabic, french, german, spanish, and all) and results' directory |
| -- | -- |
| semeval-14-laptop | ./output/Semeval-14/Laptop/ 22.0 MB |
| semeval-14-restaurant | ./output/Semeval-14/Restaurants/ 22.2 MB |
| semeval-15-restaurant | ./output/2015SB12/ 53.1 GB |
| semeval-16-restaurant | ./output/2016SB5/ 103 MB |
| toy | ./output/toy.2016SB5/ 64.6 MB |

The toy link (https://uwin365.sharepoint.com/sites/cshfrg-ReviewAnalysis/Shared%20Documents/Forms/AllItems.aspx?ga=1&id=%2Fsites%2Fcshfrg%2DReviewAnalysis%2FShared%20Documents%2FLADy%2FLADy0%2E1%2E0%2E0%2Foutput%2Ftoy%2E2016SB5&viewid=4cd69493%2D951c%2D47b5%2Db34a%2Dc1cdbf3a0412) requires a UWin account, I think.
farinamhz commented 1 year ago

Hey @impedaka, No worries. Thank you very much for the time you put into this task. I will check and change the mentioned links and let you know the next step tomorrow.

farinamhz commented 1 year ago

Hi @impedaka, as we have a problem with the cloud right now, I will take care of the issues with the links later.

Meanwhile, can we run the web app on the local host now, or is there still any problem?

impedaka commented 1 year ago

I can run the web app on localhost. Are we using the same layout as the previous website I made? The website code isn't included in the main project; should I add it or make a new one?

On Thu, Jul 13, 2023 at 9:25 PM Farinam Hemmatizadeh wrote:


farinamhz commented 1 year ago

I asked so I could run it on localhost and take a look at the layout, so we can modify it if needed. It would be great if you added it to the existing codeline. @impedaka

impedaka commented 1 year ago

I can add it; however, not all of the functionality is working yet. You'll need to run both the backend and the frontend at the same time.

In LADy/src/web/backend, install Flask and flask_cors, then run `python app.py` to start the backend server.

Install Node.js for the frontend if you haven't already. In LADy/src/web/frontend, run `npm install` and then `npm run dev`; it'll run on localhost:3000.

Sorry if this is complicated. I'm using the models from before, so the backend isn't up to date. I'll make a PR :)

On Thu, Jul 13, 2023 at 9:35 PM Farinam Hemmatizadeh wrote:


farinamhz commented 1 year ago

Hey @impedaka, thank you again for the updates; that is great. I asked @hosseinfani whether we can accept the PR now.

Meanwhile, I will share a Google Doc for the web app and list any modifications or new requirements there, so that we can discuss and track them with @hosseinfani.

impedaka commented 1 year ago

Hey @farinamhz [screenshot] [screenshot]

I made some changes to the design. Let me know what you think.

Personally, I think the image would look nicer on the right side and not transparent. I can make a preview if you're interested. I'm still making minor adjustments to make the website more user-friendly. Update me in the Google Doc on whatever needs to be done :)

farinamhz commented 1 year ago

Hey @impedaka, Great, thank you very much for the updates.

impedaka commented 1 year ago

@farinamhz

> Regarding the emoji for the webpage, is it possible to make it transparent so that, with a different background color (like purple in your case, or any dark one), we don't get a white area around it?

I will make it transparent :)

> Regarding the Google Doc, I just shared it with you.

Thank you!

impedaka commented 1 year ago

@farinamhz

Good news! I got the backtranslation working without changing the numpy version. It didn't work before because it relied on an older version of numpy, and downgrading numpy caused problems with other libraries and functions.

I updated networkx to 3.1 (simalign previously used networkx 2.4). [screenshot] Hopefully it doesn't break anything; so far, no errors for me.

[screenshot]

```python
settings = {'nllb': 'facebook/nllb-200-distilled-600M', 'max_l': 1024, 'device': 'cpu'}
res = r.translate('pes_Arab', settings)
translated_review = res[0].get_txt()
backtranslated_review = res[1].get_txt()
semantic_similarity = res[2]
print('translated', translated_review)
print('back', backtranslated_review)
print('sem', semantic_similarity)
```

Before I fixed it, this was the error:

```
Traceback (most recent call last):
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask\app.py", line 2213, in __call__
    return self.wsgi_app(environ, start_response)
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask\app.py", line 2193, in wsgi_app
    response = self.handle_exception(e)
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask_cors\extension.py", line 176, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask\app.py", line 2190, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask\app.py", line 1486, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask_cors\extension.py", line 176, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask\app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask\app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "C:\Users\Qin\LADy-1\src\web\backend\app.py", line 48, in api
    res = r.translate('pes_Arab', settings)
  File "C:\Users\Qin\LADy-1\src\cmn\review.py", line 67, in translate
    translated_obj.aos, _ = self.semalign(translated_obj)
  File "C:\Users\Qin\LADy-1\src\cmn\review.py", line 85, in semalign
    from simalign import SentenceAligner
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\simalign\__init__.py", line 1, in <module>
    from .simalign import EmbeddingLoader, SentenceAligner
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\simalign\simalign.py", line 13, in <module>
    import networkx as nx
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\networkx\__init__.py", line 115, in <module>
    import networkx.readwrite
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\networkx\readwrite\__init__.py", line 15, in <module>
    from networkx.readwrite.graphml import *
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\networkx\readwrite\graphml.py", line 314, in <module>
    class GraphML(object):
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\networkx\readwrite\graphml.py", line 346, in GraphML
    (np.int, "int"), (np.int8, "int"),
  File "C:\Users\Qin\LADy-1\venv\lib\site-packages\numpy\__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.

The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
```
impedaka commented 1 year ago

[screenshot] [screenshot]

I will update you on the design for the new backtranslation and translation function. Let me know your ideas! I also added an "Augmentation" checkbox, but I don't know how it'll work.

> Add results for different metrics to the displayed results (we have success@k, but we also need map@k, recall@k, ndcg@k, and precision@k, which are all available in the output). The user could switch between them after the results are displayed.

Also, I'm not sure how to use the different metric results. Either I spend more time figuring it out, or you can show me :)
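For reference, here is how these ranking metrics are commonly defined for a single review, given the ranked list of predicted aspects and the set of gold aspects. This is a sketch with binary relevance, not LADy's own evaluation code; the values in LADy's output files may come from a library with slightly different conventions (for example, how MAP is normalized). map@k is then the mean of `ap_at_k` over all reviews, and likewise for the other metrics. The example ranking and gold set are made up.

```python
import math
from typing import List, Set


def _hits(ranked: List[str], gold: Set[str], k: int) -> int:
    """Number of gold aspects among the top-k predictions (assumes no duplicates)."""
    return sum(1 for aspect in ranked[:k] if aspect in gold)


def success_at_k(ranked: List[str], gold: Set[str], k: int) -> float:
    return 1.0 if _hits(ranked, gold, k) > 0 else 0.0


def precision_at_k(ranked: List[str], gold: Set[str], k: int) -> float:
    return _hits(ranked, gold, k) / k  # k >= 1


def recall_at_k(ranked: List[str], gold: Set[str], k: int) -> float:
    return _hits(ranked, gold, k) / len(gold) if gold else 0.0


def ap_at_k(ranked: List[str], gold: Set[str], k: int) -> float:
    """Average precision over the top k, normalized by the number of gold aspects."""
    hits, total = 0, 0.0
    for rank, aspect in enumerate(ranked[:k], start=1):
        if aspect in gold:
            hits += 1
            total += hits / rank
    return total / len(gold) if gold else 0.0


def ndcg_at_k(ranked: List[str], gold: Set[str], k: int) -> float:
    """Binary-relevance nDCG: each hit has gain 1, discounted by log2(rank + 1)."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, aspect in enumerate(ranked[:k], start=1) if aspect in gold)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(k, len(gold)) + 1))
    return dcg / ideal if ideal > 0 else 0.0


predicted = ["food", "price", "service", "ambience"]  # model ranking for one review
gold_aspects = {"food", "service"}                    # annotated aspects
for name, metric in [("success", success_at_k), ("precision", precision_at_k),
                     ("recall", recall_at_k), ("map", ap_at_k), ("ndcg", ndcg_at_k)]:
    print(f"{name}@3 = {metric(predicted, gold_aspects, 3):.3f}")
```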

farinamhz commented 1 year ago

Hey @impedaka, great! Thank you for letting me know about the issue and the fix. Nice! I think we could have something like:

Translated (Farsi): "..."

Backtranslated (English): "..."

with more than one line of space between the two parts, as character sizes may differ.
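A minimal sketch of that layout on the backend side, assuming the API returns the NLLB language code (e.g. pes_Arab) and the page should show a readable name. The mapping covers the languages listed earlier in this thread; the function and dictionary names are hypothetical, and in the web frontend the vertical gap would more naturally be a CSS margin than literal newlines.

```python
from typing import Dict

# Display names for the NLLB language codes offered in the language dropdown.
NLLB_LANG_NAMES: Dict[str, str] = {
    "pes_Arab": "Farsi",
    "zho_Hans": "Chinese",
    "deu_Latn": "German",
    "arb_Arab": "Arabic",
    "fra_Latn": "French",
    "spa_Latn": "Spanish",
}


def format_translation_block(lang: str, translated: str, backtranslated: str,
                             blank_lines: int = 2) -> str:
    """Translated text, `blank_lines` empty lines, then the backtranslation."""
    name = NLLB_LANG_NAMES.get(lang, lang)  # fall back to the raw code
    gap = "\n" * (blank_lines + 1)
    return (f'Translated ({name}): "{translated}"{gap}'
            f'Backtranslated (English): "{backtranslated}"')


print(format_translation_block("pes_Arab", "...", "this is a sample"))
```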

For the augmentation, I am going to talk with @hosseinfani first, as we would need to provide different models for it. Otherwise, we need to state that all the results are based on augmentation, and in that case, we need to remove the checkbox. Anyway, thank you for adding it. I'll update you on this.

I will provide an example of the results in a few minutes for you to tell you about the metrics.