farinamhz opened this issue 1 year ago
@farinamhz @hosseinfani
I'm getting this error by doing pip install -r requirements.txt
I think requirements.txt should be double checked.
Not sure how to get Colab working.
While I was struggling to run the program, I made a sketch of the layout and am working on the website template.
@impedaka Awesome. Thank you. Some questions:
@hosseinfani
Random in the left side generates random review?
Yes. I can remove it if it's not needed.
Will the right panel show pairs of (aspect, probability) sorted in descending order? We need to show that the possible aspects are these items, each with a probability score.
We can do unordered/ordered list or tags. Unless we want the data to be displayed another way like a table.
@impedaka
We can have it, but how do you generate a random review? You could have a small file containing some reviews and select from them? Or other ways you're thinking of?
Ordered list (descending by score), please. Can you make it like this, from https://github.com/bmabey/pyLDAvis?
@hosseinfani
We can have it, but how do you generate a random review? You could have a small file containing some reviews and select from them? Or other ways you're thinking of?
We could alternatively use a review API https://rapidapi.com/search/review. There would be more sample reviews to work with, but there's usually a limit to how many we can request. (Depending on which one we use, it's around 15-500 requests per month). I can set up either option
Ordered list (descending by score), please. Can you make it like this, from https://github.com/bmabey/pyLDAvis?
How should I make it similar to the image? Is ordered list descending order based on score good enough?
yes, please set for both options
I think it's a horizontal bar chart using matplotlib. Look at the github link for details.
@hosseinfani
I think it's a horizontal bar chart using matplotlib. Look at the github link for details.
I can display the right panel like this with Chart.js
@impedaka that's nice. Just a reminder that the bars should be sorted.
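The sorted-bars requirement can be sketched in plain Python (a minimal text-based stand-in for the matplotlib/Chart.js horizontal bar chart; the function name and sample data are mine):

```python
def sorted_bars(aspect_probs, width=30):
    """Render (aspect, probability) pairs as text bars, highest score first."""
    pairs = sorted(aspect_probs, key=lambda p: p[1], reverse=True)
    lines = []
    for aspect, prob in pairs:
        bar = '#' * max(1, round(prob * width))
        lines.append(f"{aspect:>10} | {bar} {prob:.2f}")
    return lines

chart = sorted_bars([('service', 0.12), ('food', 0.55), ('price', 0.33)])
print('\n'.join(chart))
```

The same descending sort would feed the `data` array of a Chart.js bar chart or a matplotlib `barh` call.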
@hosseinfani I got the website layout done. I think it's time to add the models. Is there a specific folder I should look at? Knowing the format of its input and output would be nice; otherwise I'll look at the source code. https://la-dy.vercel.app/
For the random review, I just took a text file from our github repo (that was used to train/test the models)
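Picking a random review from a text file (as described above) could look something like this sketch; the demo writes a throwaway file since the repo's actual review file isn't named here:

```python
import random
import tempfile

def random_review(path):
    """Pick a random non-empty line from a plain-text file of reviews."""
    with open(path, encoding='utf-8') as f:
        reviews = [line.strip() for line in f if line.strip()]
    return random.choice(reviews)

# demo with a throwaway file standing in for the repo's review text file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf-8') as f:
    f.write('great food\n\nslow service\n')
    demo_path = f.name

print(random_review(demo_path))
```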
@impedaka perfect. I'm also working on this project currently.
@farinamhz do you have any saved model, even random, that can accept an input text and predict an aspect term? You can give them to Alice for test purposes. Thank you.
Hey @farinamhz, any saved models I can use? A random one for testing purposes is fine. Thanks!
@impedaka
In this path, you can find the required files to load the LDA model:
f0.model f0.model.dict f0.model.expElogbeta.npy f0.model.id2word f0.model.perf.cas f0.model.perf.perplexity
you need to:
from aml.lda import Lda
from cmn.review import Review
# load the required files for the model object
path = './output/semeval/SemEval-14/Semeval-14-Restaurants_Train.xml/5/lda'
am = Lda(args.naspects)  # args.naspects: the user-defined number of aspects
am.load(f'{path}/f0.')
# create a single Review object for the input review after removing stop words and split()
r = Review(id=0, sentences=[['this','is','a','review']], time=None, author=None, aos=None, lempos=None, parent=None, lang='eng_Latn')
# predict the words as aspects in descending order
r_pred_aspects = am.infer('snt', r)
am.get_aspects_words(r_pred_aspects, 20)
@impedaka pull the latest code pls
@impedaka any update?
@hosseinfani
I hope I'm doing this right. I think I didn't give it a proper review input
I couldn't find https://github.com/fani-lab/LADy/tree/main/output/semeval/SemEval-14/Semeval-14-Restaurants_Train.xml/5/lda, so instead I used https://github.com/fani-lab/LADy/tree/main/output/semeval%2B/toy.2016SB5/ABSA16_Restaurants_Train_SB1_v2.xml/5.arb_Arab
I made slight modifications to the code you gave me
from aml.lda import Lda
import argparse
from cmn.review import Review
# loading required files for the model object
path = './model_path'
parser = argparse.ArgumentParser(description='Latent Aspect Detection')
parser.add_argument('-naspects', dest='naspects', type=int, default=5, help='user-defined number of aspects, e.g., -naspect 25')
args = parser.parse_args()
am = Lda(args.naspects)
am.load(f'{path}/f0.',"")
# creating a single object for the input review after removing stop words and split()
r = Review(id=0, sentences=[['this','is','a','review']], time=None, author=None, aos=None, lempos=None, parent=None, lang='eng_Latn')
# predicting the words as aspects in descending order
r_pred_aspects = am.infer( r,"snt")
top_words = am.get_aspects_words(10)
print(top_words)
@impedaka also, the model is trained on a toy dataset. so, the result is normal :)
Please finish the initial website so I can deploy it to the school's server and add the models trained on real data (they are big). Can you design the website with a text box and two dropdown boxes:
textbox: user puts a number to select the number of aspects => this relates to the first number before the folder names like in 5.arb_Arab
dropdown: user selects the language for augmentation from the list ['none', 'pes_Arab', 'zho_Hans', 'deu_Latn', 'arb_Arab', 'fra_Latn', 'spa_Latn'] => this relates to the second part of folder name like in 5.arb_Arab
dropdown: user selects the model from the list ['random', 'lda', 'ctm', 'btm'] => this relates to the model name like in 5.arb_Arab/btm
For example, if the user puts 5 and selects fra_Latn and btm, we need to load ./models/5.fra_Latn/btm/f0.model.*
For example, if the user puts 5 and selects none and btm, we need to load ./models/5/btm/f0.model.*
It is difficult for ctm. If the user puts 5 and selects fra_Latn and ctm, we need to load ./models/5.fra_Latn/ctm/f0.model/{whatever here is not important}/epoch_{some number}.pth
You can look at https://github.com/fani-lab/LADy/blob/57d6e7d8a2beeb31c31ae7273e21169ee8c80d0a/src/aml/ctm.py#L22 for ctm
As you see, the folder structure is like {#aspects}.{language if any}/{model}/f0.model
You have to show users the unique #aspects, unique names of languages, and unique names of models.
Forget about the folders that have more than one language.
The number of bar charts should match the number of aspects
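The folder logic above could be captured in a small helper (a sketch; the function name `model_path` is mine, and only the path construction is shown):

```python
def model_path(naspects, lang, model):
    """Build the model folder path; 'none' (no augmentation) drops the language suffix."""
    folder = str(naspects) if lang in (None, '', 'none') else f"{naspects}.{lang}"
    return f"./models/{folder}/{model}/f0.model"

print(model_path(5, 'fra_Latn', 'btm'))  # ./models/5.fra_Latn/btm/f0.model
print(model_path(5, 'none', 'btm'))      # ./models/5/btm/f0.model
```

The ctm case would additionally need to locate the `epoch_{some number}.pth` checkpoint inside the f0.model folder.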
@hosseinfani
top_words = am.get_aspects_words( 5)
@impedaka you're right. I'll fix it for you.
@impedaka
here is an example on how to call models for prediction on new sentence:
https://github.com/fani-lab/LADy/blob/main/src/main_web.py
Right now, in the GitHub repo, you can test on the ./output/semeval%2B/toy.2016SB5/ABSA16_Restaurants_Train_SB1_v2.xml/ folder and the models inside it.
Please pull the latest code!
@hosseinfani Everything is working except CTM
if model == "ctm": am = Ctm(naspects, nwords, contextual_size=768, nsamples=10)
path = f"./models/{naspects}{'.'+lang if lang else ''}/{am.name()}"
am.load(f'{path}/f0.')
I was thinking we host the backend API on the school's server like NeuCG and use Vercel for the frontend (unless we can host both sides, but the website layout is a bit different from NeuCG)
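Selecting the model class from the dropdown value could be a simple lookup, sketched below. The class names mirror the project's aml modules, but they are stubbed here so the snippet stands alone; `MODELS` and `make_model` are names I made up:

```python
# stand-ins for the project's aml.rnd.Rnd, aml.lda.Lda, aml.btm.Btm, aml.ctm.Ctm
class Rnd:
    def __init__(self, naspects, **kwargs):
        self.naspects = naspects
class Lda(Rnd): pass
class Btm(Rnd): pass
class Ctm(Rnd): pass

# dropdown value -> model class
MODELS = {'random': Rnd, 'lda': Lda, 'btm': Btm, 'ctm': Ctm}

def make_model(name, naspects, **kwargs):
    if name not in MODELS:
        raise ValueError(f'unknown model: {name}')
    return MODELS[name](naspects, **kwargs)
```

In the real app the stubs would be replaced by the actual imports from aml.*, with ctm taking its extra constructor arguments.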
@impedaka Thank you very much. Looks awesome. Just update the codeline with that missing file for ctm, and please test. I'm not familiar with Vercel, so please send a PR so I can try the school's server; then we can decide.
@hosseinfani CTM works now :D The error was fixed when I installed the CUDA toolkit and added this to requirements.txt:
torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
My website api code is a bit weird. (src/web/backend/app.py) Hopefully it won't affect deployment
sys.path.append(os.path.abspath(os.path.join(
os.path.dirname(__file__), '..', '..',)))
from cmn.review import Review
from aml.lda import Lda
from aml.btm import Btm
from aml.ctm import Ctm
from aml.rnd import Rnd
It can't find the aml folder unless I go from src/web/backend up to src/.
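The same workaround could be written a bit more explicitly with pathlib (a sketch with the same effect as the `sys.path.append` above; `src_root` is a name I made up):

```python
import sys
from pathlib import Path

def src_root(app_file):
    """From src/web/backend/app.py, climb two directories to reach src/."""
    return str(Path(app_file).resolve().parents[2])

# e.g. at the top of app.py: sys.path.insert(0, src_root(__file__))
print(src_root('/home/alice/LADy/src/web/backend/app.py'))
```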
@impedaka thank you very much. No problem. Please send the PR, and I'll try to deploy it to the school's server. Also, I don't think there is a GPU available on the host, so no need to worry about the CUDA version.
@impedaka one more request. Can you add the translated and backtranslated version of the input review?
https://github.com/fani-lab/LADy/blob/5f0353c0c7d5880a7872f981f766d8319d0ab78e/src/cmn/review.py#L53
You can call it like this:
review = 'this is a sample'
r = Review(id=0, sentences=[review.split()], time=None, author=None, aos=[[([0],[],0)]], lempos=None, parent=None, lang='eng_Latn', category=None)
settings = {'nllb': 'facebook/nllb-200-distilled-600M', 'max_l': 1024, 'device': 'cpu'}
res = r.translate('pes_Arab', settings)
translated_review = res[0].get_txt()
backtranslated_review = res[1].get_txt()
semantic_similarity = res[2]
@hosseinfani While trying to run r.translate, I ran into an error. We have numpy 1.24.3, and numpy.int is removed in numpy 1.24
Also, the CombinedTM library in ctm.py only works with CUDA? I'm not sure how to make it use only the CPU.
if there is a language, show the translated and backtranslated versions. For this, the Review class has a function translate:
For a language, does it have to be backtranslated or translated? If not backtranslate, is it translate? I can make it a select between none, backtranslate, and translate instead. The checkmark is also disabled if the user chooses no language.
@impedaka sorry for late reply.
@hosseinfani I installed numpy 1.20 with python 3.8, but when running src/web/backend/app.py, I get this error
Traceback (most recent call last):
File "app.py", line 17, in <module>
from aml.lda import Lda
File "C:\Users\Qin\LADy\src\aml\lda.py", line 1, in <module>
import gensim
File "C:\Users\Qin\LADy\venv\lib\site-packages\gensim\__init__.py", line 11, in <module>
from gensim import parsing, corpora, matutils, interfaces, models, similarities, utils # noqa:F401
File "C:\Users\Qin\LADy\venv\lib\site-packages\gensim\corpora\__init__.py", line 6, in <module>
from .indexedcorpus import IndexedCorpus # noqa:F401 must appear before the other classes
File "C:\Users\Qin\LADy\venv\lib\site-packages\gensim\corpora\indexedcorpus.py", line 14, in <module>
from gensim import interfaces, utils
File "C:\Users\Qin\LADy\venv\lib\site-packages\gensim\interfaces.py", line 19, in <module>
from gensim import utils, matutils
File "C:\Users\Qin\LADy\venv\lib\site-packages\gensim\matutils.py", line 1030, in <module>
from gensim._matutils import logsumexp, mean_absolute_difference, dirichlet_expectation
File "gensim\_matutils.pyx", line 1, in init gensim._matutils
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
The error is fixed when we have a numpy version greater than 1.20, such as 1.22. However, we need a lower version of numpy to run the translation and backtranslation functions.
While trying to run r.translate, I ran into an error.
@impedaka So, for nllb (translation and backtranslation) we need 1.20, but for gensim we need 1.22, right? Can you find a version of gensim that works with numpy 1.20? Then we won't have this conflict.
@hosseinfani
Installing a lower version of gensim fixed the issue, but I think I found another issue regarding gensim. It happens when we load the model. I installed gensim 4.2.0; the latest version is 4.3.1.
The translate and backtranslate functions worked, though:
print('translated', translated_review)
print('back', backtranslated_review)
print('sem', semantic_similarity)
We might need to change the code or I could be installing the wrong gensim version
@hosseinfani If there's anything I can do to help, let me know! I can't seem to update my fork of the project, so I think I'll have to re-fork the main project
Also, is my name going to be included in the authors section of the README file? https://github.com/fani-lab/LADy/tree/main#authors
@farinamhz any updates?
Hey @impedaka, I added your name to the authors. Now, as you may know, we are going to deploy the web application, so prior to that, please get the latest version of LADy and let me know if there is any problem with installation and running. If there is, I will update it, and once it is OK for you, we will go on to the next steps of our web application.
Alright, thank you!
Also, please check the new readme, Alice, and let me know if you think any change is needed and can improve it. @impedaka
@farinamhz sorry for the late reply. I ran main.py from the latest codebase with no problems. I'd like to hear the next steps :)
README.md looks good. Some of the links are invalid, like https://github.com/impedaka/LADy-1#srccmn
dataset | review files (english, chinese, farsi, arabic, french, german, spanish, and all) and results' directory
-- | --
semeval-14-laptop | ./output/Semeval-14/Laptop/ 22.0 MB
semeval-14-restaurant | ./output/Semeval-14/Restaurants/ 22.2 MB
semeval-15-restaurant | ./output/2015SB12/ 53.1 GB
semeval-16-restaurant | ./output/2016SB5/ 103 MB
toy | ./output/toy.2016SB5/ 64.6 MB

https://uwin365.sharepoint.com/sites/cshfrg-ReviewAnalysis/Shared%20Documents/Forms/AllItems.aspx?ga=1&id=%2Fsites%2Fcshfrg%2DReviewAnalysis%2FShared%20Documents%2FLADy%2FLADy0%2E1%2E0%2E0%2Foutput%2Ftoy%2E2016SB5&viewid=4cd69493%2D951c%2D47b5%2Db34a%2Dc1cdbf3a0412 (Uwin account required, I think). Sample pickle files for a toy dataset: ./output/toy.2016SB5/ABSA16_Restaurants_Train_SB1_v2.xml,
Hey @impedaka, No worries. Thank you very much for the time you put into this task. I will check and change the mentioned links and let you know the next step tomorrow.
Hi @impedaka, as we have a problem with the cloud right now, I will take care of the issues with the links later.
Meanwhile, can we run the web app on the local host now, or is there still any problem?
I can run the web app on localhost. Are we using the same layout as the previous website I made? The website code isn't included in the main project; should I add it or make a new one?
I asked so I could run it on localhost and take a look at the layout, so we can modify it if needed. It would be great if you added it to the existing codeline. @impedaka
I can add it; not all of the functionality is working, however. You'll need to run both the backend and frontend at the same time.
Backend: in LADy/src/web/backend, install Flask and flask_cors, then run python app.py to start the backend server.
Frontend: install Node.js if you haven't already; in LADy/src/web/frontend, run npm install and then npm run dev. It'll run on localhost:3000.
Sorry if this is complicated. I'm using the models from before, so the backend isn't up to date. I'll make a PR :)
Hey @impedaka, thank you again for the updates, that is great, and I asked @hosseinfani if we can accept the PR now.
Meanwhile, I will share a google doc for the web app and list whatever we need as modifications or new requirements there so that we can explain and update with @hosseinfani.
Hey @farinamhz
I made some changes to the design. Let me know what you think.
Personally, I think the image would look nicer if it were on the right side and not transparent. I can make a preview if you're interested. I'm still making minor adjustments to the website to make it more user friendly. Update me on the google docs for whatever needs to be done :)
Hey @impedaka, Great, thank you very much for the updates.
@farinamhz
Regarding the emoji for the webpage, is it possible to make it transparent, so that if we have a different background color, like purple in your case or any dark one, we don't have the white area surrounding it?
I will make it transparent :)
regarding google docs, I just shared it with you.
Thank you!
@farinamhz
Good news! I got the backtranslation working without changing the numpy version. Before, it didn't work because it relied on an older version of numpy, and downgrading numpy caused other problems with libraries and functions.
I just updated networkx to 3.1 (simalign previously used 2.4); hopefully it doesn't negatively impact anything. So far, no errors for me.
settings = {'nllb': 'facebook/nllb-200-distilled-600M', 'max_l': 1024, 'device': 'cpu'}
res = r.translate('pes_Arab', settings)
translated_review = res[0].get_txt()
backtranslated_review = res[1].get_txt()
semantic_similarity = res[2]
print('translated', translated_review)
print('back', backtranslated_review)
print('sem', semantic_similarity)
before I fixed it, this was the error:
Traceback (most recent call last):
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask\app.py", line 2213, in __call__
return self.wsgi_app(environ, start_response)
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask\app.py", line 2193, in wsgi_app
response = self.handle_exception(e)
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask_cors\extension.py", line 176, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask\app.py", line 2190, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask\app.py", line 1486, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask_cors\extension.py", line 176, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask\app.py", line 1484, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\flask\app.py", line 1469, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "C:\Users\Qin\LADy-1\src\web\backend\app.py", line 48, in api
res = r.translate('pes_Arab', settings)
File "C:\Users\Qin\LADy-1\src\cmn\review.py", line 67, in translate
translated_obj.aos, _ = self.semalign(translated_obj)
File "C:\Users\Qin\LADy-1\src\cmn\review.py", line 85, in semalign
from simalign import SentenceAligner
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\simalign\__init__.py", line 1, in <module>
from .simalign import EmbeddingLoader, SentenceAligner
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\simalign\simalign.py", line 13, in <module>
import networkx as nx
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\networkx\__init__.py", line 115, in <module>
import networkx.readwrite
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\networkx\readwrite\__init__.py", line 15, in <module>
from networkx.readwrite.graphml import *
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\networkx\readwrite\graphml.py", line 314, in <module>
class GraphML(object):
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\networkx\readwrite\graphml.py", line 346, in GraphML
(np.int, "int"), (np.int8, "int"),
File "C:\Users\Qin\LADy-1\venv\lib\site-packages\numpy\__init__.py", line 305, in __getattr__
raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'. `np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
I will update you on the design for the new backtranslation and translation functions. Let me know your ideas! Also, I added an "Augmentation" checkmark, but I don't know how it'll work yet.
Adding different metric results to the displayed results (we have success@k, but we need map@k, recall@k, ndcg@k, and precision@k, which are all available in the output); this can be a choice offered after the results are displayed.
Also, I'm not sure how to use the different metric results. Either I spend more time figuring it out, or you can show me :)
Hey @impedaka, great! Thank you for letting me know about the issue and the update on the fix. Nice! I think if we have something like:
Translated (Farsi): "..." Backtranslated (English): "..."
With a suitable space of more than one line between these two parts, as the characters may differ in size.
For the augmentation, I am going to talk with @hosseinfani first, as we should provide different models in this regard. Otherwise, we need to mention that all the results will be based on augmentation, and in that case, we need to remove the checkbox. Anyway, thank you for adding it. I'll update you on this.
I will provide an example of the results in a few minutes for you to tell you about the metrics.
Hi @impedaka,
Welcome to LADy :)
This is an issue page to log your progress in developing the web app.
Please let us know if you have any concerns or questions.
@hosseinfani