Closed AsmaZbt closed 2 years ago
Hey @AsmaZbt
Thanks for reaching out. By the way, farasa operates in two modes. The first is interactive mode, where sentences are evaluated individually. The second is standalone mode, where all sentences are written to a file that is then passed to farasa for evaluation.
Both of these modes are already supported by the farasa jar files; I am just wrapping them. Under the hood, I pass the input to the jar files through a command-line process, so I would expect a similar experience if you used the jar files directly.
For your issue, maybe you tried standalone mode where interactive mode would suit you better. Interactive mode is best for short text, where speed matters more than input length, while standalone mode is best for long text input.
You can also check this in the Colab notebook, which illustrates both modes and compares the time taken by each: [https://colab.research.google.com/drive/1xjzYwmfAszNzfR6Z2lSQi3nKYcjarXAW]
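For a quick local comparison of the two modes, something like the sketch below may help. It assumes farasapy and Java are installed; `time_calls` and `compare_modes` are hypothetical helper names introduced here, not part of farasapy. Only `FarasaSegmenter(interactive=...)` and its `segment` method come from the library.

```python
import time


def time_calls(process, sentences):
    """Call `process` on each sentence and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = [process(sentence) for sentence in sentences]
    return results, time.perf_counter() - start


def compare_modes(sentences):
    """Time segmentation in interactive and standalone mode (hypothetical helper)."""
    # Imported here so the timing helper above can be used without farasapy.
    from farasa.segmenter import FarasaSegmenter

    timings = {}
    for interactive in (True, False):
        segmenter = FarasaSegmenter(interactive=interactive)
        _, seconds = time_calls(segmenter.segment, sentences)
        timings["interactive" if interactive else "standalone"] = seconds
    return timings
```

Calling `compare_modes` with around 20 short sentences should show whether the slowdown comes from the mode or from something else in the pipeline.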
Try it and let me know if this does not solve your issue.
Hello @MagedSaeed, thank you so much for replying and for the explanation. I understand better now. It works much faster, thank you so much.
I have another question, please: I'm trying to deploy my NLP application on Heroku, but I can't, I think because of the initialization: segmenter = FarasaSegmenter(interactive=True) and pos_tagger = FarasaPOSTagger(interactive=True)
It takes some time (I read in the farasapy docs that it takes some time when used for the first time) and downloads 100%|██████████| 241M/241M [00:21<00:00, 11.1MiB/s].
The error message is: request time is out.
Do you have an idea of how I can run these two initializations separately so that I can deploy my app?
Thank you so much.
You are most welcome @AsmaZbt
Really glad that things work for you.
Regarding your concern about hosting on Heroku: first of all, make sure your application is not commercial; otherwise, get the farasa authors' approval to avoid any legal issues. Now, are you sure the timeout comes from farasapy? Does it work fine on your local machine? If you use a debugger on your local machine, can you trace how things are going and which step takes the time?
Just out of curiosity, where did you install the package on the server? In a virtual environment or globally? If globally, make sure to install it in a Python 3 environment, as Linux systems usually redirect to Python 2 instead of Python 3 with pip install farasapy. Maybe you can use pip3 install farasapy or python3 -m pip install farasapy.
Thank you very much, that's very kind of you, @MagedSaeed.
I'm trying to publish a research paper, and I cited farasa and your work in it. The application only provides a solution to my research problem; it's not for commercial use. Do you think I also need the authors' approval in this case?
- The application works fine on my local machine and does not take long; I just have to wait for the initialization of the farasa segmenter and POS tagger.
I made a screenshot so that you can see.
In your opinion, what can I do?
Best regards, Asma
Hey @AsmaZbt
Sorry for the late reply, I got busy the last couple of days.
Thanks for citing farasapy along with farasa. I really appreciate it.
Since your application is not commercial, I do not think you need to contact the farasa authors, as per their license.
Regarding the issue you are facing in your Flask app, let me ask you this: do you have terminal access to your host (Heroku)? I think you should. If so, maybe this workaround will be useful. Try to execute this line in your terminal after activating your app's virtual environment:
python -c "from farasa.segmenter import FarasaSegmenter;segmenter = FarasaSegmenter()"
This line will initialize the segmenter object, downloading the jar files the first time if they are not already there. After that, all subsequent calls will reuse the same jar files.
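If the first request still times out after the jar files are in place, another option might be to build the segmenter once at startup, outside the request path. The sketch below is only one way to do that, under stated assumptions: `LazyResource` and `make_segmenter` are hypothetical names introduced here, not part of farasapy or Flask, and whether Heroku keeps the warm-up thread alive long enough is not verified.

```python
import threading


class LazyResource:
    """Build an expensive object once, on first use or in a background warm-up."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._ready = threading.Event()
        self._value = None

    def get(self):
        # The lock makes concurrent callers wait instead of building twice.
        with self._lock:
            if not self._ready.is_set():
                self._value = self._factory()
                self._ready.set()
        return self._value

    def warm_up_in_background(self):
        # Start building without blocking the caller (e.g. app startup).
        thread = threading.Thread(target=self.get, daemon=True)
        thread.start()
        return thread

    @property
    def ready(self):
        return self._ready.is_set()


def make_segmenter():
    # Assumption: farasapy is installed; imported lazily so startup stays cheap.
    from farasa.segmenter import FarasaSegmenter
    return FarasaSegmenter(interactive=True)


segmenter = LazyResource(make_segmenter)
```

At startup the app could call `segmenter.warm_up_in_background()`, and a request handler could check `segmenter.ready` and return a "still loading" response instead of blocking until the platform's request timeout.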
Hope this helps.
Hello @MagedSaeed, I would like to thank you so much for your great work, effort, and help.
I tried your solution, but it didn't solve the problem. It gives me 'resources limited'. It's a memory problem now; I am using their free plan.
Thank you very much. Best regards, Asma.
Thanks, @AsmaZbt, for your kind words. I really appreciate it.
The size of the binaries is below 250 MB, I guess. How much storage does the free plan offer?
Yes, the binaries are 241 MB.
The free plan offers 512 MB of RAM and 500 MB of storage.
I am also using a Sentence-BERT transformer.
Here is an idea of what I'm doing:
I have a dictionary of words: one definition may have 100 sentences.
I have a query sentence and a verb in this sentence that I would like to disambiguate.
I apply farasa.segmenter to the query sentence.
Then I compute the embedding vectors of the query sentence and of the 100 sentences in the definition of that verb in the dictionary.
Then I compute the similarities between them to retrieve the nearest sentences. Then I apply farasa.POStagger to the 10 nearest sentences. Then I need to get the list of the most frequent verbs.
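The retrieval and counting steps above can be sketched in plain Python, as below. This is only an illustration under assumptions: the embeddings are taken as plain lists of floats, the tagged sentences as lists of (token, tag) pairs, and "V" as the verb tag. The function names are hypothetical, and the actual farasa output format would need to be parsed into this shape first.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query_vec, sentence_vecs, k=10):
    """Return the indices of the k sentences most similar to the query."""
    order = sorted(
        range(len(sentence_vecs)),
        key=lambda i: cosine(query_vec, sentence_vecs[i]),
        reverse=True,
    )
    return order[:k]


def most_frequent_verbs(tagged_sentences, verb_tag="V", top=5):
    """Count tokens tagged as verbs across sentences (the tag name is an assumption)."""
    counts = Counter(
        token
        for sentence in tagged_sentences
        for token, tag in sentence
        if tag == verb_tag
    )
    return counts.most_common(top)
```

Keeping these steps independent of farasa means only the 10 nearest sentences ever need to go through the POS tagger.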
These are the main lines of my process. I just want to build an online demo of my approach, like in this work: http://ltdemos.informatik.uni-hamburg.de/uwsd158/
Thank you very much.
That looks interesting! I wish you the best in your research work!
Hello, first, thank you so much for this great job. I need to use the Farasa segmenter and POS tagger in my process; however, it takes about 4 to 5 minutes to give me the results for 20 sentences, for example.
How can I reduce this time?
My process was very quick before I used the Farasa segmenter and POS tagger.
Do you think using the jar directly would reduce the time?
Thank you so much.