farasapy takes a long time

AsmaZbt commented 2 years ago

Hello, first of all, thank you so much for this great work. I need to use the Farasa segmenter and POS tagger in my pipeline; however, it takes about 4 to 5 minutes to return results for, say, 20 sentences.

How can I reduce this time?

My pipeline was very quick before I added the Farasa segmenter and POS tagger.

Do you think using the jar files directly would reduce the time?

Thank you so much.

MagedSaeed commented 2 years ago

Hey @AsmaZbt

Thanks for reaching out. Farasa operates in two modes. The first is the interactive mode, where sentences are evaluated individually. The second is the standalone mode, where all sentences are written to a file that is then passed to Farasa for evaluation.

Both of these modes are already supported by the Farasa jar files; I am just wrapping them. Under the hood, I pass the input to the jar files through a command-line process, so I would expect a similar experience if you used the jar files directly.

For your issue, maybe you used the standalone mode when the interactive mode is the better fit. Interactive mode is best for short text, where speed matters more than input length, while standalone mode is best for long text input.
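To compare the two modes on your own sentences, a small timing helper is enough. This is a sketch: `time_per_sentence` is a hypothetical helper that only measures whatever callable it is given, and the commented usage assumes farasapy's `FarasaSegmenter(interactive=...)` constructor and `segment()` method as shown in the farasapy README.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_per_sentence(
    process: Callable[[str], str], sentences: Iterable[str]
) -> Tuple[List[str], float]:
    """Run `process` on each sentence and return (outputs, total seconds).

    Hypothetical helper: build the segmenter once, outside this function,
    so that its start-up cost is not counted for every sentence.
    """
    start = time.perf_counter()
    outputs = [process(sentence) for sentence in sentences]
    elapsed = time.perf_counter() - start
    return outputs, elapsed


# Usage sketch (assumes farasapy and Java are installed):
# from farasa.segmenter import FarasaSegmenter
# interactive = FarasaSegmenter(interactive=True)
# standalone = FarasaSegmenter(interactive=False)
# _, t_interactive = time_per_sentence(interactive.segment, sentences)
# _, t_standalone = time_per_sentence(standalone.segment, sentences)
```

The segmenter objects are created once, before timing. In interactive mode this presumably keeps one jar process alive across sentences, while in standalone mode each `segment()` call goes through the file-based round trip described above, so for many short sentences standalone mode is usually cheaper when given all of them as one text.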

You can also check this in the Colab notebook, which illustrates both modes, and compare the time each mode takes: [https://colab.research.google.com/drive/1xjzYwmfAszNzfR6Z2lSQi3nKYcjarXAW]

Try it and let me know if this does not solve your issue.

AsmaZbt commented 2 years ago

Hello @MagedSaeed, thank you so much for replying and for the explanation. I understand better now. It works much faster, thank you so much.

I have another question, please: I'm trying to deploy my NLP application on Heroku, but I can't, I think because of the initialization: segmenter = FarasaSegmenter(interactive=True) and pos_tagger = FarasaPOSTagger(interactive=True)

It takes some time (I read in the farasapy docs that initialization takes a while the first time it is used) and downloads the binaries: 100%|██████████| 241M/241M [00:21<00:00, 11.1MiB/s].

The error message says the request timed out.

Do you have an idea of how I can run these two initializations separately so that I can deploy my app?

Thank you so much.

MagedSaeed commented 2 years ago

You are most welcome @AsmaZbt

Really glad that things work for you.

Regarding your concern about the Heroku host: first of all, make sure your application is not commercial; otherwise, get the Farasa authors' approval to avoid any legal issues. Now, are you sure the timeout comes from farasapy? Does it work fine on your local machine? If you use a debugger on your local machine, can you trace how things go and which step takes the time?

Just out of curiosity, where did you install the package on the server: in a virtual environment or globally? If globally, make sure to install it for Python 3, because on Linux systems pip install farasapy usually points to Python 2 instead of Python 3. You can use pip3 install farasapy or python3 -m pip install farasapy instead.
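A quick way to confirm which interpreter the server actually runs, and whether farasapy is importable from it, is a short check executed with the same python the app uses. This is a sketch using only the standard library; `environment_report` is a hypothetical helper, and it assumes the package's import name is `farasa`, as in the `from farasa.segmenter import FarasaSegmenter` line used in this thread.

```python
import importlib.util
import sys


def environment_report(module_name: str = "farasa") -> dict:
    """Report the running interpreter and whether `module_name` is importable.

    find_spec() only locates the package without importing it, so this check
    does not construct any Farasa object or trigger a binary download.
    """
    return {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "is_python3": sys.version_info.major == 3,
        # In a venv, sys.prefix points at the venv, base_prefix at the system Python.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        f"{module_name}_installed": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `executable` is not the interpreter inside the app's virtual environment, or `farasa_installed` is False, the package was installed for a different Python.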

AsmaZbt commented 2 years ago

Thank you very much, that's very kind of you, @MagedSaeed.

I'm trying to publish a research paper, and I cited Farasa and your work in it. The application only demonstrates the solution to my research problem; it's not for commercial use. Do you think I still need the authors' approval in this case?

- The application works fine on my local machine and is fast; I just have to wait for the Farasa segmenter and POS tagger to initialize.

I'm using Python 3 in a virtual environment.
I think it is the download of the Farasa toolkit binaries that causes the timeout error; Heroku doesn't wait that long.

I took screenshots so that you can see: Screenshot from 2021-12-05 14-16-41, Screenshot from 2021-12-05 14-20-59

In your opinion, what can I do?

Best regards, Asma

MagedSaeed commented 2 years ago

Hey @AsmaZbt

Sorry for the late reply; I got busy the last couple of days.

Thanks for citing farasapy along with Farasa. I really appreciate it.

Since your application is not commercial, I do not think you need to contact the Farasa authors, as per their license.

Regarding the issue you are facing with your Flask app, let me ask you this: do you have terminal access to your host (Heroku)? I think you should. If so, this workaround may be useful. Try executing this line in your terminal after activating your app's virtual environment:

python -c "from farasa.segmenter import FarasaSegmenter;segmenter = FarasaSegmenter()"

This line initializes the segmenter object and, the first time, downloads the jar files if they are not already there. After that, all subsequent calls reuse the same jar files.
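The one-liner above only builds the segmenter. Since the app also uses the POS tagger, the same warm-up can cover both. A sketch under assumptions: `warm_up` and `farasa_factories` are hypothetical names, and the farasapy imports (`farasa.segmenter.FarasaSegmenter`, `farasa.pos.FarasaPOSTagger`) follow the farasapy README. On hosts with an ephemeral filesystem, files downloaded from a one-off shell may not be visible to the web process, so this may need to run during the build step instead.

```python
import time
from typing import Callable, Dict


def warm_up(factories: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Call each factory once and return how long each one took, in seconds.

    With farasapy, the first construction is the slow one because it
    downloads the jar files; later constructions reuse them.
    """
    timings: Dict[str, float] = {}
    for name, factory in factories.items():
        start = time.perf_counter()
        factory()
        timings[name] = time.perf_counter() - start
    return timings


def farasa_factories() -> Dict[str, Callable[[], object]]:
    """Hypothetical helper; the imports assume farasapy is installed."""
    from farasa.pos import FarasaPOSTagger
    from farasa.segmenter import FarasaSegmenter

    return {
        "segmenter": lambda: FarasaSegmenter(interactive=True),
        "pos_tagger": lambda: FarasaPOSTagger(interactive=True),
    }
```

Calling `warm_up(farasa_factories())` once, in the app's environment, returns the seconds spent constructing each object; the first run includes the jar download, and later runs should be much faster.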

Hope this helps.

AsmaZbt commented 2 years ago

Hello @MagedSaeed, I would like to thank you so much for your great work, effort, and help.

I tried your solution, but it didn't solve the problem. It gives me 'resources limited'. It's a memory problem now; I'm using their free plan.

Thank you so much. Best regards, Asma.

MagedSaeed commented 2 years ago

Thanks, @AsmaZbt, for your kind words. I really appreciate it.

The binaries are under 250 MB, I guess. How much storage does the free plan offer?

AsmaZbt commented 2 years ago

Yes, the binaries are 241 MB.

The free plan offers 512 MB of RAM and 500 MB of storage.

I'm also using a sentence-BERT transformer.
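The figures above can be put side by side to see how much room is left. This is a rough sketch that only subtracts: `headroom_mb` is a hypothetical helper, the 241 figure comes from the download bar (which may be in MiB, making it slightly larger in MB), and the sentence-BERT model size is left out because it is not given in this thread.

```python
def headroom_mb(limit_mb: float, *used_mb: float) -> float:
    """Return what is left of a plan limit after the listed components (MB).

    A negative result means the listed components do not fit.
    """
    return limit_mb - sum(used_mb)


FARASA_BINARIES_MB = 241  # download size reported in this thread
STORAGE_LIMIT_MB = 500    # free-plan storage reported in this thread

# Storage left for code, Python packages and the sentence-BERT model:
print(headroom_mb(STORAGE_LIMIT_MB, FARASA_BINARIES_MB))  # prints 259
```

RAM cannot be estimated from the thread alone: the 512 MB has to hold the Python process, the loaded sentence-BERT model and the Java process(es) behind the two interactive Farasa objects, none of whose sizes are stated here.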
To give you an idea of what I'm doing:

I have a dictionary of words in which a single definition may have 100 sentences.

I have a query sentence containing a verb that I would like to disambiguate.

I apply the Farasa segmenter to the query sentence.

Then I compute the embedding vectors of the query sentence and of the 100 sentences in the dictionary definition of that verb.

Then I compute the similarities between them to retrieve the nearest sentences, apply the Farasa POS tagger to the 10 nearest sentences, and get the list of the most frequent verbs.

These are the broad lines of my process. I just want to build an online demo of my approach, like this work: http://ltdemos.informatik.uni-hamburg.de/uwsd158/
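The steps above can be sketched as one function. This is not the author's implementation: the segmenter, the embedding model and the POS tagger are passed in as plain callables with hypothetical signatures, the POS tagger is assumed to return (token, tag) pairs, cosine similarity is assumed as the similarity measure, and `verb_tag="V"` is a placeholder because the exact tag format of the Farasa POS tagger output is not shown in this thread.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_frequent_verbs(
    query: str,
    definition_sentences: List[str],
    segment: Callable[[str], str],                    # e.g. a Farasa segmenter
    embed: Callable[[List[str]], List[Vector]],       # e.g. a sentence-BERT model
    pos_tag: Callable[[str], List[Tuple[str, str]]],  # assumed (token, tag) pairs
    k: int = 10,
    verb_tag: str = "V",                              # placeholder tag name
) -> List[Tuple[str, int]]:
    # 1. Segment only the query sentence, as described above.
    segmented_query = segment(query)
    # 2. Embed the query together with the definition's sentences.
    vectors = embed([segmented_query] + definition_sentences)
    query_vec, sentence_vecs = vectors[0], vectors[1:]
    # 3. Rank definition sentences by similarity to the query; keep the k nearest.
    ranked = sorted(
        zip(definition_sentences, sentence_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    nearest = [sentence for sentence, _ in ranked[:k]]
    # 4. POS-tag only the nearest sentences and count the verbs.
    verbs = Counter(
        token
        for sentence in nearest
        for token, tag in pos_tag(sentence)
        if tag == verb_tag
    )
    return verbs.most_common()
```

With farasapy, `segment` would presumably be the `segment` method of an interactive `FarasaSegmenter`, and `pos_tag` a small wrapper that parses the output of `FarasaPOSTagger.tag()` into pairs. Building these objects once at start-up, rather than per request, avoids paying the initialization cost on every query.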

Thank you so much.

MagedSaeed commented 2 years ago

That looks interesting! I wish you the best in your research work!

MagedSaeed/farasapy, issue #19