citiususc / Linguakit

Multilingual toolkit for NLP: dependency parser, PoS tagger, NERC, multiword extractor, sentiment analysis, etc.
GNU General Public License v3.0

Information about the performance #15

Closed baquiax closed 4 years ago

baquiax commented 4 years ago

Hello team. First of all I want to congratulate y'all. This project is really awesome.

I'm interested in using this project for my college thesis. I expect to release a project that uses Linguakit, and my target group would be big. So I want to ask whether you have done any kind of stress testing on it. Is it feasible to run hundreds of, for example, 'dep' analyses?

Thanks in advance.

gamallo commented 4 years ago

Hi Alexander, thank you very much for your interest! Each module behaves differently in terms of RAM requirements and speed. We haven't run specific stress tests, but we have used Linguakit to analyse large text corpora, for instance several Wikipedias.

For instance, the 'dep' analysis of a text document takes over 2.5% of 8 GB of RAM and analyzes 1.4k words per second (using the processor of my laptop: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz). At that rate, syntactically analyzing the entire English Wikipedia (2 billion words) takes roughly two weeks on my laptop.
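As a quick sanity check on those figures, the arithmetic can be sketched in Go. The function and constant names below are made up for illustration, and reading "8G" as 8 GiB is an assumption; the word count and throughput are the ones quoted above.

```go
package main

import "fmt"

const secondsPerDay = 24 * 60 * 60

// daysToProcess returns how many days it takes to analyze `words` words
// at a sustained throughput of `wordsPerSecond`.
func daysToProcess(words, wordsPerSecond float64) float64 {
	return words / wordsPerSecond / secondsPerDay
}

func main() {
	const (
		wikipediaWords = 2e9        // English Wikipedia size quoted in the thread
		depThroughput  = 1400.0     // 'dep' words per second quoted in the thread
		ramFraction    = 0.025      // 2.5% of RAM, as quoted
		totalRAMMiB    = 8 * 1024.0 // assumption: "8G" means 8 GiB
	)
	fmt.Printf("dep over Wikipedia: %.1f days\n", daysToProcess(wikipediaWords, depThroughput))
	fmt.Printf("RAM per analysis: about %.0f MiB\n", ramFraction*totalRAMMiB)
}
```

With the quoted numbers this comes out at about 16.5 days, which matches the "roughly two weeks" estimate, and at around 205 MiB of RAM per running analysis.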

baquiax commented 4 years ago

Wow @gamallo, the performance is pretty impressive. As I understand it, given the operations behind the analysis, CPU matters more than I/O.

As for the other modules, is there somewhere to find these metrics, e.g. for 'rel'?

Nice, thank you very much for answering my question.

I think it would be worth having these comparisons in the documentation. I'd be glad to contribute at least that.

baquiax commented 4 years ago

@gamallo

Sorry to ask multiple things, but I really love this project and I want to use it to help people learn a Mayan language from Spanish (and, better yet, from Mayan to Spanish). :smile:

What is the best way to use Linguakit from, for example, an HTTP API? I imagine it could be done by executing the binary from the other language.

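e.g. in Golang:

// Run the linguakit CLI and capture its output; cmd.Run() alone would discard it.
cmd := exec.Command("linguakit", "rel", "es", "hola amigo", "-s")
log.Printf("Running command and waiting for it to finish...")
out, err := cmd.CombinedOutput()
log.Printf("Command finished with error: %v, output: %s", err, out)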

I'd appreciate your comments.

baquiax commented 4 years ago

Hi @gamallo, sorry to bother you. I know you probably have more important things to do.

But I'll try anyway: I would really appreciate your comments or observations on my last comment.

Sorry and thank you.

gamallo commented 4 years ago

Sorry for the delay. I'm not sure if I understood your last question, but I'll try to answer. The most efficient way to use Linguakit is to run an API. As it is written in Perl, the most natural way to build an API for all or some of the Linguakit modules is to use Dancer: http://perldancer.org/

On the other hand, as you said, you can also run the 'linguakit' command from any other language that allows access to the command line. This is the easiest way to use Linguakit from another application, even if it is not as efficient as having access via an API. Hope it helps. Best

baquiax commented 4 years ago

Hi @gamallo, yes, it does.

Thanks a lot for your time. That makes sense. I was asking how to run Linguakit from another programming language because I want to build an API using Golang (I don't know much Perl).

Thanks
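For anyone taking the same route, here is a minimal sketch of a Go HTTP endpoint that shells out to the `linguakit` command, as discussed above. It is not part of Linguakit: the `/analyze` route, the JSON field names, the 30-second timeout and the two-module allow-list are all assumptions made for illustration, and the command line is simply the one from the earlier snippet. Concurrent processes are capped at the number of CPUs, on the assumption that the analysis is CPU-bound.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"os/exec"
	"runtime"
	"strings"
	"time"
)

// analyzeRequest is a hypothetical JSON payload; the field names are assumptions.
type analyzeRequest struct {
	Module string `json:"module"`
	Lang   string `json:"lang"`
	Text   string `json:"text"`
}

// allowedModules lists only the two modules mentioned in this thread.
var allowedModules = map[string]bool{"dep": true, "rel": true}

// validateRequest rejects payloads that should not reach the command line.
func validateRequest(r analyzeRequest) error {
	if !allowedModules[r.Module] {
		return errors.New("unsupported module")
	}
	if r.Lang == "" || strings.TrimSpace(r.Text) == "" {
		return errors.New("lang and text are required")
	}
	return nil
}

// slots bounds the number of linguakit processes running at the same time.
var slots = make(chan struct{}, runtime.NumCPU())

func analyzeHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "use POST", http.StatusMethodNotAllowed)
		return
	}
	var req analyzeRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON", http.StatusBadRequest)
		return
	}
	if err := validateRequest(req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	slots <- struct{}{}        // acquire a slot
	defer func() { <-slots }() // release it when the handler returns

	// Kill the process if it runs too long or the client goes away.
	ctx, cancel := context.WithTimeout(r.Context(), 30*time.Second)
	defer cancel()
	cmd := exec.CommandContext(ctx, "linguakit", req.Module, req.Lang, req.Text, "-s")
	var stdout, stderr bytes.Buffer
	cmd.Stdout, cmd.Stderr = &stdout, &stderr
	if err := cmd.Run(); err != nil {
		log.Printf("linguakit failed: %v: %s", err, stderr.String())
		http.Error(w, "analysis failed", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	w.Write(stdout.Bytes())
}

func main() {
	// httptest keeps this demo self-terminating; a real service would call
	// http.ListenAndServe(":8080", mux) instead.
	mux := http.NewServeMux()
	mux.HandleFunc("/analyze", analyzeHandler)
	srv := httptest.NewServer(mux)
	defer srv.Close()

	body := strings.NewReader(`{"module":"rel","lang":"es","text":"hola amigo"}`)
	resp, err := http.Post(srv.URL+"/analyze", "application/json", body)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Printf("%s\n%s", resp.Status, out)
}
```

Starting one process per request pays the Perl start-up cost every time, which is the efficiency trade-off mentioned above; the allow-list and the process cap are there so that arbitrary user input and traffic spikes do not translate directly into unbounded command executions.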