cancerberoSgx / node-lucene

node-lucene: (main objective) Apache Lucene bindings for Node.js (straightforward API, performant thanks to node-java dynamic bindings - no server, process spawning, or IPC). javap and javap-json: inspect the Java AST from .jar and .class files. java2js: research on auto-generating TypeScript/JavaScript interfaces and implementations from Java .jar and .class files.
MIT License

Performance benchmarks #5

Open madshargreave opened 5 years ago

madshargreave commented 5 years ago

Very interesting project 👍

Have you done any benchmarks with regards to performance? I'm considering using something like this to get feature parity with ES, but from inside a node process and specifically for very short-lived documents.

cancerberoSgx commented 5 years ago

Have you done any benchmarks with regards to performance?

No. The only uncertainty regarding performance is the invocation of native Java, and that's taken care of by https://github.com/joeferner/node-java. In short, if you invoke Java methods many times, that passing of control between node and Java is the only thing that could impact performance. For indexing the impact should be minimal, since in general you do it with a single call or a few calls, so the execution happens mostly on the JVM side, just as in Lucene for Java.

But for querying, if you need, let's say, to iterate over a large result set and call a Java method in each iteration, that could impact performance.

Again, I don't know how much. I also don't know whether the kind or size of the objects returned from Java has an impact.

I would ask or search here: https://github.com/joeferner/node-java. Perhaps the project has real benchmarks or some numbers in this regard.
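
In the absence of published numbers, one rough way to get a feel for the per-call cost is to time many repeated calls and average them. The helper below is only a sketch: `averageCallMs` is a hypothetical name, not part of node-java or node-lucene, and the function passed in would be whatever wrapper around a Java call you want to measure.

```typescript
// Hypothetical helper, not part of node-java or node-lucene.
// Calls `fn` `iterations` times and returns the average milliseconds per call.
function averageCallMs(
  fn: () => unknown,
  iterations: number,
  now: () => number = Date.now, // swap in a higher-resolution clock if you have one
): number {
  if (iterations <= 0) {
    throw new RangeError("iterations must be positive");
  }
  const start = now();
  for (let i = 0; i < iterations; i++) {
    fn();
  }
  return (now() - start) / iterations;
}
```

Comparing the result for a plain JS function with the result for a function that wraps a single Java method call would give a first estimate of the overhead per crossing. `Date.now` only has millisecond resolution, so a large iteration count is needed for the average to mean anything.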

Note that performance problems such as looping over big result sets and calling Java methods multiple times, if real, could still be solved by writing that loop in actual Java code that returns the filtered results, so there is only one Java method call from node. But that is out of scope right now for this project, since it focuses only on porting Lucene APIs and not on customizing Lucene or implementing Java tooling around it.
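
To illustrate the difference in the number of node-to-Java calls, here is a small simulation. It does not use node-java or Lucene at all: `FakeJavaHits` is a hypothetical stand-in whose methods each count as one crossing into the JVM, and `filterAbove` plays the role of a helper that would be written in Java.

```typescript
// Hypothetical stand-in for a Java-side result set; not a node-lucene or node-java API.
// Every method call on it represents one crossing from node into the JVM.
class FakeJavaHits {
  crossings = 0;
  private scores: number[];

  constructor(scores: number[]) {
    this.scores = scores;
  }

  // One crossing per call.
  length(): number {
    this.crossings++;
    return this.scores.length;
  }

  // One crossing per hit when used inside a JS loop.
  scoreAt(i: number): number {
    this.crossings++;
    return this.scores[i];
  }

  // One crossing in total: the loop runs "on the Java side".
  filterAbove(minScore: number): number[] {
    this.crossings++;
    return this.scores.filter((s) => s >= minScore);
  }
}

// Loop written in JS: crossings grow with the size of the result set.
function filterInJs(hits: FakeJavaHits, minScore: number): number[] {
  const out: number[] = [];
  const n = hits.length();
  for (let i = 0; i < n; i++) {
    const s = hits.scoreAt(i);
    if (s >= minScore) out.push(s);
  }
  return out;
}

// Loop delegated to a single helper: a constant number of crossings.
function filterInJava(hits: FakeJavaHits, minScore: number): number[] {
  return hits.filterAbove(minScore);
}
```

Both functions return the same filtered scores, but for a result set of n hits the JS loop makes n + 1 crossings while the delegated version makes one. How expensive each crossing really is with node-java is exactly the part that has not been measured.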

My objective with this project was to see how much could be accomplished by writing only JS code. I didn't focus on automatic Java code generation/compilation/execution at all, only on detecting which scenarios couldn't be supported by generated JS code alone.

Also note that the project doesn't implement the whole Lucene API, only a small part to demonstrate basic examples. I'm currently not planning to complete the API in the short term. I would focus on a tool to generate the code automatically from Java code instead. Nevertheless, if you need something that is not implemented, please report an issue and I will add it, since it's quite simple and straightforward, mostly mechanical work. I also need to write a small tutorial on how to do this... so in that case don't hesitate to ask for missing APIs.

Hope that clarifies things.

cancerberoSgx commented 5 years ago

I'm considering using something like this to get feature parity with ES, but from inside a node process and specifically for very short-lived documents.

I didn't quite understand this. Could you please elaborate? For example, would it be enough to just use the Lucene query language, or do you need to implement custom filters (subclassing Lucene's)? Also, how big would the indexes be, and would they be in memory or on the file system? I could write some tests using the simple example (similar to the README's), but I'm not sure if that's your case.

kkdev163 commented 3 years ago

Hi @cancerberoSgx, I have a similar use case: I want to index the Bible for search purposes. The total uncompressed document is less than 30M.

I have tried to install ES on my application server, but it takes half the memory of my machine (1 core, 2 GB memory), and so far the installation still fails.

It seems node-lucene (with a file system index) would fit my purpose, but I'm not sure: does it support Chinese tokenization?

kkdev163 commented 3 years ago

I have tried to install ES on my application server, but it takes half the memory of my machine (1 core, 2 GB memory), and so far the installation still fails.

ES finally runs successfully on my machine with this environment config:

ES_JAVA_OPTS=-Xms256m -Xmx256m