Codarama / diet-engine

Diet significantly minimizes the space used by your project's /lib
codarama.github.io/Diet
MIT License
3 stars 2 forks source link

Improve resolution speed. #6

Closed ayld closed 9 years ago

ayld commented 10 years ago

The current resolution algorithm is extremely basic, since it was just for PoC and thus slow :) It can be improved by adding jar extraction results to a fast in memory DB (preferably graph) and doing some type of BFS on it.

ayld commented 10 years ago

Please propose DBs and overall architecture on this @Codarama/diet. I'm currently thinking of Neo4j . Although I'm not sure that the cost starting it up will be faster than the current algorithm.

My current idea is to make a separate thread start it while the jar extraction is running and searching it after the extractions is done.

I'll appreciate ideas on this though.

ayld commented 10 years ago

STEFFI might be what we need. Will try it and update :)

ayld commented 9 years ago

Found something interesting introduced in Java 8.

ayld commented 9 years ago

Added a util using the jdeps tool.

I plan to make a class dependency resolver using the util, then compare output with our current resolver.

If it is faster (and it will be) I'll make a new Minimizer implementation for the new resolver.

ayld commented 9 years ago

OK so, since I've been trying to use the new Java 8 jdeps tool to speed things up, I have some new observations and so far I seem to believe this is not the way to go, here's why:

The old manual Minimizer

The way the current slow minimizer works is:

  1. Unzip all lib jars
  2. Parse sources to get dependencies
  3. Recursively look for the classes from 2. in the unzipped jar directories
  4. Bundle the jars

Now, during 3. on every recursive "iteration" we reduce the dependencies we look for. This is easy because I can keep a set of all the .class files in the extracted directories, and whenever I find a dependency to add to the result I remove it from the set I keep. So on every iteration the set to look in becomes smaller.

The thing is this can't be done with jdeps.

The jdeps Minimizer

The good thing about jdeps is that it doesn't need to extract the jars. It just streams their content, which is much faster since as we know from this profiling run (and several others ...) that the slowest part of the minimization is the jar extraction.

Jdeps requires 2 arguments to return dependencies:

So you have to call it like this: jdeps -v -cp some.jar another.jar com.company.ClassToLookFor

You have probably spotted the problem already, but the thing is we can't reduce the set we look in. Every time for every class we'll look in all the jars. Which is not optimal, to say the least.

What is the best way

No idea :)

So far my observations are leading me to believe that going for the optimal solution is the best (as it exists algorithmically), although it will require the most change if we need to implement it.

We need the jar contents indexed, the indexing itself might be slow, but the queries in a proper tree will be lightspeed fast. Practically instant.

So I'm thinking of finishing the jdeps minimizer and comaprng result to the current one, although I'm almost certain it will loose. I'll post result in here once I've got them.

On the part with the index, I'm looking into JCR implementation as I've used Jackrabbit before, I don't think it's the way to go as it's really old and sluggish. Ideally I think we'd need an in-memory index and Jackrabbit doesn't support that. ModeShape is looking promising so far, though.

Thoughts?

ayld commented 9 years ago

Also if we go with a JCR implementation Jackalope, will be good for testing it.

ayld commented 9 years ago

Removed the jdeps minimizer and related code. It's just too slow. Will start the indexing minimizer with ModeShape and see how that turns out.

ayld commented 9 years ago

I managed to get ModeShape working and it is BLAZING fast.

Indexes 3 huge jars (about 10k jar entries including folders and classes) to a tree in UNDER A SECOND @.@

We can (and do in unit tests) run Xpath queries over the tree which are again extremely fast. It looks to be about ~100 times (yes I'm not kidding 100 TIMES) faster than the old manual minimizer.

I just had to post this ... :D

tishun commented 9 years ago

Niiiice!!!

ayld commented 9 years ago

22:32:12 INFO IndexedMinimizationStrategyTest:103 - indexed strategy finished in: 59305 millis 22:39:27 INFO IndexedMinimizationStrategyTest:117 - manual strategy finished in: 434810 millis

This is self explanatory I believe :D

ayld commented 9 years ago

I still want to fix some more things before I commit this ... but yes it's about 7 times faster.