ayld / Facade

Facade significantly minimizes the space used by your project's /lib
ayld.github.io/Facade
GNU General Public License v2.0
3 stars 3 forks source link

Should we fix the IOExcpetion on huge libs ? #43

Open ayld opened 10 years ago

ayld commented 10 years ago

Problem

java.io.FileNotFoundException: /home/ayld/.m2/repository/.../aether-util-1.7.jar (Too many open files)

This happens when you run Facade on a lib folder with a number of jars larger than the OS's ulimit. Which is not that rare actually. A local Maven repo on a machine seen several projects can have well over a 1000 jars.

On my linux:

ayld@pwn:~/m2/repository$ find . -name '*.jar' -type f | wc -l 1533

and

ayld@pwn:~/m2/repository$ ulimit -n 1024

Cause

The thing is that Java's ZipFile type actually opens the file when you create an instance (!!!) even if you don't use it later on.

So this loop that looks like it's searching for files is actually opening them:

for (File jarFile : Files.in(libDir).withExtension(JarMaker.JAR_FILE_EXTENSION).list()) {
    libJars.add(new JarFile(jarFile));
}

because of:

public JarFile(File file) throws IOException {
        this(file, true, ZipFile.OPEN_READ);
}

and

public ZipFile(File file, int mode, Charset charset) throws IOException
    {
        if (((mode & OPEN_READ) == 0) ||
            ((mode & ~(OPEN_READ | OPEN_DELETE)) != 0)) {
            throw new IllegalArgumentException("Illegal mode: 0x"+
                                               Integer.toHexString(mode));
        // omitted
        jzfile = open(name, mode, file.lastModified(), usemmap);
        // omitted 
    }

Why JarFile and not just File ?

Because it actually makes sure that the file is a jar and is singed properly (if needed).

Fixing

So is 1533 libs a valid case ? A user can easily work around this limitation if he backs up repo->builds his last project->runs Facade again or setting his ulimit higher (which is simple).

Should we start using File over JarFile or maybe out own JarFile implementation that doesn't open the files or opens->validates->closes them ?

Or some other fix ?

amaranthius commented 10 years ago

Generally, it would be better not to depend on explicit user actions and manage this case internally. Additionally, it doesn't make sense to open a file when it is known that it won't be used, so modification/custom implementation is in order. On the other hand, modifying core classes is not the wisest thing that one can do. So maybe we need something in the middle - handle the case internally but in terms of our own code OR some external lib/tool. Am i making any sense?

ayld commented 10 years ago

If I understand you correctly you're leaning towards fixing it ?

As I see it we can either:

Either of those is fine IMHO if we all agree to it.

Or do you see some other fix I'm missing ?

amaranthius commented 10 years ago

Definitely fixing it, yes. As to which approach to choose - i don't know at the moment but I will surely investigate and dig deeper into the matter.

tishun commented 10 years ago

Guys,

IMHO we should focus on what we want to achieve when building up the list. I looked at the code and I saw that we only use it as a carrier - once it is sent to one of the Exploders then we actually use the JarFile methods - before that it is only used as a carrier.

I believe we should use the File class instead of the JarFile class when making the list. We then make a JarFile in the Exploders and we still have the verification in place. In case some religious extremist has .jar files that are not really java archives - we can skip their processing (and perhaps log a message). No need to do the check one layer above.

Building up a List of JarFiles is simply insane, given the nature of these objects ...

Watcha think?

ayld commented 10 years ago

The problem as I see it is that, when we start using the JarFile inside the exploders it will still open all of them and keep them open because JarFile closes it's streams only on GC (dafuq?) and since we can't predict GC this approach can leave the problem unresolved.

Another thing however is that HotSpot might actually be smart enough to perform GC once all OS file handles are used, if so, this will work, but we have to test it first. I might have some time to try it today.

private class ZipFileInputStream extends InputStream {
    // ommitted
    protected void finalize() {
        close();
    }
}
Magiosnik commented 7 years ago

Can JarInputStream be used instead of JarFile?

ayld commented 7 years ago

Most probably yes :) Try and see ? :)

The idea behind the File -> JarFile set conversion is that JarFile's API provides the methods we need later on (namely iterating by JarEntry), it also guarantees that a file is a jar, not some other file or archive.

We can use any other class (including ones we write ourselves) that provides the same functionality and doesn't open the file on creation.

It would still be good to check early whether the files we consider "libs" are actually jar files though. So if we switch to something other than JarFile we should make this check ourselves if it doesn't take too much time. If it does take too much time we should at least warn the user (somehow...) that he might expect undefined behavior if he/she provides anything other than an actual jar file.