deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0
4.12k stars 654 forks source link

Make commons-compress an optional dependency #2949

Closed StefanOltmann closed 9 months ago

StefanOltmann commented 9 months ago

I'm experiencing difficulties with ProGuard and commons-compress. This has led me to consider the possibility of excluding the commons-compress dependency, allowing the application to function smoothly unless a zipped model that requires unzipping is explicitly specified. This approach would enable me to circumvent ProGuard issues associated with commons-compress while reducing unnecessary dependencies.

frankfliu commented 9 months ago

Can you exclude commons-compress from your project if you are not using any .tar file?

This should be pretty easy from gradle:

    implementation platform("ai.djl:bom:0.26.0")
    implementation("ai.djl:api") {
          exclude group: "org.apache.commons", module: "commons-compress"
    }
StefanOltmann commented 9 months ago

Yes, I excluded it this way, but it will crash:

java.lang.NoClassDefFoundError: org/apache/commons/compress/compressors/gzip/GzipCompressorInputStream
    at ai.djl.repository.RepositoryFactoryImpl$LocalRepositoryFactory.newInstance(RepositoryFactoryImpl.java:194)
    at ai.djl.repository.RepositoryFactoryImpl.newInstance(RepositoryFactoryImpl.java:64)
    at ai.djl.repository.Repository.newInstance(Repository.java:90)

The code currently attempts to load the class, even when it is unnecessary.

I recommend redesigning it so that classes from common-compress are not loaded unless they are required for unzipping a model. This can be achieved by consolidating all logic involving commons-compress into a dedicated helper class, which is only invoked when there is a need to decompress something.

I provided the unzipped retinaface.pt. Or is this file still compressed?

StefanOltmann commented 9 months ago

@frankfliu Do you have an estimate when the next release including this fix will be available?

I see you seem to have monthly releases.

frankfliu commented 9 months ago

We just released 0.26.0, 0.27.0 will be in March, see: https://docs.djl.ai/master/index.html#release-notes

For the meantime, you can use our nightly snapshot release: https://docs.djl.ai/master/docs/get.html#nightly-snapshots

StefanOltmann commented 9 months ago

@frankfliu I've tested the latest snapshot version and found that it functions smoothly with ProGuard. There seems to be something off with commons-compress that hinders optimization. I'm glad I don't need that library as long as I'm not using TAR files. Thank you for quickly fixing this!

Considering March is a considerable time away, I was wondering if you could consider releasing a v0.26.1 fix version. I'm a bit hesitant to deploy a snapshot version in a production environment, as it might introduce breaking changes at any moment.

zachgk commented 9 months ago

@StefanOltmann In general, we would rather not to have to do extra releases. They are fairly involved just due to the size of the DJL ecosystem

One option you can take is to use a private build of DJL. If you want to fix to a particular commit of DJL, you can clone it and then run ./gradlew pTML -Pstaging. This will install DJL locally to your ~/.m2/repository directory. Then, you can use this local build as part of building your project following the docs. Just add this as a step in your build process and CI before your main project build.

Alternatively, you can build it once and then upload it to a private Maven repository, which is as easy as saving it in a static HTTP host using something like S3. Then you can pull from that location as part of your build/CI.

Would a workaround like this be viable for you?

StefanOltmann commented 9 months ago

@zachgk Thank you for showing me a workaround. I guess I need to try that then.

I still must say I’m a bit surprised that you have so long and fixed release schedules and that it’s such an effort to do a release. Sounds to me like something is off.

Maybe you should consider reworking your CI pipeline for more automation. I maintain two libraries with Maven Central artifacts and releasing a new one is just a matter creating a GitHub release (which builds everything using GitHub actions and uploads to Maven Central) and confirming in the Sonatype Nexus that the artifacts should be released to Maven Central.

frankfliu commented 9 months ago

@StefanOltmann Releasing DJL is non-trivial work. We have 11 engines and need to test on to following platforms:

For each release, we publish over 70+ artifacts. And a we more than 6 downstream github repo (djl-serving, djl-demo, springboot-start, docs, jupyter notebooks etc) to be updated to new version.

StefanOltmann commented 9 months ago

The artifact count shouldn’t matter with the automation, but I see your point with the downstream repos.