Open mkitti opened 1 year ago
I propose we install this in $CONDA_PREFIX/opt/maven/conf/settings.xml:
<settings xmlns="http://maven.apache.org/SETTINGS/1.2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.2.0 https://maven.apache.org/xsd/settings-1.2.0.xsd">
<localRepository>${env.CONDA_PREFIX}/opt/maven/repository</localRepository>
<profiles>
<profile>
<id>conda-user-home</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<repositories>
<repository>
<id>userHome</id>
<name>User Home Repository</name>
<releases>
<enabled>true</enabled>
<updatePolicy>always</updatePolicy>
<checksumPolicy>warn</checksumPolicy>
</releases>
<snapshots>
<enabled>true</enabled>
<updatePolicy>never</updatePolicy>
<checksumPolicy>warn</checksumPolicy>
</snapshots>
<url>file://${user.home}/.m2/repository</url>
</repository>
</repositories>
</profile>
</profiles>
</settings>
@conda-forge/maven Is anyone closely invested in maven keeping its settings.localRepository at ${user.home}/.m2/repository, or in anything else with the package, before I charge ahead?
You can override this and change it back to your HOME local repository by putting this in your ~/.m2/settings.xml:
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 https://maven.apache.org/xsd/settings-1.0.0.xsd">
<localRepository>${user.home}/.m2/repository</localRepository>
</settings>
@mkitti most of my usage these days would really benefit from defaulting to keeping the m2 repo within the conda environment, so this is awesome. I suspect @frauzufall will be interested in this as well.
I guess most folks who are doing day-to-day Java aren't using conda's maven to drive their IDE builds, so they just have to either deal with adjusting the path or duplicate files if they need to use this packaging of maven.
Will this help make it trivial to package conda environments with pre-loaded local maven repos?
@kephale thanks for the feedback. What I'm really missing at the moment is the ability to declare dependencies between non-Java and Java components.
One of the biggest non-Java components is the OpenJDK itself. Other components are things like compression codecs, Python, HDF5, etc.
I am also thinking of still using JavaCPP to build Java Native Interface bindings for Java before the foreign API.
I have mixed feelings about this change. It is probably correct, in that: A) conda environments are supposed to stay as encapsulated as possible; and B) the mvn command is not considered multi-process-safe when using the same local repo cache.
However, there is a major downside: huge network traffic and wait time and disk usage increase when using multiple conda environments. And we will use multiple environments in our community: my plan is for Appose + conda to serve as connecting tissue between Fiji plugins that leverage otherwise-incompatible codebases. If you make this change, users will be waiting a lot more, needlessly IMHO, to download the same JARs repeatedly.
Asking users to configure their settings.xml is IMHO not acceptable, since 99+% of people will use the defaults we provide in this context.
I am considering enhancing jgo to: A) use cjdk or install-jdk to download JDKs on demand; and B) use mvnw or some other Maven-bootstrapper to get Maven installed. Once that works, scyjava wouldn't need to depend on the openjdk nor maven packages in conda-forge anymore. So, if you need to make this change, I won't fight too hard, but I will change jgo so the change becomes irrelevant to Fiji/ImageJ's Appose+conda-based logic. Unless I am missing something here...?
However, there is a major downside: huge network traffic and wait time and disk usage increase when using multiple conda environments. And we will use multiple environments in our community: my plan is for Appose + conda to serve as connecting tissue between Fiji plugins that leverage otherwise-incompatible codebases. If you make this change, users will be waiting a lot more, needlessly IMHO, to download the same JARs repeatedly.
Why would multiple conda environments create huge network traffic?
We can use the conda pkgs cache to populate the maven repository via hard links. There would be no additional disk usage or network usage in this case. The primary need here is when someone is trying to use conda to install a conda-packaged Java dependency that has a non-Java dependency.
jgo configures its own maven repository to the user home repository and has its own configuration file at the moment. https://github.com/scijava/jgo/blob/2d98c803a3d30cc286876a6cb750b3fbc73dfb83/src/jgo/jgo.py#L433
Will this help make it trivial to package conda environments with pre-loaded local maven repos?
Yes, this is the primary advantage of doing this. We can use conda packages to populate a maven repository as well as their needed external dependencies simultaneously.
My concern was with each environment's copy of Maven pulling in its own copy of all JAR files requested. Unlike conda packages, these would not be hard linked. This seems wasteful, especially compared to how conda packages are handled by conda itself. Would it not be ideal if release versions of Maven artifacts were downloaded and cached in one single place independently of environment, analogously to how conda uses pkgs for its package cache?
That said, you are right that for the Appose use case I outlined above—Java main process + Python-inside-conda child processes—this should in fact not be an issue, because in the typical case, Fiji would not be creating environments that include openjdk nor maven.
Unfortunately, at the moment, since we haven't finished solving named shared memory from Java yet, the demo I made back in February uses a Python parent process with embedded Java via PyImageJ, and a Python child process with embedded Java via PyImageJ, and each of these embedded Javas uses jgo to load ImageJ2. With the change you are proposing, I was concerned that multiple copies of ImageJ2 would be downloaded, which would be suboptimal. However, you are also correct that jgo explicitly sets M2_REPO to ~/.m2/repository by default, which I forgot about, so maybe there are no problematic cases for my applications after all.
Appose also supports Java main + Java children, as well as Python main + Java children, but for these cases the issue may also be moot: the way Appose is currently coded, Java children are invoked via Groovy, and dependencies are pulled down by Groovy @Grab annotations, which typically store JARs into ~/.groovy/grapes IIRC. These Java children would not even need to live inside conda environments, so this change would also not affect these cases.
My only remaining concern then is that it sounds like you are wanting to move toward packaging Java JAR files as conda packages? I think I already aired my opinion on this, but I think that is a big can of worms that should probably not be opened if it can be avoided. I haven't seen a case where you need to do that. For things like libblosc, you can ship the native libs via conda, and then load them from Java using System.loadLibrary without needing the Java part to be packaged in conda. I can appreciate the elegance of having that Java code packaged in conda and depending on blosc, but it seems like way more trouble than it's worth, given that it is technically feasible to address the dependency without doing the packaging of Java code in this way. Just ship an environment.yml with your Java code and let Appose (or whatever) construct the environment for the needed natives and call it good.
TL;DR: Sorry for the (mostly) noise.
There is some chance that we may need to patch some Java packages at build time to function properly within a Conda environment depending on their native library loading mechanism.
For example, consider the case of JBlosc and the need for an HDF5 Blosc plugin for JHDF5 and JavaCPP-HDF5. Preferably, these would all need access to a common Blosc library and a common HDF5 library. Having more than one of these loaded into a single process can be problematic. Rather, in some cases we might want to embed a configuration for all of these to use the libraries installed by conda instead of vendoring the libraries from within the JAR files. However, each uses an independent mechanism to locate these libraries. Within FIJI, we do have a mechanism to point them at a common library.
In this case, where we have Java code specialized for a conda environment, we would want those packages isolated within the conda prefix. We may also want those to be accessible to maven within that same conda environment.
Would it not be ideal if release versions of Maven artifacts were downloaded and cached in one single place independently of environment, analogously to how conda uses pkgs for its package cache?
Note that there are two pieces of XML I have posted above.
The first configures the local repository where maven will look for local packages.
<localRepository>${env.CONDA_PREFIX}/opt/maven/repository</localRepository>
The second part is to tell maven to look at the user's home repository as well.
<repositories>
<repository>
<id>userHome</id>
<name>User Home Repository</name>
<releases>
<enabled>true</enabled>
<updatePolicy>always</updatePolicy>
<checksumPolicy>warn</checksumPolicy>
</releases>
<snapshots>
<enabled>true</enabled>
<updatePolicy>never</updatePolicy>
<checksumPolicy>warn</checksumPolicy>
</snapshots>
<url>file://${user.home}/.m2/repository</url>
</repository>
</repositories>
The second part is to tell maven to look at the user's home repository as well.
This is a clever hack, but it does not cause the conda mvn to actually store newly downloaded things into a common location. It will only prevent re-download of things that the user already has in their user directory. Most users will not have things there, so this will not save download bandwidth except for developers who are doing builds outside of conda.
The first configures the local repository where maven will look for local packages.
Not only where maven will look (read), but also where it will cache (write) them.
I'm starting to think about whether it would make sense to invert the two repositories.
1) The only thing that writes to $CONDA_PREFIX/opt/maven/repository is conda.
2) If mvn is installed by conda, then it will use $CONDA_PREFIX/opt/maven/repository as a repository for reading.
3) If conda-forge package builders use mvn during build, they should configure mvn to write to $CONDA_PREFIX/opt/maven/repository.
4) If the end user uses mvn, then it will by default use ${user.home}/.m2/repository.
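As a rough illustration, the four points above could be expressed as a settings.xml in $CONDA_PREFIX/opt/maven/conf/ that swaps the two repositories from the original proposal. This is an untested sketch: the profile id conda-prefix, the repository id condaPrefix, and its name are placeholders, and it assumes the conda-populated directory can be read through a file:// URL in the same way the userHome repository is in the first proposal.

```xml
<settings xmlns="http://maven.apache.org/SETTINGS/1.2.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.2.0 https://maven.apache.org/xsd/settings-1.2.0.xsd">
  <!-- Point 4: end-user mvn runs read and write the home repository (Maven's default). -->
  <localRepository>${user.home}/.m2/repository</localRepository>
  <profiles>
    <profile>
      <!-- Placeholder id, chosen for this sketch only. -->
      <id>conda-prefix</id>
      <activation>
        <activeByDefault>true</activeByDefault>
      </activation>
      <repositories>
        <!-- Point 2: the directory populated by conda packages is consulted for reading. -->
        <repository>
          <id>condaPrefix</id>
          <name>Conda Prefix Repository</name>
          <releases>
            <enabled>true</enabled>
            <updatePolicy>always</updatePolicy>
            <checksumPolicy>warn</checksumPolicy>
          </releases>
          <snapshots>
            <enabled>true</enabled>
            <updatePolicy>never</updatePolicy>
            <checksumPolicy>warn</checksumPolicy>
          </snapshots>
          <url>file://${env.CONDA_PREFIX}/opt/maven/repository</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
</settings>
```

For point 3, a recipe's build script could presumably override the local repository for that one invocation, for example with mvn -Dmaven.repo.local=$PREFIX/opt/maven/repository, assuming conda-build's PREFIX variable points at the environment being packaged.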
@mkitti Nice idea! That could be really effective at achieving your goal of shipping some JARs with conda packages, while minimizing download duplication during normal Maven usage.
Solution to issue cannot be found in the documentation.
Issue
Currently, if a user uses maven the default local repository will be
<localRepository>${user.home}/.m2/repository</localRepository>
https://maven.apache.org/settings.html#settings-details
Rather, the repository should live in the CONDA_PREFIX. I propose the following location.
<localRepository>${env.CONDA_PREFIX}/opt/maven/repository</localRepository>
This could be added to opt/maven/conf/settings.xml.
Having the local repository in the CONDA_PREFIX would allow us to create packages that populate the repository with maven packages.
We could add the user's ${user.home}/.m2/repository as an internal repository.
Installed packages
Environment info