pbordron opened this issue 5 years ago
> IMHO, I think a way to solve this issue is to replace the file only when needed, not at each environment activation.
This is exactly when the file needs to be updated.
Can you not mount your envs on another file system? NFS is often problematic.
My only suggestion would be to try to minimize these time windows. You could copy the original `ldpaths` to e.g. `ldpaths.$$`, modify that, then replace `ldpaths` with `ldpaths.$$`. This could be coupled with a loop to make sure `ldpaths` exists at the point of consumption, with a limit on the number of times it spins waiting for it to appear.
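A minimal sketch of that suggestion (untested; `R_HOME` stands in for the env's R installation, and the retry count and sleep interval are arbitrary):

```sh
# Writer side: stage changes in a PID-suffixed copy, then rename it over
# ldpaths; rename(2) is atomic on POSIX-compliant local filesystems.
ldpaths="${R_HOME}/etc/ldpaths"
cp "${ldpaths}" "${ldpaths}.$$"
# ... modify "${ldpaths}.$$" here ...
mv -f "${ldpaths}.$$" "${ldpaths}"

# Reader side: spin briefly if ldpaths is missing, with a bounded retry count.
tries=0
while [ ! -f "${ldpaths}" ] && [ "${tries}" -lt 10 ]; do
    sleep 1
    tries=$((tries + 1))
done
. "${ldpaths}"
```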
Please feel free to submit a PR. Personally I cannot reproduce your error since I do not use NFS.
The point is that `R CMD javareconf` replaces the `ldpaths` file each time it is called, even if there is no change (see the script here).
The best solution would be to have an `ldpaths` file that is specific to each conda env activation, but it is not so simple.
An easy yet incomplete solution would be to modify the following patch so that files are only replaced when the new file differs. Here is a proposition that I haven't tested: https://github.com/pbordron/r-base-feedstock/blob/d6aef51add58ad802f5d04467504134bf31817ac/recipe/0017-Foribly-remove-then-forcibly-mv-in-javareconf.in.patch
> The best solution would be to have an `ldpaths` file that is specific to each conda env activation, but it is not so simple.
It is not so simple at all; in fact, it's impossible. There is no such thing as a 'file that is specific to each conda env' because we deliberately support a range of Javas from conda packages, from the system package manager, and from third parties. I have gone to extensive lengths to make this work.
You've replaced a race condition around `ldpaths` with one around `ldpaths.new`, an improvement, but my suggestion is still better. You can get the best of both worlds by replacing `.new` in your patch with `.new.$$`.
Please work on a PR if you want this to work on NFS.
@pbordron it is indeed important that conda packages work on NFS. If I can help you with the PR, please let me know. I won't have the time to work on it on my own, but I would be happy to help in any way. Thanks a lot for pointing this out!
For now, I have neither the time nor the environment to look for a fix.
The problem does not only happen with NFS, but also with BeeGFS.
The main cause is that `ldpaths` is replaced in two steps (i.e., the `ldpaths` file is removed and then `ldpaths.new` is moved to `ldpaths`).
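In other words, the patched `javareconf` effectively does something like the following (a simplified sketch based on the description above, not the exact upstream code):

```sh
rm -f "${R_HOME}/etc/ldpaths"                               # window opens: ldpaths is gone
mv -f "${R_HOME}/etc/ldpaths.new" "${R_HOME}/etc/ldpaths"   # window closes
# Any concurrent activation that sources ldpaths inside that window fails
# with "ldpaths: No such file or directory".
```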
Hello,
I've recently encountered this error in my miniconda3 as well. I have a brand new miniconda3 install due to a corrupt `libevent` file last week. In the new install, the error

`~/opt/miniconda3/bin/R: line 238: ~/opt/miniconda3/lib/R/etc/ldpaths: No such file or directory`

began once I created a new environment. I also use a cluster with CentOS_6 and CentOS_7 nodes, all under LSF. Is there any way to fix this issue other than the patch mentioned above? Any other ideas?
Many thanks in advance,
@RodrigoGM the patch is not tested and replaces one race condition with another.
Hi, we're facing the same issue with our Galaxy instance:

`/some_path/_conda/envs/mulled-v1-10760b69799f3df14ac223943c74682453f4c44945480a6c4ee3ccaff1510f7f/lib/R/etc/ldpaths: No such file or directory`

I have to reinstall `r-base` within the env:

`conda install r-base==3.5.1 --force-reinstall`

But I don't know how long that will hold.
So, as suggested by @pbordron (we know each other), I edited the activation script:

vi /some_path/_conda/envs/mulled-v1-10760b69799f3df14ac223943c74682453f4c44945480a6c4ee3ccaff1510f7f/etc/conda/activate.d/activate-r-base.sh

so that it now reads:

#!/usr/bin/env sh
#R CMD javareconf > /dev/null 2>&1 || true
It should be harmless in the context of Galaxy since we never stack envs. 🤞
(FYI: @fgiacomoni)
I also had this issue and solved it in a hacky way. In another of my conda envs there was an intact `ldpaths` file, at a path like `~/miniconda3/envs/py2.7/lib/R/etc/ldpaths`. I copied that `ldpaths` file to where it was needed and changed the paths in it to fit my env, and then everything was OK. I may face this problem again when changing conda envs, but at least I could get back to work ASAP. (I already keep a backup `ldpaths` file for emergencies.)
Looking forward to someone providing the perfect solution.
I ran into this issue with Snakemake. I use it with the `--use-conda` flag, and when running parallel jobs with R, at some point the `ldpaths` file just disappears. Nothing has helped me overcome this yet; really waiting for a solution.
> I ran into this issue with Snakemake. I use it with the `--use-conda` flag, and when running parallel jobs with R, at some point the `ldpaths` file just disappears. Nothing has helped me overcome this yet; really waiting for a solution.
Hey, I had the exact same problem (running snakemake in parallel) and lecorguille's solution helped. Go inside the currently used conda directory (it should be in the error log, or just look for the most recent folder), i.e. `.snakemake/conda/<hashhere>`, and edit the file `.snakemake/conda/<hashhere>/etc/conda/activate.d/activate-r-base.sh` to comment out the line described. This may not work with your setup, but it "worked for me".
From

`R CMD javareconf > /dev/null 2>&1 || true`

to

`# R CMD javareconf > /dev/null 2>&1 || true`
Best Regards
@Finesim97 well, I've just commented out lines 416-432 in `env/lib/R/bin/javareconf` (the ones responsible for updating files) and everything works great. It is roughly equivalent to your solution. Thank you!
But what's the point of updating `ldpaths` and `Makeconf` on each activation anyway? I don't think users update their Java that often. Setting values for these files on first installation should be enough. And then maybe we could introduce the possibility of manually invoking an update, in case the user reinstalled Java or something.
@mingwandroid said that activation is "exactly when the file needs to be updated". But as can be seen, users just comment out this step altogether.
If you want to query why the R foundation made this choice you should ask them.
> I don't think users update their Java that often

That you do not need this doesn't mean others do not. We need a solution that works correctly in all cases (and it's not difficult either). Reconfiguring Java for each env is not going to be undone, for it is the correct thing to do. If you convince R upstream to remove javareconf from R altogether (say, moving it into rJava itself) then that would also work; however, I don't know enough to say whether this is a good idea or even possible.
@Finesim97 I am experiencing the same issue... I have not tested your solution; mine was to remove the environment to force it to be reinstalled. I will try yours when I have time. Maybe something for the snakemake developers to look at? @johanneskoester?
So @mingwandroid, it boils down to having something like an environment-wide lock that is acquired before the activate scripts run, ensuring that no two activate scripts are executed at the same time, right? I would think that this could simply be implemented inside `conda activate`. Am I getting this right?
Yes. Making the lock as short-lived as possible would be ideal.
Great. Is this something that somebody on your side wants to do, or would you rather I provide a PR? If it should be me, can you give me a pointer to where that would be in the code base? Sorry for asking; I don't want to bother you more than necessary, but my time is very limited these days.
Mine is also limited. Ping @msarahan
@msarahan, what do you think?
I took only a quick look at this. IIUC, the activation script is just to give some users the convenience of having `R` "automagically" (re-)configure itself according to the currently available (external) Java every time an environment is activated. So, okay, this may be beneficial for some users (i.e., those that change their Java configuration after this package has been installed, but don't care to run `javareconf` themselves). But frankly, in my opinion, putting this in an activation script is a horrible hack.
My general advice on (de-)activation scripts:
Getting back to the case at hand: IMO, true solutions to the problem are one of the following:

1. Change `R` (or whatever is responsible for the `R` <-> Java interaction) to not depend on persistent state,
2. do not touch persistent state from the activation script.

I have no idea how `R` works (and am not really interested in that anyway), hence I can't comment on whether 1. is possible/feasible.
To make 2. work, you can either

2.1. outright remove the `javareconf` call in the activation script and tell the users to manage their `R`-Java stuff themselves, or
2.2. copy the `javareconf` script and replace all of its writes by simple checks for changed files; if a change is detected, output a message telling the user to run `javareconf` themselves (see the sketch below).
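A minimal sketch of what 2.2 could look like (assumptions: `ldpaths.new` is the candidate file such a modified `javareconf` would have produced, and `cmp(1)` is available):

```sh
# Compare the freshly generated configuration with the installed one and
# only notify the user instead of overwriting anything.
new="${R_HOME}/etc/ldpaths.new"
cur="${R_HOME}/etc/ldpaths"
if ! cmp -s "${new}" "${cur}"; then
    echo "Java configuration has changed; please run 'R CMD javareconf' yourself." >&2
fi
rm -f "${new}"
```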
Those are disruptive changes, of course. I don't expect 1. to happen. However, 2. is a sane compromise in my book. But since I don't really care much, I won't argue or push for that change.
Adding locks to Conda environments during activation is not an option for me. Having `conda` lock environments during its own operations (i.e., `install`/`update`/`remove` actions) makes sense. Environment activation, on the other hand, should not mess with persistent state if at all possible, so locking an environment doesn't make sense for this. If there are some "black sheep" activation scripts like the one of this package, then those scripts should handle concurrent access themselves, not `conda`.
Imagine if the `javareconf` script didn't change a file inside the Conda environment itself but in `$HOME/.config` or the like. If we created two different environments with `r-base` and concurrently activated them, then locking either of the environments wouldn't help at all because the writes happen at some outside location. => `conda` cannot know if and where activation scripts change global state => `conda` can't know what to lock => activation scripts should handle concurrent writes themselves.
I'll open a pull request to address "8.1. avoid unnecessary writes" from my recommendation list above. NB: The patch at https://github.com/conda-forge/r-base-feedstock/blob/9934a07a64d4115c9e2bbc50129c4b6eb6bd0bb6/recipe/0017-Foribly-remove-then-forcibly-mv-in-javareconf.in.patch#L20 increases the likelihood of this issue appearing.
IMHO, while we have no Java ecosystem to speak of, allowing different Javas per env is important, if not necessary. We can do better around our locking here, but I don't think we should throw the baby out with the bathwater.
These are decisions for R upstream. I didn't write `R CMD javareconf` or come up with the idea, but I believe per-env activation is right. Unless you can carefully articulate why not, we're at an impasse on that point. Please try our R with a range of Javas. I went to extreme lengths to maintain good compatibility for our users because we have nothing much for them ourselves.
I'm not so busy on R these days FWIW but I don't think this is the right track to take.
I disagree with your point that activation scripts shouldn't write files, beyond accepting that they shouldn't do so when not necessary. Writing env-local files is appropriate, but we should lock things better.
At the end of the day, R provides this facility and I thought it was there to take advantage of, that activation was the most appropriate place (think of people experimenting with different Javas with R), and that we should be as dynamic and friendly to all the Javas as possible. I believe using it is entirely reasonable (if currently bugged). Be aware we have a few OpenJDKs too, conda-forge's coming from Azul still, I think, and AD's coming from JetBrains, and I want to support system and legacy Javas.
I'm just trying to broaden our appeal; I don't use Java at all and R little (ironically, usually when trying to fix issues with rJava, largely to do with trying to be broadly compatible and not surprise or burden our users: which Java should be active? How do I do that?). I don't know how much of R's Java goes through rJava. If all of it does, then perhaps someone could propose moving that functionality into rJava itself; I'd be supportive of that if it's appropriate. That way most people could ignore this (and we can make it work right for other packages that might want to write env-local config files during activation).
> [...] allowing different Javas per env [...] but I believe per-env activation is right [...]

I don't disagree with the "per env" part, just the "activation" one. My stance is just that reconfiguration during every environment activation is excessive and can lead to concurrency issues during writes (i.e., this very issue). I'd rather have those changes made on demand, which means when `r-base` is installed/updated (which is already the case via the post-link script) or by the user.
> I went to extreme lengths to maintain good compatibility for our users because we have nothing much for them ourselves.
Glad to hear, I'll just take your word on this ;).
> Please try our R with a range of Javas.
I'll leave this to others since I'm with you on "I don't use Java at all and R little" -- for me it is rather "I don't use R at all and Java little" ;).
> I disagree with your point that activation scripts shouldn't write files, beyond accepting that they shouldn't do so when not necessary.
The thing is just that I believe users likely won't expect environment activations to do anything special, especially not things that might fail. Plus, authors of activation scripts have to be aware of possible concurrency issues and handle them accordingly.
> Writing env-local files is appropriate, but we should lock things better.
IMO, only in exceptional circumstances for activation scripts. And due to the "exceptional" part, I'd rather not have `conda` itself deal with this (rather just have the activation script at least do something like `lockfile "${CONDA_PREFIX}/path/to/file.lock" ; write-to-file "${CONDA_PREFIX}/path/to/file" ; rm -f "${CONDA_PREFIX}/path/to/file.lock"`).
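For illustration, a minimal sketch of such script-level locking using `flock(1)` from util-linux (assumptions: `flock` is available on the target hosts and the lock file path is hypothetical; note that `flock` itself can be unreliable on some NFS setups):

```sh
# Serialize the javareconf run across concurrent activations of this env.
lock="${CONDA_PREFIX}/lib/R/etc/.javareconf.lock"   # hypothetical lock path
(
    # Wait up to 30s for the lock on fd 9; skip reconfiguration rather than
    # hang activation if the lock cannot be acquired.
    flock -w 30 9 || exit 0
    R CMD javareconf > /dev/null 2>&1 || true
) 9> "${lock}"
```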
Overall, my opinion is just that

> 2.2. copy the `javareconf` script and replace all of its writes by simple checks for changed files; if a change is detected, output a message telling the user to run `javareconf` themselves.
>
> [...] is a sane compromise in my book. But since I don't really care much, I won't argue or push for that change.

and hence I think that, for now, we should just try to reduce the concurrent-write situation here and, if someone is willing to dig into it (i.e., not me), add some proper file locking in `javareconf` or the activation script.
On gitter, people are expecting to be able to operate on the same conda env from multiple processes at the same time, and that's what is happening here. I'd rather we fixed it at a higher level. In no way should an env be mutable by two condas at once. If this bug helps us to shake out issues there, then that's a win.
But yes, I'm happy to do something reasonable here too; I just want to be flexible and am very concerned about UX and compat.
Moving the conversation from gitter over here. My points were:

- `conda install` or other operations obviously altering an environment, OTOH, are expected to either break (current behavior), fail, or block. It is also reasonably easy to avoid this: it would be awesome to have locking for the package cache and envs to allow safe concurrent calls to those, but there is no real case where two things are adding packages to the same env at the same time.
- Locking is unreliable on network filesystems; even the `sqlite3` docs discourage accessing the same database from multiple hosts. So using locking needs to come with big warning signs.

I would therefore argue that `conda activate` should be free of side effects.
If things specific to the local host really really really need to be altered, the respective configuration would have to be placed in a temporary directory unique to the activation session. We'd also have to accept that it would not be reliably cleaned away in all cases.
Using locking to protect against things changing underneath running processes (which would lead to undefined results) would require acquiring the lock in `activate` and releasing it in `deactivate`. The latter doesn't necessarily happen explicitly, though, so that's the first major problem. The other is that the env can then no longer be used in parallel, meaning a cluster job running on 50 nodes would need 50 copies of the env. Possible, but a very ugly brute-force solution.
In the case at hand, I don't see why host-local JVMs need to be supported at all. IMO, the root package requiring the javareconf files should require a JVM installed into the environment and handle the `ldpaths` setup in `post-install.sh`. That way, it's set up at the time where modifying the env is expected, and the env remains static during use.
In summary, my $0.02 based on experience parallelizing C/C++ and Python apps is that `activate` should be "`const`", i.e., idempotent and without side effects. Everything else will either be slow (= one process per env) or turn out to be a nightmare with constant breakage and patch after patch as more and more corner cases turn up.
If environment mutation is locked, which I think it should be (I am not sure whether that's a bug or a missing feature), then that would include activation (because we've always allowed env mutation, and this problem can happen with anything, env vars for example).
If we fix all that then this is automatically fixed. Though @mbargull's PR will help in the meantime.
I would really like it if activation were assumed to be "const". I agree with Ray, though, that we have existing expectations to contend with. Perhaps with https://github.com/conda/conda/pull/8727 we can have a "safe" way of activation: if any arbitrary shell scripts are present, it is considered unsafe (mutation may happen) and must be locked. If no unsafe scripts are present, though, conda will not require locking for activation, and perhaps can go faster.
> Using locking to protect against things changing underneath running processes (which would lead to undefined results) would require acquiring the lock in `activate` and releasing it in `deactivate`.

@epruesse, it absolutely would not.
I was wondering, would it be possible to give an update re (the resolution of) this issue?
Thank you very much in advance
I hit this problem when hundreds of processes tried to activate a single read-only conda environment and spawned hundreds of Java processes, which ended up consuming system resources.
I have a suggestion for this situation, which follows up on @mbargull's second point:

Let the user take care of running `javareconf`.

It is also the least disruptive to the current setup, because it doesn't change the status quo. In the activation script, introduce an env var like `CONDA_R_IGNORE_JAVARECONF`. If it is set, the activation script shouldn't run `R CMD javareconf > /dev/null 2>&1 || true`. That way, end users of clusters (where environments don't change that often or have nothing to do with Java) can simply export this env var; activation will be quicker and there will be no mayhem.
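A minimal sketch of that guard in `activate-r-base.sh` (note `CONDA_R_IGNORE_JAVARECONF` is the suggested, not yet existing, variable name):

```sh
#!/usr/bin/env sh
# Skip the (racy, potentially slow) reconfiguration when the user opts out.
if [ -z "${CONDA_R_IGNORE_JAVARECONF:-}" ]; then
    R CMD javareconf > /dev/null 2>&1 || true
fi
```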
Dear all, I also run into this issue frequently when processing a lot of files in parallel using snakemake with `--use-conda` (on an NFS share). I see that there is a fix: https://github.com/kpalin/r-base-feedstock/commit/9eda35bdc8ea2c2433cbc6b94c2e978b4d7cd8d4. But since this issue is still open, does that mean it is not in any release yet?
My apologies for my ignorance; I'm not able to determine whether that commit is in any release...
By the way, setting my conda envs as read only would be an option for me (actually I'd prefer it, and keep their management restricted to 1 or 2 accounts only).
Issue:

I get the following error when using some conda R environments with Galaxy or snakemake on HPC infrastructure:

`$CONDA_PREFIX/lib/R/etc/ldpaths: No such file or directory`
After tracking the issue, it appears that activating a conda environment with R runs the command `R CMD javareconf`, which updates the file `$CONDA_ENV/lib/R/etc/ldpaths`.

I get this issue on two different cluster infrastructures. In both cases, conda envs are stored on NFSv3 or BeeGFS shares mounted on the computation nodes.
Concurrent activations, as can happen during a Galaxy training school or with some snakemake workflows with network latency, lead to two cases:

- `$CONDA_PREFIX/lib/R/etc/ldpaths` doesn't exist during a short gap of time, and another activation happening during this gap produces this error for some jobs.
- `$CONDA_PREFIX/lib/R/etc/ldpaths` doesn't exist anymore; one way to solve this is to reinstall the r-base package.

IMHO, I think a way to solve this issue is to replace the file only when needed, not at each environment activation.
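For illustration, a minimal sketch of how such concurrent activations can trigger the error (the env name is hypothetical, and this assumes old-style `source activate` is available on `PATH`):

```sh
# Launch many concurrent activations of the same R env; each one runs the
# activate.d script and hence `R CMD javareconf`, racing on ldpaths.
for i in $(seq 1 50); do
    ( source activate my-r-env && R --version > /dev/null ) &
done
wait
# On NFS/BeeGFS, some jobs fail with
# ".../lib/R/etc/ldpaths: No such file or directory".
```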
Regards
Philippe
Environment (`conda list`):

Details about `conda` and system (`conda info`):