bodkan / slendr

Population genetic simulations in R 🌍
https://bodkan.net/slendr

library(slendr) throwing error after installing #110

Closed. RishiDeKayne closed this issue 1 year ago.

RishiDeKayne commented 2 years ago

Hi there, I am trying to use slendr. Installation with install.packages("slendr") seems to work fine, but when I load the package with library(slendr) I get the following error:

/var/folders/3w/40_wp_9x0dj0p009p5lzcrmr0000gn/T//RtmpvTcdHb/file13afa4e0aab4d.sh: line 3: 80781 Killed: 9               'python' -c "import os; print(os.environ['PATH'])"
Error: package or namespace load failed for ‘slendr’:
 .onAttach failed in attachNamespace() for 'slendr', details:
  call: Sys.setenv(PATH = new_path)
  error: wrong length for argument
In addition: Warning message:
In system2(Sys.which("bash"), fi, stdout = if (identical(intern,  :
  running command ''/bin/bash' /var/folders/3w/40_wp_9x0dj0p009p5lzcrmr0000gn/T//RtmpvTcdHb/file13afa4e0aab4d.sh' had status 137

I am running R version 4.2.1 (2022-06-23). From what I can find, this is likely related to reticulate doing something odd (a similar error is discussed here: https://github.com/rstudio/reticulate/issues/1155), but I can't seem to figure out a workaround. Any advice would be greatly appreciated,
Rishi

bodkan commented 2 years ago

Hello,

This is a strange bug. I don't think I've seen this in the time I've spent with reticulate.

The Killed: 9 message suggests that your Python process is being killed by your system.
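The two clues in the log are consistent with that. Under the usual Unix shell convention, a process terminated by signal n is reported with exit status 128 + n, so the status 137 in the warning is 128 + 9, i.e. signal 9 (SIGKILL), matching the Killed: 9 printed by bash. A minimal R sketch of that arithmetic; describe_exit_status() is a hypothetical helper, not part of slendr or reticulate:

```r
# Hypothetical helper (not part of slendr or reticulate): decode a shell
# exit status using the convention that a process killed by signal n
# is reported by the shell as exit status 128 + n.
describe_exit_status <- function(status) {
  if (status > 128) {
    sprintf("killed by signal %d", status - 128)
  } else {
    sprintf("exited with status %d", status)
  }
}

describe_exit_status(137)  # status from the warning in the log above
# "killed by signal 9"
```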

What system do you use?

I'm not entirely sure how to proceed, because the only information this error gives is that (presumably) some reticulate script is getting killed. I have no idea what the .sh shell script on the first line is supposed to be. Nothing in slendr is a shell script, so it must be something internal to reticulate.

Can you check that you can run some code from the reticulate tutorial? For instance, something that creates an environment via reticulate, like this?

At least that way we will know where the problem is, and if it really is reticulate, you could follow up with the reticulate folks.
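A minimal sketch of such a check, assuming only that reticulate is installed. It reruns the exact one-liner that was killed in the log, first directly through system2() without reticulate, and then through reticulate in a throwaway virtual environment (no conda involved). The function name and the environment name reticulate-smoke-test are hypothetical:

```r
# Hypothetical diagnostic; assumes the reticulate package is installed.
# Run it in a fresh R session: binding to a new environment may fail if
# a different Python has already been initialized in the session.
reticulate_smoke_test <- function(envname = "reticulate-smoke-test") {
  # The same one-liner that was killed in the log above
  probe <- "import os; print(os.environ['PATH'])"

  # 1. Plain Python called directly from R, without reticulate.
  #    system2() quotes the command itself, but not its arguments.
  python <- Sys.which("python")
  if (nzchar(python)) {
    status <- system2(python, c("-c", shQuote(probe)))
    message("plain python exit status: ", status)
  } else {
    message("no 'python' found on the PATH")
  }

  # 2. The same code through reticulate, in a fresh virtual environment
  reticulate::virtualenv_create(envname)
  reticulate::use_virtualenv(envname, required = TRUE)
  reticulate::py_run_string(probe)
  invisible(TRUE)
}
```

If step 1 already fails, the problem is in the Python installation itself rather than in reticulate or slendr.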

RishiDeKayne commented 2 years ago

Running code from the reticulate tutorial did not throw any errors, but I have now found two other reports of the same or similar issues: https://github.com/rstudio/reticulate/issues/1218 and https://github.com/rstudio/rstudio/issues/11263. Unfortunately, it seems the reticulate folks aren't responding to issues. The path to the .sh script in the error is also weird: the script does not exist, but the temporary folder (e.g. RtmpboMRWa) does seem to be linked to my installation of slendr, as you can see here:

> install.packages("slendr")
trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.2/slendr_0.3.0.tgz'
Content type 'application/x-gzip' length 2736189 bytes (2.6 MB)
==================================================
downloaded 2.6 MB

The downloaded binary packages are in
    /var/folders/3w/40_wp_9x0dj0p009p5lzcrmr0000gn/T//RtmpboMRWa/downloaded_packages

> library(slendr)
/var/folders/3w/40_wp_9x0dj0p009p5lzcrmr0000gn/T//RtmpboMRWa/file3a92c1f01e4.sh: line 3: 15055 Killed: 9               'python' -c "import os; print(os.environ['PATH'])"
Error: package or namespace load failed for ‘slendr’:
 .onAttach failed in attachNamespace() for 'slendr', details:
  call: Sys.setenv(PATH = new_path)
  error: wrong length for argument
In addition: Warning message:
In system2(Sys.which("bash"), fi, stdout = if (identical(intern,  :
  running command ''/bin/bash' /var/folders/3w/40_wp_9x0dj0p009p5lzcrmr0000gn/T//RtmpboMRWa/file3a92c1f01e4.sh' had status 137

bodkan commented 2 years ago

Thanks for reporting with more details.

This is incredibly frustrating.

On one hand, I'm glad this is a reticulate bug and not a slendr bug, also because there's a higher chance someone from the reticulate team will eventually jump in and fix this. Also, it's clearly not just you having this issue, making it even more likely that a solution will eventually pop up.

It does appear that an interaction between reticulate and conda could be the problem here.

How comfortable are you with R and Python things from a more technical perspective?

One solution might be to drop conda altogether and try a basic Python virtualenv instead. (I don't like conda anyway, and I use it in slendr only because it also works on Windows, despite not being a Windows user myself.)

Basically, you could try to create a normal Python virtual environment and link your R session to it before loading slendr.

One way is the standard python -m venv ... as described here. Alternatively, if you don't use or like Python, this can also be done in R with reticulate.

One important thing is to make sure that msprime, tskit, and pyslim are installed at the right versions. If they are not, slendr could break in strange ways. I don't allow custom Python environments precisely because I need 100% control over which versions of Python modules slendr uses.

Then, once you have a Python virtual environment set up with the necessary dependencies, you could try to activate that environment manually using use_virtualenv() from reticulate (see this). After that, you should be able to call library(slendr). You will get a warning about a missing Python setup, but that's just because slendr requires its own, well-defined, internal Python environment to be present. Alternative Python environments should still work.

I say "should" because at that point we're really in uncharted territory and I'm not sure I'd be able to help with slendr-Python related issues in that case.

Still, this is the best workaround I have for now, at least until the reticulate folks fix whatever bug is causing the Python (or their Python setup) script to be killed. It also assumes that the problem lies with conda, which is why I'm recommending a normal, boring Python virtual environment, which has never failed me personally (unlike conda).


To recap, you could try the following:

  1. Create a small Python virtual environment with msprime, tskit, and pyslim (and also pandas, at any recent version; that one doesn't matter).
  2. Connect that Python environment to the R session via reticulate::use_virtualenv().
  3. Load slendr with library(slendr), ignoring the warning about missing Python stuff.
  4. Hope that slendr can still use that manually created Python environment.

bodkan commented 2 years ago

Actually, let's add another step between steps 3 and 4:

3.5. Run check_env() to see whether slendr found all required modules (I'm not sure this function will work with custom Python environments).
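The recap above, including step 3.5, could be sketched from R with reticulate roughly as follows. This is a minimal sketch under stated assumptions, not a supported slendr setup: the function name and the environment name slendr-venv are hypothetical, and the version pins are copied from the environment name msprime-1.2.0_tskit-0.5.2_pyslim-1.0 that appears later in this thread, so they may not match what other slendr releases expect.

```r
# Sketch of the virtualenv workaround (steps 1 to 3.5 above).
# Assumptions: reticulate is installed, this runs in a fresh R session
# (before any Python has been initialized), and the pins below match the
# slendr release in use; "slendr-venv" is a hypothetical environment name.
slendr_python_deps <- c(
  "msprime==1.2.0",
  "tskit==0.5.2",
  "pyslim==1.0",
  "pandas"          # any recent version
)

setup_custom_slendr_env <- function(envname = "slendr-venv",
                                    packages = slendr_python_deps) {
  # Step 1: plain Python virtual environment with the required modules
  reticulate::virtualenv_create(envname)
  reticulate::virtualenv_install(envname, packages)

  # Step 2: bind this R session to that environment
  reticulate::use_virtualenv(envname, required = TRUE)

  # Step 3: load slendr (expect a warning about its missing internal env)
  library(slendr)

  # Step 3.5: ask slendr whether it can see the required modules
  slendr::check_env()
}
```

Keeping the pins in a separate vector makes it easy to swap in whatever versions a given slendr release encodes in its own environment name.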

RishiDeKayne commented 2 years ago

So, an (unsatisfactory) update: just to be certain I had not done something weird with my Python install, I uninstalled and reinstalled Python on my machine (an M1 Mac running macOS Monterey v12.5.1). From here on, the Python install I used is /usr/bin/python3.

What I realised is that this issue may be specific to the M1 Mac: after reinstalling Python and trying to install msprime, tskit, pyslim, and numpy afresh, I ran into a number of pip issues which I think relate to the M1 chipset, e.g. errors starting with ERROR: Could not build wheels for xxxx.

The only way I could find to install these was to open a new terminal under Rosetta (for others looking to do the same, here is a good guide: https://apple.stackexchange.com/questions/409746/run-everything-in-rosetta-2-on-silicon-mac).

Installing msprime, tskit, pyslim, and numpy in the Rosetta terminal (e.g. with python3 -m pip install msprime) fixed this 'wheels' error, so I am now able to do the following:

$ /usr/bin/python3
Python 3.8.9 (default, Apr 13 2022, 08:48:07) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import msprime

To double-check, I tested whether I could import msprime the same way in a regular (non-Rosetta) terminal window, but I cannot, and instead get the following error:

ImportError: dlopen(/Users/rishide-kayne/Library/Python/3.8/lib/python/site-packages/msprime/_msprime.cpython-38-darwin.so, 0x0002): tried: '/Users/rishide-kayne/Library/Python/3.8/lib/python/site-packages/msprime/_msprime.cpython-38-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))

This seems to suggest that msprime will only run from the Rosetta terminal (at least on my machine). Knowing this, I still tried to create a virtual environment (with pyenv) within the Rosetta terminal and install msprime etc. into it, but ran into the original ERROR: Could not build wheels for xxxx. Some digging also suggested this could be a Python version issue (with older versions not playing well with pyenv), but I couldn't get any of the software to install with more recent Python versions, even within Rosetta.

The same happened with reticulate, but some other errors suggested a bigger problem with how reticulate is accessing Python. Despite specifying reticulate::use_python("/usr/bin/python3", required = TRUE), part of the error when installing software included "WARNING: You are using pip version 20.2.3; however, version 22.2.2 is available." This makes no sense to me, because when I run pip --version in a terminal (both regular and Rosetta) I get pip 22.2.2 from /Users/rishide-kayne/Library/Python/3.8/lib/python/site-packages/pip (python 3.8). So I am totally lost as to why it cannot access the correct software versions. The wheel error suggests to me that the python3 install that both pyenv and reticulate are initialising is running as x86_64 on the M1, meaning that the correct (or at least mutually compatible) versions of pip, msprime, numpy etc. cannot be downloaded, since these would require the arm64e versions through Rosetta.

Any advice would be great, but I think I may be at a bit of a loss here. It also seems that reticulate and RStudio have not always played well together (https://github.com/rstudio/reticulate/issues/1062), so this could be another reason for the reticulate issues, although using older versions of RStudio did not fix my problem.
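Because reticulate embeds Python inside the R process, the interpreter it uses runs with the same CPU architecture as R itself, and compiled modules such as msprime must be built for that architecture too. The ImportError above (a module built for x86_64 refused by an arm64e interpreter) is exactly that kind of mismatch. A minimal diagnostic sketch, assuming reticulate is installed; the helper names are hypothetical:

```r
# Hypothetical diagnostic; assumes the reticulate package is installed.
# reticulate embeds Python in the R process, so R's architecture is also the
# architecture that compiled modules (msprime, tskit, numpy, ...) must match.

# Pure helper: names of modules whose availability flag is FALSE
missing_modules <- function(available) {
  names(available)[!available]
}

check_python_stack <- function(modules = c("msprime", "tskit", "pyslim", "numpy")) {
  message("R architecture: ", Sys.info()[["machine"]])

  # Which interpreter reticulate actually bound to (this initializes Python)
  cfg <- reticulate::py_config()
  message("Python used by reticulate: ", cfg$python)

  # Check whether each module can be imported from within R
  available <- vapply(modules, reticulate::py_module_available, logical(1))
  missing <- missing_modules(available)
  if (length(missing) > 0) {
    message("Not importable from R: ", paste(missing, collapse = ", "))
  }
  invisible(available)
}
```

If the reported Python is not the one passed to use_python(), or the modules are missing, the packages were most likely installed into a different interpreter or for a different architecture than the one R is using.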

hmoots commented 2 years ago

Hello! I have a related issue with an error being thrown upon loading the package:

Error: package or namespace load failed for ‘slendr’:
 .onAttach failed in attachNamespace() for 'slendr', details:
  call: setup_env()
  error: Python environment msprime-1.2.0_tskit-0.5.2_pyslim-1.0 has been found but it does not appear to have msprime, tskit and pyslim modules all installed. Perhaps the environment got corrupted somehow? Running clear_env() and setup_env() to reset the slendr's Python environment is recommended.
In addition: Warning message:
package ‘slendr’ was built under R version 4.1.2

I had installed slendr and was working my way through the tutorials, and when I needed the msprime-1.2.0, tskit-0.5.2, and pyslim-1.0 Python packages, I ran setup_env(). I didn't interrupt the installation, but it seems something went wrong. Do you know a way to clear the slendr Python environment without loading slendr? (I can't run clear_env() because I can't load slendr.)

Thanks!!

bodkan commented 2 years ago

Hey @hmoots,

Yes, I've seen this error a couple of times. I'm still not sure what causes it, but it seems that either the Python environment gets broken during the installation, or it's an issue with how R links to that Python environment. I have noticed this sometimes happens when a user tries to run a Python-backed slendr function before a Python environment is actually created. Almost as if R itself (or the R interface to Python, i.e. not slendr) "remembered" somewhere that a Python environment should already exist... when it doesn't. :( Worse still, this happens only for some users, some of the time; most users don't get this issue. :(

Either way, this pulls the rug from under slendr, and there is (currently) no way for it to solve this by itself "from the inside", because the problem does, indeed, occur outside of slendr.

One sure way to solve this that I have found is to obliterate the broken Python environment and recreate it. For instance, on my machine I would first run this:

> reticulate::conda_list()
                                  name                                                                                   python
1                                 base                                           /Users/mp/Library/r-miniconda-arm64/bin/python
2 msprime-1.2.0_tskit-0.5.2_pyslim-1.0 /Users/mp/Library/r-miniconda-arm64/envs/msprime-1.2.0_tskit-0.5.2_pyslim-1.0/bin/python
3                         r-reticulate                         /Users/mp/Library/r-miniconda-arm64/envs/r-reticulate/bin/python

After identifying the path to the broken Python environment, I would then run the following in a terminal (this is a shell command, not R code):

rm -rf /Users/mp/Library/r-miniconda-arm64/envs/msprime-1.2.0_tskit-0.5.2_pyslim-1.0/

Then, in a new R session, I would do the standard

library(slendr)
setup_env()

and proceed with installing a fresh new Python environment.

This helped every single time so far.
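For readers who would rather stay in R (pasting rm -rf into the R console fails, as the next comments show), roughly the same cleanup can be sketched with reticulate's conda_list() and conda_remove(); with packages = NULL, conda_remove() removes the whole environment. This is a minimal sketch: the helper names are hypothetical, the name pattern is an assumption based on the environment name shown above, and the removal function defaults to a dry run so nothing is deleted by accident.

```r
# Hypothetical helpers; assume reticulate is installed and that slendr's
# environment follows the "msprime-<v>_tskit-<v>_pyslim-<v>" naming seen above.
find_slendr_envs <- function(envs) {
  # envs: data frame with at least a `name` column, as returned by conda_list()
  envs[grepl("^msprime-.+_tskit-.+_pyslim-.+$", envs$name), , drop = FALSE]
}

remove_slendr_envs <- function(dry_run = TRUE) {
  broken <- find_slendr_envs(reticulate::conda_list())
  for (env in broken$name) {
    if (dry_run) {
      message("Would remove conda environment: ", env)
    } else {
      reticulate::conda_remove(env)  # packages = NULL removes the whole env
    }
  }
  invisible(broken$name)
}
```

Run remove_slendr_envs() first to see what would be deleted, then remove_slendr_envs(dry_run = FALSE), restart R, and continue with library(slendr) and setup_env() as above.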

hmoots commented 2 years ago

Hello @bodkan - thanks so much! I had indeed tried to run a Python-backed slendr function before creating the Python environment. I have tried to follow the code you shared, but I am getting this error:

Error: unexpected numeric constant in: "rm -rf /Users/hannahmoots/Library/r-miniconda/envs/msprime-1.2.0"

I think it's an issue with the underscore, because the full path is: rm -rf /Users/hannahmoots/Library/r-miniconda/envs/msprime-1.2.0_tskit-0.5.2_pyslim-1.0

Do you know a way to handle this? I'm using the latest R update (R 4.2.1)

Thanks so much for your help! Hannah

hmoots commented 2 years ago

Update: I just ran the command in my terminal and it worked (don't know why I didn't think of that before!). And the new Python environment created with setup_env() seems to have installed okay.

Many thanks - looking forward to using slendr! Hannah