AlexsLemonade / scpca-downstream-analyses

This repository is intended to store our pipeline for marker genes analysis.

Use conda (mostly) to handle R dependencies #316

Closed jashapiro closed 1 year ago

jashapiro commented 1 year ago

This PR replaces #306 in order to target main as requested. (Rebasing was a bit more complex than just making a new branch & porting over changes.)

I added a bit to the setup script to update the renv.lock file on running. This seemed like the best way to automate that process, but let me know if you think it should instead be somewhere else.

One thing I realized in testing (which was not the source of all of our problems, but maybe some) is that the version of scpcaTools used here is old enough to depend on Matrix.utils which is no longer available (though we can get it with conda). I don't think there is any reason not to move to scpcaTools 0.2.1, but let me know if that is likely to break something. After I file this I am going to start a test to update to that version to see if that works.

jashapiro commented 1 year ago

A few more additions in the latest updates:

This last one means that the workflow no longer works if the required packages are not installed in the default library; conda is basically required to run the full workflow, but you can use an renv library for developing/debugging individual scripts if you like.

We could add some kind of fallback to check if renv can be activated if we are not using conda... I'm not exactly sure what this would look like, but my thought is that we could check if .libPaths() contains path(s) with .snakemake/conda, and if it does not, then we can try to load an renv. I'm not sure if that needs to be in this PR, but let me know what you think.
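A minimal sketch of that fallback decision, written in Python purely for illustration (the real check would live in R and read `.libPaths()` directly). The function names are hypothetical, and the only assumption is the one stated above: conda-managed libraries sit somewhere under a `.snakemake/conda` directory.

```python
from pathlib import PurePosixPath


def uses_snakemake_conda(lib_paths):
    """Return True if any library path sits inside a .snakemake/conda directory."""
    for path in lib_paths:
        parts = PurePosixPath(path).parts
        # Look for the consecutive path components ".snakemake" and "conda".
        for first, second in zip(parts, parts[1:]):
            if first == ".snakemake" and second == "conda":
                return True
    return False


def choose_library_strategy(lib_paths):
    """Pick "conda" when a conda library is present, otherwise fall back to "renv"."""
    return "conda" if uses_snakemake_conda(lib_paths) else "renv"
```

Matching on path components rather than on a raw substring avoids false positives from directories whose names merely contain the text `.snakemake/conda`.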

jashapiro commented 1 year ago

Thanks again for doing this @jashapiro! I tried testing ./setup_envs.sh clean locally but encountered the following error:

snakemake: error: unrecognized arguments: snapshot_renv

I also tried running the setup_envs.sh script line by line and got the error once again at line 24 (snakemake --use-conda -c1 --quiet rules -f snapshot_renv).

Any idea what may be going on here?

What version of snakemake do you have installed?

For reference, I have:

❯ snakemake --version
7.20.0

cbethell commented 1 year ago

What version of snakemake do you have installed?

Ah, I have version 7.3.8 -- I can update my snakemake version and try testing again!

Edit: Updating the snakemake version worked for me, so we will definitely want to update the docs to note that the minimum version of snakemake would be 7.20.0 (fwiw, I updated to 7.24.0 and things work fine).
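For illustration, a version guard along these lines could catch the problem before the setup script fails. This is only a sketch: the helper names are hypothetical, and 7.20.0 is simply the value suggested in this thread, not a verified lower bound.

```python
def parse_version(text):
    """Turn a dotted version string such as "7.20.0" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum="7.20.0"):
    """Return True if the installed version is at least the minimum."""
    # Compare as integer tuples: as plain strings, "7.3.8" would sort after "7.20.0".
    return parse_version(installed) >= parse_version(minimum)
```

Snakemake also ships `snakemake.utils.min_version`, which can be called at the top of a Snakefile to enforce a minimum version, so that may be a better fit than a hand-rolled check.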

jashapiro commented 1 year ago

I also got an error at the snapshot step... But I thought we were using the version of scpcaTools that doesn't require Matrix.utils? When I activate the conda environment that's getting built and then go into it, the version of scpcaTools that's still getting used is 0.1.8...

I think this may be temporary... I'm working on some further updates... But it may be a "needs cleanup" situation as well...

jashapiro commented 1 year ago

Okay, some more updates:

All of this means that the workflow should now work with or without conda, but it requires that renv be initialized before running without conda. It should work to set this up by running snakemake -c1 -r setup_renv, but I actually haven't fully tested that yet.

Some other thoughts/caveats:

allyhawkins commented 1 year ago

Okay, I finally got this to work, but I wanted to note that to get it to work I had to remove my previous installation of scpcaTools, which was an older version. For some reason it was still using the version that I had, and until I removed that package completely I kept getting the same dependency failure on Matrix.utils.

jashapiro commented 1 year ago

What do you mean by "to get it to work I had to remove my previous installation of scpcaTools"? Where was that installed? I'm wondering if there might have been something else going on in terms of which R binary was being used.

allyhawkins commented 1 year ago

It was installed in my home R library. So in R, I removed the package and then was able to run the setup script successfully.

jashapiro commented 1 year ago

It was installed in my home R library. So in R, I removed the package and then was able to run the setup script successfully.

That is a bit concerning, as it should not have been using your home R library at all... When you activate the conda environment, what do you get for which R and for your PATH? Also, if you open R, what do you see for .libPaths()?

allyhawkins commented 1 year ago

which R shows that it's using R in the conda environment: /home/scpca_tester/scpca-downstream-analyses/.snakemake/conda/66672e8fe9bc92a09a3eab8ec39eb573_/bin/R

path: /home/scpca_tester/scpca-downstream-analyses/.snakemake/conda/66672e8fe9bc92a09a3eab8ec39eb573_/bin:/opt/bin:/opt/R/4.1.2/bin:/opt/python/3.7.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin

This might be the problem? It looks like it's grabbing my base R?

And then .libPaths() gives:

[1] "/home/scpca_tester/R/x86_64-pc-linux-gnu-library/4.1"                                                         
[2] "/home/scpca_tester/scpca-downstream-analyses/.snakemake/conda/66672e8fe9bc92a09a3eab8ec39eb573_/lib/R/library"

jashapiro commented 1 year ago

It looks like it is using the correct R, as that appears first in the path, but it is somehow getting a different R when setting the library, and we need to override that. Most likely it is R_LIBS_USER that has gotten set somehow. I think RStudio may do something hidden on that front, so I'm trying to track that down.
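To make that diagnosis concrete, here is a small sketch (in Python, for illustration only) that flags library directories falling outside the conda environment. The function name is hypothetical; the inputs in the usage note are the `.libPaths()` entries and environment prefix reported above.

```python
from pathlib import PurePosixPath


def paths_outside_env(lib_paths, env_prefix):
    """Return the library paths that are not located under the conda env prefix."""
    prefix_parts = PurePosixPath(env_prefix).parts
    outside = []
    for path in lib_paths:
        candidate_parts = PurePosixPath(path).parts
        # A path is inside the env only if it starts with every prefix component.
        if candidate_parts[: len(prefix_parts)] != prefix_parts:
            outside.append(path)
    return outside
```

Called with the two paths listed above and the environment prefix ending in `66672e8fe9bc92a09a3eab8ec39eb573_`, this returns only the `/home/scpca_tester/R/x86_64-pc-linux-gnu-library/4.1` entry, which is the home library that should not be on the search path.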

jashapiro commented 1 year ago

I'm marking this ready for review. I'm fairly happy with the current implementation in my testing, including now unsetting the R_LIBS_USER environment variable in the YAML file. I considered using the conda-ecosystem-user-package-isolation package, but this is not implemented for arm64 for some reason (and hasn't been updated in years), so I thought going directly was potentially better.

jashapiro commented 1 year ago

So this worked for me locally, but when I tried it on the server I still ran into issues, this time with a new error that I haven't seen before 😢

I've tried a few different times, making sure to remove the previous conda environments and also updated my snakemake version to be 7.24 and it's still giving the same error.

What was the command you were using when the error occurred? is there a renv.lock file that was created? Was it created somewhere I didn't expect?

allyhawkins commented 1 year ago

What was the command you were using when the error occurred?

bash setup_envs.sh

is there a renv.lock file that was created? Was it created somewhere I didn't expect?

No lock file was created... or at least not in the root directory.

jashapiro commented 1 year ago

What was the command you were using when the error occurred?

bash setup_envs.sh

is there a renv.lock file that was created? Was it created somewhere I didn't expect?

No lock file was created... or at least not in the root directory.

Can you send me the contents of logs/snapshot_renv.log?

allyhawkins commented 1 year ago

Hmm it looks like it's not updating the lock file? This is all that's in the log file.

WARNING: ignoring environment value of R_HOME
* The lockfile is already up to date.

jashapiro commented 1 year ago

Hmm it looks like it's not updating the lock file? This is all that's in the log file.

WARNING: ignoring environment value of R_HOME
* The lockfile is already up to date.

So where is it finding a lock file? I'm definitely confused.