Closed. SAMtoBAM closed this issue 2 years ago.
Good evening,
If I understand correctly, you installed mamba following these steps (or equivalent):
conda activate persvade
conda install mamba
As you say, I think that this is the source of the problem. The script installation/setup_environment.sh
tries to activate a persvade_RepeatMasker_env that has been installed from the base environment. I think that, because you installed mamba inside the persvade_env, the persvade_RepeatMasker_env was also installed inside the persvade_env, so that the script can't find a 'base' persvade_RepeatMasker_env. You may install mamba from the base conda environment to solve this. This is why I recommend in the wiki to first install mamba, then the persvade environment. I will clarify that this order is important.
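To make the nesting hypothesis above concrete, here is a minimal Python sketch that counts how many envs/<name> components a conda prefix contains. The helper names and the example paths are hypothetical (the thread does not show the actual prefix), and the idea that a mamba installed inside an environment creates new environments under that environment's own envs/ directory is an assumption, not something confirmed in this thread.

```python
from pathlib import PurePosixPath


def enclosing_envs(prefix):
    """Return the name following every 'envs' component of a conda prefix."""
    parts = PurePosixPath(prefix).parts
    return [parts[i + 1] for i, part in enumerate(parts[:-1]) if part == "envs"]


def is_nested_env(prefix):
    """True if the prefix sits inside another environment's envs/ directory."""
    return len(enclosing_envs(prefix)) > 1


# Hypothetical prefixes, for illustration only.
base_level = "/home/user/anaconda3/envs/persvade_RepeatMasker_env"
nested = "/home/user/anaconda3/envs/persvade_env/envs/persvade_RepeatMasker_env"

print(is_nested_env(base_level))  # False
print(is_nested_env(nested))      # True
```

The prefixes to feed into such a check can be read from the output of conda env list.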
When you say "If I run conda activate with the complete prefix/path it works but not with just the specific env name", does it mean that you run conda activate <conda_dir>/persvade_env and then installation/setup_environment.sh? Or did you modify the script to work with your nested conda-mamba structure?
In any case, I understand that you have already solved this problem, right? If not, would it be possible for you to first install mamba and then the persvade environment, as recommended?
I hope that this helps,
Miquel Àngel Schikora Barcelona Supercomputing Center
PS: These kinds of dependency issues are the reason why we generally recommend using the singularity / docker versions. If you have any trouble using the docker or singularity images, I'd be happy to help.
I'm installing mamba now and can re-try the classic installation process. The reason I am doing this is because I was unable to follow your steps to get singularity to work. I installed singularity as suggested and then the build appeared to work but using the .sif file always provoked errors.
For example, running your usage example with just 'ls /perSVade' gave me this error:
singularity exec -e mikischikora_persvade_v1.02.4.sif ls /perSVade
ERROR : Could not create /dev/loop44: Permission denied
ABORT : Retval = 255
So apparently root permissions are required in order to run the docker image but I don't know how to get around this.
Hi again, I installed mamba at the base level, and then the installation etc. ran fine. However, I apparently have an issue when testing the python modules:
./installation/test_installation/test_installation_modules.py
---
ERROR: loading the modules of sv_functions did not work properly. This may be because your environment is not properly set. When running perSVade, make sure that the python interpreter is the expected with 'which python'.
For BSC users using the mschikora installation:
- Verify that the python interpreter is /gpfs/projects/bsc40/mschikora/anaconda3/envs/perSVade_<version>_env/bin/python to check that the environment is correctly activated.
- You may find trouble executing perSVde with the source /gpfs/projects/bsc40/mschikora/anaconda3/etc/profile.d/conda.sh if you have other conda environments installed in the cluster (for example the mn0 conda). You can manually fix this by changing the PATH variable with export PATH=$PATH:/gpfs/projects/bsc40/mschikora/anaconda3/envs/perSVade_<version>_env/bin.
---
---
This is the error thrown when trying to import python modules:
File "./installation/test_installation/../../scripts/sv_functions.py", line 62, in <module>
import plotly.plotly as py
File "/home/samuel/.local/lib/python3.6/site-packages/plotly/plotly/__init__.py", line 4, in <module>
_chart_studio_error("plotly")
File "/home/samuel/.local/lib/python3.6/site-packages/_plotly_future_/__init__.py", line 49, in _chart_studio_error
submodule=submodule
The plotly.plotly module is deprecated,
please install the chart-studio package and use the
chart_studio.plotly module instead.
---
When I run 'which python' I get this:
/home/samuel/anaconda3/envs/persvade2/bin/python
So python appears to be in the right place, but there is this issue with plotly? Thanks
Hi,
Regarding the singularity installation, I think that it should work without root permissions, and I have tested that it does. There may be some interaction between singularity and your system that causes problems. Which system are you using? I am thinking about a way to reproduce your error and fix this.
Regarding the perSVade running problem, it seems that, for some reason, perSVade is trying to import python packages from '/home/samuel/.local/lib/python3.6/site-packages/', and it should be importing them from somewhere in '/home/samuel/anaconda3/envs/persvade2'. This is what most likely causes this error.
This means that, for some reason, perSVade's environment is not properly activated. Maybe this can be fixed by manually changing the PATH variable with export PATH=$PATH:/home/samuel/anaconda3/envs/persvade2/bin
before execution. If this doesn't work, can you think of a reason why perSVade is not importing packages from the conda environment?
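One way to check this hypothesis is to ask the interpreter where it would load a module from, without importing it. The following is a minimal sketch under stated assumptions: module_origin and is_inside_prefix are hypothetical helper names, and plotly is only used as the example because it appears in the traceback above.

```python
import importlib.util
import os
import sys


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_inside_prefix(path, prefix=sys.prefix):
    """True if path lives under the given environment prefix."""
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


origin = module_origin("plotly")
if origin is None:
    print("plotly is not importable from this interpreter")
else:
    print("plotly would be loaded from", origin)
    print("inside the active environment:", is_inside_prefix(origin))
```

If this is run with the persvade2 interpreter and the reported origin is under /home/samuel/.local rather than under the environment, the packages are coming from the user site directory.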
I hope this helps,
Miquel Àngel Schikora
Hi Miquel,
I updated conda, then updated anaconda, then updated mamba, then reinstalled perSVade with mamba, followed by its dependencies, but ran into the same issue with the python package path. I tried exporting the path as suggested but it didn't help. So I am not sure what to do there.
On a good note, I also updated singularity (after all the other updates) and then rebuilt the .sif image, and now it appears to be working without the permission error. I am not sure what the issue was, but it is solved now. Unfortunately, when trying to run the installation test I run into the same issue as above, where the python package path is not correct...
singularity exec -B ./perSVade_testing_outputs:/perSVade/installation/test_installation/testing_outputs -e mikischikora_persvade_v1.02.4.sif bash -c 'source /opt/conda/etc/profile.d/conda.sh && conda activate perSVade_env && /perSVade/installation/test_installation/test_installation_modules.py'
INFO: Converting SIF file to temporary sandbox...
---
ERROR: loading the modules of sv_functions did not work properly. This may be because your environment is not properly set. When running perSVade, make sure that the python interpreter is the expected with 'which python'.
For BSC users using the mschikora installation:
- Verify that the python interpreter is /gpfs/projects/bsc40/mschikora/anaconda3/envs/perSVade_<version>_env/bin/python to check that the environment is correctly activated.
- You may find trouble executing perSVde with the source /gpfs/projects/bsc40/mschikora/anaconda3/etc/profile.d/conda.sh if you have other conda environments installed in the cluster (for example the mn0 conda). You can manually fix this by changing the PATH variable with export PATH=$PATH:/gpfs/projects/bsc40/mschikora/anaconda3/envs/perSVade_<version>_env/bin.
---
---
This is the error thrown when trying to import python modules:
File "/perSVade/installation/test_installation/../../scripts/sv_functions.py", line 62, in <module>
import plotly.plotly as py
File "/home/samuel/.local/lib/python3.6/site-packages/plotly/plotly/__init__.py", line 4, in <module>
_chart_studio_error("plotly")
File "/home/samuel/.local/lib/python3.6/site-packages/_plotly_future_/__init__.py", line 49, in _chart_studio_error
submodule=submodule
The plotly.plotly module is deprecated,
please install the chart-studio package and use the
chart_studio.plotly module instead.
---
INFO: Cleaning up image...
Any ideas??
Hi,
First of all, I am happy that the singularity build worked now. Which version of singularity is working for you? I ask because further users may benefit from knowing this.
Regarding the plotly error, it seems that your local python is interacting strangely with perSVade's conda environment regardless of whether you use the singularity or the conda installation, right? It seems to me that perSVade's python is not looking for packages where it should. The reason for this is not clear to me, since I don't have access to your machine, but I'll try to propose some solutions.
Can you send the content of the sys.path
python object? You can do this by typing the following commands:
conda activate persvade2
python
>>> import sys
>>> print(sys.path)
The sys.path
object contains the directories in which your persvade2's python tries to import packages. In my case, my sys.path contains:
['<conda_dir>/envs/perSVade_env/lib/python36.zip',
'<conda_dir>/envs/perSVade_env/lib/python3.6',
'<conda_dir>/envs/perSVade_env/lib/python3.6/lib-dynload',
'<conda_dir>/envs/perSVade_env/lib/python3.6/site-packages']
Note that my perSVade environment is called perSVade_env. You should have equivalent paths at the beginning of your persvade2's sys.path. My first hypothesis is that, for some reason, your sys.path contains /home/samuel/.local/lib/python3.6/site-packages/ before the directories under /home/samuel/anaconda3/envs/persvade2. If that is the case, you can add any desired directory to sys.path (before running python) through the PYTHONPATH environment variable, for example with:
conda activate persvade2
export PYTHONPATH=<any_path>
python
>>> import sys
>>> print(sys.path)
You should see that <any_path>
has been added at the beginning of sys.path. Can you try adding the directories under /home/samuel/anaconda3/envs/persvade2
in this way? Maybe this solves the environmental issue.
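The effect of PYTHONPATH described above can also be checked without an interactive session. This is a minimal sketch, not part of perSVade: child_sys_path and position_in_path are hypothetical helper names, and a temporary directory stands in for <any_path>.

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_env):
    """Return sys.path as seen by a fresh interpreter started with extra_env."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return json.loads(result.stdout)


def position_in_path(directory, paths):
    """Index of directory in paths (comparing resolved paths), or -1 if absent."""
    target = os.path.realpath(directory)
    for index, entry in enumerate(paths):
        if entry and os.path.realpath(entry) == target:
            return index
    return -1


# A temporary directory plays the role of <any_path>.
with tempfile.TemporaryDirectory() as any_path:
    paths = child_sys_path({"PYTHONPATH": any_path})
    print("PYTHONPATH entry is at index", position_in_path(any_path, paths))
```

The entry should land near the front of the list, ahead of both the user site directory and the environment's own site-packages, which is why it takes precedence for imports.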
These types of weird interactions between conda and non-conda python installations are fairly common in my experience. Did you ever run into similar problems with other conda-based software?
I hope this helps,
Miquel Àngel Schikora
I have not experienced these issues with other conda installations, so this is all new to me.
Here are the results from the sys.path inside my persvade env:
['', '/home/samuel/anaconda3/envs/persvade/lib/python36.zip', '/home/samuel/anaconda3/envs/persvade/lib/python3.6', '/home/samuel/anaconda3/envs/persvade/lib/python3.6/lib-dynload', '/home/samuel/.local/lib/python3.6/site-packages', '/home/samuel/anaconda3/envs/persvade/lib/python3.6/site-packages', '/home/samuel/anaconda3/envs/persvade/lib/python3.6/site-packages/parallel_fastq_dump-0.6.3-py3.6.egg']
So the path to my local python installation is certainly there, but only for site-packages, so I ran (the env is now just called persvade after some cleaning):
export PYTHONPATH=/home/samuel/anaconda3/envs/persvade/lib/python3.6/site-packages
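Python searches sys.path in order and uses the first directory that contains the requested package, so the ordering in the list above matters. A minimal sketch that makes this explicit, using the sys.path posted above as data (site_packages_order is a hypothetical helper name):

```python
# sys.path as posted above, before PYTHONPATH was exported.
posted_sys_path = [
    "",
    "/home/samuel/anaconda3/envs/persvade/lib/python36.zip",
    "/home/samuel/anaconda3/envs/persvade/lib/python3.6",
    "/home/samuel/anaconda3/envs/persvade/lib/python3.6/lib-dynload",
    "/home/samuel/.local/lib/python3.6/site-packages",
    "/home/samuel/anaconda3/envs/persvade/lib/python3.6/site-packages",
    "/home/samuel/anaconda3/envs/persvade/lib/python3.6/site-packages/parallel_fastq_dump-0.6.3-py3.6.egg",
]


def site_packages_order(paths):
    """Return the site-packages directories in the order Python searches them."""
    return [p for p in paths if p.rstrip("/").endswith("site-packages")]


before = site_packages_order(posted_sys_path)
print(before[0])  # /home/samuel/.local/lib/python3.6/site-packages

# PYTHONPATH entries are placed right after the first entry of sys.path.
env_site = "/home/samuel/anaconda3/envs/persvade/lib/python3.6/site-packages"
after = site_packages_order([posted_sys_path[0], env_site] + posted_sys_path[1:])
print(after[0])   # /home/samuel/anaconda3/envs/persvade/lib/python3.6/site-packages
```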
This appears to fix the problem with importing python packages. However, running the installation test now hits another issue:
./installation/test_installation/test_installation_modules.py
Matplotlib is building the font cache; this may take a moment.
[26/03/2022, 08:24:13] loading python packages worked successfully
[26/03/2022, 08:24:13] all output files will be written to ./installation/test_installation/testing_outputs
[26/03/2022, 08:24:13] setting all the files
ERROR: This cross-compiler package contains no program /home/samuel/anaconda3/envs/persvade_picard_env/bin/x86_64-conda_cos6-linux-gnu-gfortran
INFO: activate-gfortran_linux-64.sh made the following environmental changes:
+HOST=x86_64-conda_cos6-linux-gnu
-HOST=x86_64-conda-linux-gnu
Traceback (most recent call last):
File "/home/samuel/Documents/perSVade/perSVade-1.02.4/installation/test_installation/../../scripts/align_reads", line 113, in <module>
real_available_threads = fun.get_available_threads(opt.outdir)
File "/home/samuel/Documents/perSVade/perSVade-1.02.4/installation/test_installation/../../scripts/sv_functions.py", line 1110, in get_available_threads
sorted_bam = get_sorted_bam_test(reads1, reads2, genome, replace=False)
File "/home/samuel/Documents/perSVade/perSVade-1.02.4/installation/test_installation/../../scripts/sv_functions.py", line 1062, in get_sorted_bam_test
run_bwa_mem(r1, r2, ref_genome, outdir, bamfile, sorted_bam, index_bam, name_sample, threads=4, replace=False, MarkDuplicates=False)
File "/home/samuel/Documents/perSVade/perSVade-1.02.4/installation/test_installation/../../scripts/sv_functions.py", line 1368, in run_bwa_mem
check_sorted_bam_has_correct_insert_sizes(sorted_bam_tmp, replace, threads)
File "/home/samuel/Documents/perSVade/perSVade-1.02.4/installation/test_installation/../../scripts/sv_functions.py", line 1262, in check_sorted_bam_has_correct_insert_sizes
median_insert_size, median_insert_size_sd = get_insert_size_distribution(sorted_bam, replace=replace, threads=threads)
File "/home/samuel/Documents/perSVade/perSVade-1.02.4/installation/test_installation/../../scripts/sv_functions.py", line 7681, in get_insert_size_distribution
run_cmd("%s CollectInsertSizeMetrics HISTOGRAM_FILE=%s INPUT=%s OUTPUT=%s > %s 2>&1"%(picard_exec, hist_file, sampled_bam, outfile_tmp, picard_insertSize_std), env=EnvName_picard)
File "/home/samuel/Documents/perSVade/perSVade-1.02.4/installation/test_installation/../../scripts/sv_functions.py", line 691, in run_cmd
if out_stat!=0: raise ValueError("\n%s\n did not finish correctly. Out status: %i"%(cmd_to_run, out_stat))
ValueError:
source /home/samuel/anaconda3/etc/profile.d/conda.sh && conda activate persvade_picard_env && /home/samuel/anaconda3/envs/persvade_picard_env/bin/picard CollectInsertSizeMetrics HISTOGRAM_FILE=./installation/test_installation/testing_outputs/align_reads_sim_SVs/getting_available_threads/genome.fasta_simulating_reads/getting_reads/aligning_reads_against_genome.fasta/aligned_reads.bam.sorted.tmp.checking_correct_insertSizes/aligned_reads.bam.sorted.histogram_insertsizes.pdf INPUT=./installation/test_installation/testing_outputs/align_reads_sim_SVs/getting_available_threads/genome.fasta_simulating_reads/getting_reads/aligning_reads_against_genome.fasta/aligned_reads.bam.sorted.tmp.checking_correct_insertSizes/aligned_reads.bam.sorted.1pct_reads_seed280_sampleX.bam OUTPUT=./installation/test_installation/testing_outputs/align_reads_sim_SVs/getting_available_threads/genome.fasta_simulating_reads/getting_reads/aligning_reads_against_genome.fasta/aligned_reads.bam.sorted.tmp.checking_correct_insertSizes/aligned_reads.bam.sorted.CollectInsertSizeMetrics.out.tmp > ./installation/test_installation/testing_outputs/align_reads_sim_SVs/getting_available_threads/genome.fasta_simulating_reads/getting_reads/aligning_reads_against_genome.fasta/aligned_reads.bam.sorted.tmp.checking_correct_insertSizes/aligned_reads.bam.sorted.CollectInsertSizeMetrics.out.tmp.generating.std 2>&1
did not finish correctly. Out status: 256
ERROR!!! Out status: 256
Traceback (most recent call last):
File "./installation/test_installation/test_installation_modules.py", line 112, in <module>
fun.run_cmd("%s align_reads --threads %i --fraction_available_mem 1.0 -f1 %s -f2 %s -o %s --ref %s --min_chromosome_len 100"%(fun.perSVade_modules, threads, sim_reads1, sim_reads2, outdir_align_reads_SVs, ref_genome))
File "./installation/test_installation/../../scripts/sv_functions.py", line 691, in run_cmd
if out_stat!=0: raise ValueError("\n%s\n did not finish correctly. Out status: %i"%(cmd_to_run, out_stat))
ValueError:
source /home/samuel/anaconda3/etc/profile.d/conda.sh && conda activate persvade && /home/samuel/Documents/perSVade/perSVade-1.02.4/installation/test_installation/../../scripts/perSVade align_reads --threads 40 --fraction_available_mem 1.0 -f1 ./installation/test_installation/testing_inputs/all_reads1.correct.fq.gz -f2 ./installation/test_installation/testing_inputs/all_reads2.correct.fq.gz -o ./installation/test_installation/testing_outputs/align_reads_sim_SVs --ref ./installation/test_installation/testing_outputs/reduced_genome.fasta --min_chromosome_len 100
did not finish correctly. Out status: 2560
So not sure of what the issue is here exactly.
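As a side note on the "Out status" values in the traceback: if they are raw wait statuses as returned by os.system on Linux (an assumption, since the thread does not show how run_cmd obtains out_stat), the exit code of the failed command sits in the second byte. A minimal sketch for decoding them, with decode_wait_status as a hypothetical helper name:

```python
def decode_wait_status(status):
    """Split a POSIX wait status into (exit_code, signal_number)."""
    signal_number = status & 0x7F      # low 7 bits: terminating signal, 0 if none
    exit_code = (status >> 8) & 0xFF   # next byte: exit code on normal exit
    return exit_code, signal_number


for status in (256, 2560):
    exit_code, signal_number = decode_wait_status(status)
    print(status, "-> exit code", exit_code, "signal", signal_number)
# 256 -> exit code 1 signal 0
# 2560 -> exit code 10 signal 0
```

Under that assumption, neither command was killed by a signal; both exited with a non-zero code.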
The singularity version that is working is 3.8.6
As above, I can export a similar PYTHONPATH inside the singularity perSVade environment to solve the python package path issue:
singularity exec -B ./perSVade_testing_outputs:/perSVade/installation/test_installation/testing_outputs -e mikischikora_persvade_v1.02.4.sif bash -c 'source /opt/conda/etc/profile.d/conda.sh && conda activate perSVade_env && export PYTHONPATH=/opt/conda/envs/perSVade_env/lib/python3.6/site-packages && /perSVade/installation/test_installation/test_installation_modules.py'
The installation test module finishes, which is great!
Just to note, there is one error that appears consistently throughout the test and takes up most of the verbose output (but doesn't halt the process):
ERROR: This cross-compiler package contains no program /opt/conda/envs/perSVade_env_picard_env/bin/x86_64-conda_cos6-linux-gnu-gfortran
INFO: activate-gfortran_linux-64.sh made the following environmental changes:
+HOST=x86_64-conda_cos6-linux-gnu
-HOST=x86_64-conda-linux-gnu
I will now try running perSVade with the singularity container. Looking forward to seeing the results.
Hi,
It seems that we have three topics here:
Regarding the conda installation: I guess that having /home/samuel/.local/lib/python3.6/site-packages before /home/samuel/anaconda3/envs/persvade/lib/python3.6/site-packages in sys.path caused the error, most likely because the test_installation_modules.py script is trying to import packages (e.g. plotly) from /home/samuel/.local/lib/python3.6/site-packages, while it should be importing them from /home/samuel/anaconda3/envs/persvade/lib/python3.6/site-packages.
The thing is that perSVade has several conda activate steps, since different parts of the pipeline require different environments. I am thinking that maybe some of these activation steps (for example the conda activate persvade_picard_env that is giving an error) change sys.path, so that you again hit the incompatibility with the packages in /home/samuel/.local/lib/python3.6/site-packages. This is a known conda issue, and maybe setting PYTHONPATH only once is not enough. Another solution that I have seen in this link is removing the ~/.local/* path from sys.path with export PYTHONNOUSERSITE=True before running perSVade.
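The effect of PYTHONNOUSERSITE can be verified with a small check. This is a minimal sketch, with user_site_enabled as a hypothetical helper name; it starts a fresh interpreter and reports whether the user site directory (the ~/.local/... one) would be added to sys.path. Python treats any non-empty value of the variable as "set", so True and 1 behave the same.

```python
import os
import subprocess
import sys


def user_site_enabled(extra_env):
    """Return True if a fresh interpreter started with extra_env uses the user site."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.ENABLE_USER_SITE)"],
        env=env,
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip() == "True"


print("user site with PYTHONNOUSERSITE set:",
      user_site_enabled({"PYTHONNOUSERSITE": "True"}))  # False
```

Because the variable is inherited by child processes, exporting it once in the shell should also cover the later conda activate steps that perSVade runs internally, as long as those steps do not unset it.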
This is all hypothetical, since I still don't know what the problem is. In any case, the file /installation/test_installation/testing_outputs/align_reads_sim_SVs/getting_available_threads/genome.fasta_simulating_reads/getting_reads/aligning_reads_against_genome.fasta/aligned_reads.bam.sorted.tmp.checking_correct_insertSizes/aligned_reads.bam.sorted.CollectInsertSizeMetrics.out.tmp.generating.std
contains the STDERR and STDOUT of the line that failed. Can you check it, or copy-paste its contents here? If there is a conda environment problem, I guess that it should be easy to see.
Regarding the singularity execution, I am happy that it worked. It makes sense that containerization works better than the typical conda installation. I will create a FAQ explaining this possible issue and how to solve it. I hope that it works on your real data too. I guess that you can use the singularity option, since it works, and it should be the most reproducible as well.
Regarding the gfortran cross-compiler error: it is more like a warning. It is telling you that some environment changes were made so that perSVade works on your machine. I have seen this on some of the testing machines. I will create a FAQ clarifying that this is reasonable behavior of the pipeline.
I hope that this helps,
Miquel Àngel Schikora
You are right, this issue thread has been gradually expanding upon multiple issues.
You are right, I ran:
export PYTHONNOUSERSITE=True
to remove the local path, and this also solved the python package issue, as an alternative to adding the site-packages path again.
However, I still get the same error as before, and this appears to be the cause, based on the STDOUT file you pointed to:
ERROR 2022-03-29 11:25:08 ProcessExecutor /home/samuel/anaconda3/envs/persvade_picard_env/lib/R/bin/exec/R: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or directory
Does this help?
I am now running perSVade on some of my own data using the singularity container and it appears to be working well. I just have one comment for now after running through most of the modules: after the 'align_reads' command, it seems the output was aligned_reads.bam.sorted instead of aligned_reads.sorted.bam
Thanks for adding the FAQ section. My biggest issue with the gfortran message, though it's really just me being picky, is that it repeats itself and drowns out all the other terminal output.
Hi again,
I'll answer point by point:
I have seen that the libreadline.so.6 library can cause problems with conda-based R installations in some Ubuntu versions. For example, I have found this post, where they propose manually installing the library as a solution.
I understand that, since the singularity solution already works for you, maybe you don't want to dig deeper into this topic. If you want to solve this, you can try the solution from the link or directly install the readline library with either of the following commands:
conda install -n persvade_picard_env -c anaconda readline
conda install -n persvade_picard_env -c anaconda readline=6.0
This is quite surprising to me, as I would have expected conda to handle these kinds of problems. This is why I recommend using docker or singularity, because conda installations are not reproducible enough for a pipeline like this.
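Before and after installing readline, it may help to see which libreadline versions the environment actually ships. This is a minimal sketch under stated assumptions: find_shared_libs and has_soname are hypothetical helper names, and the sketch assumes that the shared libraries of a conda environment live in its lib/ subdirectory.

```python
import glob
import os


def find_shared_libs(env_prefix, stem="libreadline"):
    """List the shared-library file names matching stem in <env_prefix>/lib."""
    pattern = os.path.join(env_prefix, "lib", stem + ".so*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


def has_soname(env_prefix, soname):
    """True if a file or symlink named soname exists in <env_prefix>/lib."""
    return os.path.lexists(os.path.join(env_prefix, "lib", soname))


# Environment prefix taken from the error message in this thread.
picard_env = "/home/samuel/anaconda3/envs/persvade_picard_env"
print("libreadline files:", find_shared_libs(picard_env))
print("libreadline.so.6 present:", has_soname(picard_env, "libreadline.so.6"))
```

If libreadline.so.6 is absent while a newer version is listed, the R binary in that environment was probably built against an older readline than the one installed.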
Regarding the aligned_reads.bam.sorted output name, I agree that this can be confusing. The next release will have this changed.
I agree that the repeated message can be annoying. I am just not sure where it comes from, because it only happened to me once, on one specific machine. I will try to reproduce it and have it fixed for the next release.
I hope that this helps,
Miquel Àngel Schikora
Good morning,
I assume that, as you are already using the singularity image, this issue is no longer relevant.
I hope that this helps,
Miquel Àngel Schikora
Hi again,
I am running the dependency installation and after mamba installs RepeatMasker, it tries to activate the environment but fails to do so as below:
If I run conda activate with the complete prefix/path it works but not with just the specific env name (due to being inside the persvade environment already it seems).
Just to note, RepeatMasker is being installed with mamba, whereas persvade was installed with conda, and then I installed mamba within the persvade environment, in case this is an issue.
Thanks