Closed: osowiecki closed this issue 3 years ago.
Removed all version numbers from install_*.sh files and set conda version to latest. Everything installed. Will notify you if something crashes during the analysis.
Facing the same issue. And I did as you suggested and removed all the version numbers from the install*.sh files.
I was wondering what the developers suggest regarding changing the dependency versions. Would this result in any changes to the results?
Here are my current package lists.
Thanks @osowiecki. Most of the versions are the same for me; the few exceptions are Python packages with slightly different version numbers.
Were you able to run this tool on your data successfully?
Still running (currently at 50% of the assembly step after 24 h). I had to run it on a stronger machine.
"canu iteration count too high, stopping pipeline (most likely a problem in the grid-based computes)": Canu required more resources to run; 40 threads and 189 GB of RAM wasn't enough, but on a stronger machine it ran.
"samtools merge: fail to open "assemble/group/gr-4-24000000-1000000/contig.bam": Too many open files": increase your open file limit on Linux and it will then run.
The Arrow polisher crashed on my RS II PacBio data due to an incompatible chemistry. I switched to Quiver and it runs.
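For the "Too many open files" failure above, the usual shell fix is `ulimit -n 4096` before launching the pipeline. As a sketch only (assuming a Linux host; this helper is not part of SMRT-SV), the same check can be done from inside a Python wrapper via the standard `resource` module:

```python
import resource

def raise_open_file_limit(wanted=4096):
    """Raise the soft RLIMIT_NOFILE toward `wanted`, capped at the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < wanted:
        # An unprivileged process may raise its soft limit only up to the hard limit.
        new_soft = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]

soft_now = raise_open_file_limit(4096)
print("soft open-file limit:", soft_now)
```

If the hard limit itself is too low, it has to be raised system-wide (e.g. in /etc/security/limits.conf) rather than from the process.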
samtools view -h assemble/local_assemblies.bam | python3 -s /media/raid/smrtsv2/scripts/call/TilingPath.py /dev/stdin > call/tiling_contigs.tab
Traceback (most recent call last):
File "/media/raid/smrtsv2/scripts/call/TilingPath.py", line 82, in
EDIT:
intervaltree=2.1.0
is required in this older version.
EDIT 2: pin scikit-learn=0.20.2; the latest version in conda is incompatible.
By changing the miniconda version I was able to resolve the dependency issue. See here: https://github.com/EichlerLab/smrtsv2/issues/49#issuecomment-615439675
Newer pandas also fails. I'll try your solution.
RuleException:
KeyError in line 242 of /media/raid/smrtsv2/rules/genotype.snakefile:
'0'
File "/media/raid/smrtsv2/rules/genotype.snakefile", line 242, in __rule_gt_vcf_get_sample_column
File "/media/raid/smrtsv2/smrtsvlib/genotype.py", line 65, in get_sample_column
File "/media/raid/smrtsv2/dep/conda/build/envs/python3/lib/python3.6/site-packages/pandas/core/series.py", line 3848, in apply
File "pandas/_libs/lib.pyx", line 2329, in pandas._libs.lib.map_infer
File "/media/raid/smrtsv2/smrtsvlib/genotype.py", line 65, in
It's looking for a SEX column in the sample manifest file. Make sure the file is set up correctly (double-check GENOTYPE.md, section "Sample table").
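A quick pre-flight check of the manifest can save a failed run. The helper below is hypothetical (not part of SMRT-SV) and only assumes the tab-delimited SAMPLE/SEX/DATA layout described in GENOTYPE.md, with U taken to mean unknown sex as used later in this thread:

```python
import csv
import io

REQUIRED_COLUMNS = {"SAMPLE", "SEX", "DATA"}
VALID_SEX = {"M", "F", "U"}  # assumption: M/F/U, with U for unknown

def check_manifest(handle):
    """Return a list of problems found in a tab-delimited sample table."""
    reader = csv.DictReader(handle, delimiter="\t")
    problems = []
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        problems.append("missing columns: " + ", ".join(sorted(missing)))
        return problems
    for line_no, row in enumerate(reader, start=2):
        if row["SEX"] not in VALID_SEX:
            problems.append(f"line {line_no}: bad SEX value {row['SEX']!r}")
        if not row["DATA"]:
            problems.append(f"line {line_no}: empty DATA path")
    return problems

table = "SAMPLE\tSEX\tDATA\ns79757-500bp\tU\t/media/raid/bam/s79757-500bp.bam\n"
print(check_manifest(io.StringIO(table)))  # empty list means no problems found
```

An empty result only means the columns parse; it does not validate the BAM paths themselves.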
Thanks for helping to resolve the dependency issues. Version pinning accounts for too many issues on GitHub, so I am probably going to remove the pinned versions and hope for the best.
Also, SMRT-SV has not been updated to use pbgcpp instead of Arrow; pbgcpp supports the latest chemistries. The syntax should be similar, though.
What do you think about this? Warning to ignore or serious error?
[Tue Apr 21 13:34:40 2020]
rule gt_call_sample_insert_delta:
    input: altref/ref.fasta, sv_calls/sv_calls.bed, samples/s79757-500bp/alignments.cram
    output: samples/s79757-500bp/temp/insert_delta.tab, samples/s79757-500bp/insert_size_stats.tab
    jobid: 24
    wildcards: sample=s79757-500bp
/media/raid/smrtsv2/scripts/genotype/GetInsertSizeDelta.py:204: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  sv_rec['N_INSERT'] = len(insert_array)
/media/raid/smrtsv2/dep/conda/build/envs/python3/lib/python3.6/site-packages/pandas/core/series.py:915: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.loc[key] = value
/media/raid/smrtsv2/scripts/genotype/GetInsertSizeDelta.py:214: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
  sv_rec['INSERT_LOWER'] = 0
/media/raid/smrtsv2/scripts/genotype/GetInsertSizeDelta.py:215: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
  sv_rec['INSERT_UPPER'] = 0
/media/raid/smrtsv2/scripts/genotype/GetInsertSizeDelta.py:210: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
  sv_rec['INSERT_LOWER'] = sum(insert_array < -z_limit) / sv_rec['N_INSERT']
/media/raid/smrtsv2/scripts/genotype/GetInsertSizeDelta.py:211: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
  sv_rec['INSERT_UPPER'] = sum(insert_array > z_limit) / sv_rec['N_INSERT']
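In general terms, SettingWithCopyWarning is a warning rather than an error: it means an assignment may have landed on a temporary copy instead of the original DataFrame, so the write can silently be lost. A minimal pandas sketch (unrelated to the SMRT-SV code itself, with made-up column names) showing the warning-prone pattern and the explicit alternatives:

```python
import pandas as pd

df = pd.DataFrame({"SVTYPE": ["INS", "DEL", "INS"], "N_INSERT": [0, 0, 0]})

# Warning-prone: chained indexing may write to a temporary copy.
# sub = df[df["SVTYPE"] == "INS"]
# sub["N_INSERT"] = 5            # may or may not reach df

# Explicit copy: writes stay on the copy, and pandas stays quiet.
sub = df[df["SVTYPE"] == "INS"].copy()
sub["N_INSERT"] = 5

# Explicit write-through: .loc on the original updates df itself.
df.loc[df["SVTYPE"] == "INS", "N_INSERT"] = 5
```

So the warnings above are usually safe to ignore as long as the script's results look correct, but the intent should be made explicit with .copy() or .loc to be sure.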
It's looking for a SEX column in the sample manifest file. Make sure the file is set up correctly (double-check GENOTYPE.md, section "Sample table").
SAMPLE        SEX  DATA
s79757-500bp  U    /media/raid/bam/s79757-500bp.bam
s79757-400bp  U    /media/raid/bam/s79757-400bp.bam
s79757-11kb   U    /media/raid/bam/s79757-11kb.bam
s79757-8kb    U    /media/raid/bam/s79757-8kb.bam
These are my original Illumina BAM files. I don't know what else I can change in the tab file.
The script fails exactly here in genotype.py:
# Set genotype (GT), genotype quality (GQ), and genotype likelihood (GL)
df_gt['CLASS'] = df_gt.apply(
lambda row: str(np.argmax(row[['HOM_REF', 'HET', 'HOM_ALT']])) if row['CALLABLE'] else 'NO_CALL',
axis=1
)
df_gt is fine before that step. I assume something is wrong with the package versions. Investigating.
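One plausible culprit (an assumption, not confirmed by the developers): the behavior of argmax on a pandas Series changed across versions. Older pandas returned the integer position, while later versions return the index label, so str(...) can yield 'HOM_REF' instead of '0' and the downstream lookup raises KeyError: '0'. Calling argmax on the underlying NumPy array is version-independent; a sketch with a hypothetical likelihood row:

```python
import numpy as np

# Hypothetical genotype-likelihood row, mirroring the HOM_REF/HET/HOM_ALT columns.
likelihoods = {"HOM_REF": 0.90, "HET": 0.08, "HOM_ALT": 0.02}

# argmax on a plain ndarray always returns an integer position,
# regardless of the installed pandas version.
values = np.array([likelihoods["HOM_REF"], likelihoods["HET"], likelihoods["HOM_ALT"]])
genotype_class = str(int(values.argmax()))  # '0' = HOM_REF, '1' = HET, '2' = HOM_ALT
```

Pinning pandas, as done below, sidesteps the same problem without touching the code.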
[EDIT]
With this set of packages, it genotyped my samples without problems. How about that?
conda install -y \
    numpy=1.15.4 \
    scipy=1.1.0 \
    pandas=0.23.4 \
    pysam=0.15.1 \
    biopython=1.72 \
    intervaltree=2.1.0 \
    networkx=2.2 \
    pybedtools=0.8.0
conda install -y \
    numpy==1.18.1 \
    scipy=1.4.1 \
    pandas=0.20.3 \
    pysam \
    snakemake \
    biopython \
    ipython \
    drmaa \
    scikit-learn=0.19.0 \
    intervaltree==2.1.0
Has anyone used this on a human genome sample? I am running the assemble step on human PacBio data and it is taking really long. Is there a way to distribute the jobs in the assemble step?
Currently running smrtsv2-2.0.2/smrtsv assemble --asm-cpu 68 --asm-mem 10.
In the end, I would like to run this tool on a large number of samples. Any ideas to speed up the processing would be great!
Use --asm-parallel.
ulimit -n 4096
./smrtsv --tempdir /media/raid/SMRT/temp assemble --asm-cpu 15 --asm-parallel 8 --asm-polish quiver
I am using version 2.0.2 and get this error:
smrtsv.py: error: unrecognized arguments: --asm-parallel
Which version are you using?
I'm using the latest one as of today.
./smrtsv assemble -h
This works only for the "assemble" step. Start that step manually, then run "call" and "genotype".
I am using the latest too. Maybe you cloned the git repository?
This is what I see:
smrtsv2-2.0.2/smrtsv assemble -h
usage: smrtsv.py assemble [-h]
[--asm-alignment-parameters ASM_ALIGNMENT_PARAMETERS]
[--mapping-quality MAPPING_QUALITY]
[--asm-cpu ASM_CPU] [--asm-mem ASM_MEM]
[--asm-polish ASM_POLISH]
[--asm-group-rt ASM_GROUP_RT] [--asm-rt ASM_RT]
optional arguments:
-h, --help show this help message and exit
--asm-alignment-parameters ASM_ALIGNMENT_PARAMETERS
BLASR parameters to use to align local assemblies.
--mapping-quality MAPPING_QUALITY
Minimum mapping quality of raw reads. Used by "detect"
to filter reads while finding gaps and hardstops. Used
by "assemble" to filter reads with low mapping quality
before the assembly step.
--asm-cpu ASM_CPU Number of CPUs to use for assembly steps.
--asm-mem ASM_MEM Multiply this amount of memory by the number of cores
for the amount of memory allocated to assembly steps.
--asm-polish ASM_POLISH
Assembly polishing method (arrow|quiver). "arrow"
should work on all PacBio data, but "quiver" will only
work on RS II input.
--asm-group-rt ASM_GROUP_RT
Set maximum runtime for an assembly group. Assemblies
are grouped by region, and multiple assemblies are
done in one grouped job. This is the maximum runtime
for the whole group.
--asm-rt ASM_RT Set maximum runtime for an assembly region. This
should be a valid argument for the Linux "timeout"
command.
I'm using the cloned repository. git clone https://github.com/EichlerLab/smrtsv2
optional arguments:
-h, --help show this help message and exit
--asm-alignment-parameters ASM_ALIGNMENT_PARAMETERS
BLASR parameters to use to align local assemblies.
--mapping-quality MAPPING_QUALITY
Minimum mapping quality of raw reads. Used by "detect"
to filter reads while finding gaps and hardstops. Used
by "assemble" to filter reads with low mapping quality
before the assembly step.
--asm-cpu ASM_CPU Number of CPUs to use for assembly steps.
--asm-mem ASM_MEM Multiply this amount of memory by the number of cores
for the amount of memory allocated to assembly
steps. If multiple simultaneous assemblies are run,
then this is multiplied again by that factor (see
--asm-parallel).
--asm-polish ASM_POLISH
Assembly polishing method (arrow|quiver). "arrow"
should work on all PacBio data, but "quiver" will only
work on RS II input.
--asm-group-rt ASM_GROUP_RT
Set maximum runtime for an assembly group. Assemblies
are grouped by region, and multiple assemblies are
done in one grouped job. This is the maximum runtime
for the whole group.
--asm-rt ASM_RT Set maximum runtime for an assembly region. This
should be a valid argument for the Linux "timeout"
command.
--asm-parallel ASM_PARALLEL
Number of simultaneous assemblies to run. The actual
thread count will be this times --asm-cpu.
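The resource math implied by the help text above: total thread use is --asm-parallel times --asm-cpu, and --asm-mem is per core, so total memory is the product of all three. A small planning sketch (the example values come from this thread's commands, not from any recommendation):

```python
def assembly_footprint(asm_parallel, asm_cpu, asm_mem_gb):
    """Total threads and memory (GB) implied by the assemble-step flags."""
    threads = asm_parallel * asm_cpu          # simultaneous assemblies x CPUs each
    mem_gb = threads * asm_mem_gb             # --asm-mem is multiplied per core
    return threads, mem_gb

# e.g. the --asm-parallel 8 --asm-cpu 15 command above, assuming 2 GB per core
threads, mem_gb = assembly_footprint(asm_parallel=8, asm_cpu=15, asm_mem_gb=2)
```

With 8 parallel assemblies of 15 CPUs each at 2 GB per core, that is 120 threads and 240 GB, which is worth checking against the machine before launching.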
Thanks! I have got this working.
What is your experience with restarting the steps? Do you know if the assemble step restarts well if the job gets killed in the middle?
SMRT-SV batches assemblies into megabase-sized regions. Incomplete batches will be re-run if the pipeline is restarted.
The whole pipeline can be distributed with DRMAA. We used it on an SGE cluster.
SMRTSV_DIR=/path/to/SMRT-SV
FOFN_FILE=/path/to/pacbio-bam/sample.fofn
REF_FA=/path/to/hg38.no_alt.fa
${SMRTSV_DIR}/smrtsv.py --cluster-config ${SMRTSV_DIR}/cluster.eichler.json --drmaalib /path/to/libdrmaa.so.1.0 --distribute run --batches 20 --runjobs "25,20,200,10" --threads 8 ${REF_FA} ${FOFN_FILE}
You'll have to adjust cluster.eichler.json for your cluster. For ours, it multiplies the memory by number of cores (e.g. 4 cores and 2gb is 8 gb total). You'll also have to adjust cluster parameters (--cluster_params), which the values from cluster.eichler.json are dropped into for each rule. If those two things together make parameter strings that your cluster accepts, then it should work.
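For orientation, Snakemake cluster-configuration files are JSON objects keyed by rule name, with a __default__ entry as the fallback. The fragment below is purely hypothetical (it is not the contents of cluster.eichler.json, and the rule name and keys are invented for illustration); the actual keys must match what your cluster parameters interpolate:

```json
{
    "__default__": {
        "cores": 4,
        "mem_gb": 2,
        "runtime": "24:00:00"
    },
    "some_assembly_rule": {
        "cores": 8,
        "mem_gb": 4
    }
}
```

As noted above, if your scheduler does not multiply memory by core count the way SGE's per-slot requests do, the memory values need adjusting accordingly.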
Yes, we have run many human samples through it. On most samples, it takes more than a week with 1,000 cores.
SMRT-SV is a useful tool, but I wouldn't run it without doing PBSV first (Sniffles second). Because it relies on squashed assemblies, it is going to miss about 40% of your heterozygous SVs.
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed
Found conflicts! Looking for incompatible packages.
UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:
Specifications:
Your python: python=3.6.2
If python is on the left-most side of the chain, that's the version you've asked for. When python appears to the right, that indicates that the thing on the left is somehow not available for the python version you are constrained to. Note that conda will not change your python version to a different minor version unless you explicitly specify that.
The following specifications were found to be incompatible with each other:
Package python_abi conflicts for:
boost=1.70.0 -> python[version='>=3.6,<3.7.0a0'] -> pip -> setuptools -> certifi[version='>=2016.09'] -> python_abi=2.7[build=_cp27mu]
freebayes=1.3.1 -> python[version='>=2.7,<2.8.0a0'] -> pip -> setuptools -> certifi[version='>=2016.09'] -> python_abi=2.7[build=_cp27mu]
python=3.6.2 -> pip -> setuptools -> python_abi[version='3.6|3.6.*',build='_pypy36_pp73|_cp36m']
freebayes=1.3.1 -> python[version='>=2.7,<2.8.0a0'] -> pip -> setuptools -> python_abi[version='3.6.*|3.7.*',build='_cp37m|_cp36m']
freebayes=1.3.1 -> python[version='>=2.7,<2.8.0a0'] -> python_abi==3.6[build=_pypy36_pp73]
Makefile:16: recipe for target 'build/install_flags/env_tools_install' failed
make[1]: *** [build/install_flags/env_tools_install] Error 1
make[1]: Leaving directory '/media/2/CORN/smrtsv2/dep/conda'
Makefile:87: recipe for target 'install_flags/dep_conda_build' failed
make: *** [install_flags/dep_conda_build] Error 2
Can you list a working set of packages, or create a Docker image of this application?