chrisquince / DESMAN

De novo Extraction of Strains from MetAgeNomes

ERROR in estimateStrainCountDesman #39

Open osvatic opened 5 years ago

osvatic commented 5 years ago

I am receiving an error when trying to run the test data:

N E X T F L O W ~ version 18.10.1
Launching desmanflow2.nf [amazing_wescoff] - revision: ab90ebff83
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.8, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.8.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.8.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.3, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.3.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.3.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.10, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.10.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.10.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.5, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.5.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.5.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.12, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.12.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.12.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.16, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.16.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.16.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.13, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.13.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.13.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.4, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.4.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.4.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.15, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.15.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.15.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.2, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.2.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.2.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.1, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.1.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.1.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.9, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.9.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.9.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.7, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.7.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.7.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.6, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.6.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.6.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.14, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.14.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.14.r2.fq.gz]]
[seed#1234-+-alpha#0.01-+-xfold#10.wgs.11, [/scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.11.r1.fq.gz, /scratch/osvatic/desman_test/testdata/fastq/seed#1234-+-alpha#0.01-+-xfold#10.wgs.11.r2.fq.gz]]
[warm up] executor > local
[0f/055856] Submitted process > mapReads (8)
[d7/cb6fbc] Submitted process > mapReads (5)
[ed/1c56d9] Submitted process > mapReads (2)
[c2/d76747] Submitted process > mapReads (6)
[bb/afd761] Submitted process > mapReads (9)
[63/d98564] Submitted process > mapReads (11)
[d4/ffe544] Submitted process > mapReads (1)
[56/562c1b] Submitted process > mapReads (12)
[75/1fd0d7] Submitted process > mapReads (3)
[1f/c80103] Submitted process > mapReads (7)
[ab/e42867] Submitted process > mapReads (4)
[7c/15edca] Submitted process > mapReads (14)
[73/b99e00] Submitted process > findEliteGenes
[8b/6ede47] Submitted process > mapReads (10)
[d2/081f85] Submitted process > mapReads (13)
[cc/2df8c8] Submitted process > mapReads (16)
[ea/ea2419] Submitted process > mapReads (15)
[74/d74a4f] Submitted process > elitePileups (5)
[8d/8a2e4d] Submitted process > elitePileups (1)
[60/392d49] Submitted process > elitePileups (6)
[fa/fc7da3] Submitted process > elitePileups (7)
[5e/edb667] Submitted process > elitePileups (2)
[b7/960757] Submitted process > elitePileups (8)
[e7/7b0874] Submitted process > elitePileups (4)
[90/299a52] Submitted process > elitePileups (9)
[b1/f68870] Submitted process > elitePileups (11)
[e7/e82377] Submitted process > elitePileups (10)
[54/e30283] Submitted process > elitePileups (13)
[c6/f3d4a3] Submitted process > elitePileups (12)
[93/88104a] Submitted process > elitePileups (14)
[b2/67b85c] Submitted process > elitePileups (3)
[25/acac04] Submitted process > elitePileups (15)
[d0/62439e] Submitted process > elitePileups (16)
[ac/a4c895] Submitted process > callEliteVariants
[d7/64455e] Submitted process > estimateStrainCountDesman (5)
[c4/c95347] Submitted process > estimateStrainCountDesman (4)
[ec/706e46] Submitted process > estimateStrainCountDesman (1)
[7f/1070a2] Submitted process > estimateStrainCountDesman (2)
[77/0857c4] Submitted process > estimateStrainCountDesman (7)
[af/93ff7d] Submitted process > estimateStrainCountDesman (3)
[39/7927af] Submitted process > estimateStrainCountDesman (11)
[e0/1f7eea] Submitted process > estimateStrainCountDesman (12)
[e3/3234f1] Submitted process > estimateStrainCountDesman (6)
ERROR ~ Error executing process > 'estimateStrainCountDesman (11)'

Caused by: Process estimateStrainCountDesman (11) terminated with an error exit status (132)

Command executed:

/scratch/osvatic/desman_test/DESMAN/bin/desman outputsel_var.csv -e outputtran_df.csv -o cluster_1_2 -r 1000 -i 100 -g 1 -s 2 > cluster_1_2.out
cp */fit.txt fit_1_2.txt

Command exit status: 132

Command output: (empty)

Command error: .command.sh: line 2: 19559 Illegal instruction /scratch/osvatic/desman_test/DESMAN/bin/desman outputsel_var.csv -e outputtran_df.csv -o cluster_1_2 -r 1000 -i 100 -g 1 -s 2 > cluster_1_2.out

Work dir: /scratch/osvatic/desman_test/work/39/7927afa0fb457de42739c9f17df4c2

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check '.nextflow.log' file for details
WARN: Killing pending tasks (8)

I have already done the workaround from issue #28. Do you have any insights into the error above?

Jay

koadman commented 5 years ago

Hi Jay, it looks like the desman binary has been built for a newer CPU model than the one in the machine you're using. I can think of a couple of possible solutions. One would be to build the desman binary yourself on that machine, to ensure it gets built with a supported instruction set. Another would be to move to a machine with newer CPUs.
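For context on the exit status in the log: a shell reports 128 + N when a command is terminated by signal N, so 132 corresponds to signal 4, which is SIGILL on Linux. That is consistent with the "Illegal instruction" message. A minimal sketch for decoding such statuses follows; the helper name `describe_exit_status` is hypothetical and is not part of DESMAN or Nextflow.

```python
import signal


def describe_exit_status(status):
    """Return the signal name for a shell-style exit status above 128.

    Shells report 128 + N when a command is terminated by signal N.
    Returns None when the status does not map to a known signal.
    """
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # No signal with that number on this platform.
        return None


if __name__ == "__main__":
    # 132 is the exit status reported for estimateStrainCountDesman (11).
    print(describe_exit_status(132))
```

On Linux this should print SIGILL, which points at the binary using instructions the CPU does not support rather than at a problem with the input data.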

osvatic commented 5 years ago

I ran this command on a newer CPU. It made it to the "straintigs" process. How long should it take the straintigs step to work on the test data set? Mine has been running for 4 hours and has not finished.

Is there a way to multi-thread this aspect to speed it up?

osvatic commented 5 years ago

After ~5 hours of running, "straintigs" failed. Here is the error:

[43/056715] Submitted process > elitePileups (16)
[b7/0358c0] Submitted process > elitePileups (14)
[1c/5b9ba5] Submitted process > elitePileups (11)
[97/54e37f] Submitted process > callEliteVariants
[c3/421fb7] Submitted process > desman
[29/c27461] Submitted process > straintigs
ERROR ~ Error executing process > 'straintigs'

Caused by: Process straintigs terminated with an error exit status (1)

Command executed:

/scratch/osvatic/desman_test/DESMAN/scripts/Lengths.py -i species.fa > species_contigs.bed
perl -p -i -e "s/ / 1 /g" species_contigs.bed

samtools faidx assembly.fa
for file in *.bam
do
    bname=$(basename $file .bam)
    samtools index $file
    samtools mpileup -l species_contigs.bed -f assembly.fa $file > $bname.pileup
done

/scratch/osvatic/desman_test/DESMAN/scripts/pileups_to_freq_table.py assembly.fa *.pileup contigfreqs.csv
rm *.pileup
/scratch/osvatic/desman_test/DESMAN/desman/Variant_Filter.py contigfreqs.csv -m 0.0 -v 0.03
/scratch/osvatic/desman_test/DESMAN/scripts/CalcGeneCov.py contigfreqs.csv > contig_cov.csv

cut -f 1 elite.bed | sort | uniq > core_genes.txt
/scratch/osvatic/desman_test/DESMAN/scripts/CalcDelta.py contig_cov.csv core_genes.txt cluster_core
/scratch/osvatic/desman_test/DESMAN/bin/desman outputsel_var.csv -e outputtran_df.csv -o straincluster -r 1000 -i 100 -g 4
/scratch/osvatic/desman_test/DESMAN/desman/GeneAssign.py cluster_coremean_sd_df.csv straincluster/Gamma_star.csv contig_cov.csv straincluster/Eta_star.csv -m 20 -v outputsel_var.csv -o straincluster --assign_tau > cluster.cout
/scratch/osvatic/desman_test/DESMAN/scripts/write_strain_fasta.py species.fa straincluster_tau_star.csv strainclusteretaD_df.csv straincluster

Command exit status: 1

Command output: (empty)

Command error:

/scratch/osvatic/desman_test/DESMAN/scripts/Lengths.py:17: DeprecationWarning: 'U' mode is deprecated
  handle = open(options.ifilename, "rU")
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
/scratch/osvatic/desman_test/DESMAN/desman/Variant_Filter.py:74: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  variants_matrix = variants.as_matrix()
/scratch/osvatic/desman_test/DESMAN/scripts/CalcGeneCov.py:64: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  gene_freq_matrix = gene_freqs.as_matrix()
Up and running. Check straincluster/log_file.txt for progress
/apps/python3/3.7.0/lib/python3.7/site-packages/desman-2.1.1-py3.7-linux-x86_64.egg/desman/Variant_Filter.py:74: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  variants_matrix = variants.as_matrix()
/scratch/osvatic/desman_test/DESMAN/bin/desman:118: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  variant_Filter.eta = eta_df.as_matrix()
Traceback (most recent call last):
  File "/scratch/osvatic/desman_test/DESMAN/desman/GeneAssign.py", line 11, in <module>
    import desman.Eta_Sampler as es
ImportError: No module named desman.Eta_Sampler

Work dir: /scratch/osvatic/desman_test/work/29/c2746108fc14251c94cff2a903ed35

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details

This was while running the command:

/scratch/osvatic/desman_test/desmanflow2.nf --speciescontigs=testdata/species_contigs.txt --assembly=testdata/testasm.fa --inputreads=testdata/fastq -resume --straincount=4

desmanflow2.nf is the fixed script using the workaround mentioned before. Any idea how to fix this?

koadman commented 5 years ago

Hope you can excuse the slight delay in my response; I was offline for a few days. It looks to me like there's a Python path problem. You could try setting PYTHONPATH=/scratch/osvatic/desman_test/DESMAN/, then going into the working directory /scratch/osvatic/desman_test/work/29/c2746108fc14251c94cff2a903ed35 and running the .command.sh script from there. Unfortunately, it will take another 5 hours or so before you know whether it's working. Alternatively, if you don't want to wait, you might be able to go into that directory and run the last two steps manually:

export PYTHONPATH=/scratch/osvatic/desman_test/DESMAN/
cd /scratch/osvatic/desman_test/work/29/c2746108fc14251c94cff2a903ed35
/scratch/osvatic/desman_test/DESMAN/desman/GeneAssign.py cluster_coremean_sd_df.csv straincluster/Gamma_star.csv contig_cov.csv straincluster/Eta_star.csv -m 20 -v outputsel_var.csv -o straincluster --assign_tau > cluster.cout
/scratch/osvatic/desman_test/DESMAN/scripts/write_strain_fasta.py species.fa straincluster_tau_star.csv strainclusteretaD_df.csv straincluster

I'm not totally sure why this happened, but I notice from the logs that you've got another copy of desman installed elsewhere on your system by a package manager (under /apps/python3/3.7.0/lib/python3.7/site-packages), so perhaps that's contributing to the path problem.
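One way to see which copy of a module an interpreter would pick up is to ask the import system directly. The following is a minimal sketch for Python 3 only (`importlib.util` does not exist in Python 2); the helper name `locate_module` is hypothetical and not part of DESMAN.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a dotted module name would be loaded from, or None.

    Note: for a dotted name, find_spec imports the parent package
    (here "desman") as a side effect in order to search inside it.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when the parent package cannot be imported or has no spec.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Show which interpreter is running and where it would find the module.
    print(sys.executable)
    print(locate_module("desman.Eta_Sampler"))
```

Running this with and without PYTHONPATH set shows whether the checkout under /scratch or the site-packages copy is resolved first; a result of None means that interpreter cannot see the module at all.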

osvatic commented 5 years ago

Thanks for the response. I think I was able to fix it in an alternative way. GeneAssign.py is the only script that doesn't call python3, so I changed it to make sure that it calls python3. This fixed the issue. Do you think that has caused any other issues? The test data worked perfectly after that.
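The unquoted wording of the error ("ImportError: No module named desman.Eta_Sampler") is the Python 2 form of the message, which fits a script being launched by a different interpreter than the others. A minimal sketch for spotting such scripts by their first line follows; the function names are hypothetical, and the check only looks for the text "python3" in the shebang.

```python
import os


def read_shebang(path):
    """Return the first line of a file if it is a shebang, else None."""
    with open(path, "rb") as handle:
        first = handle.readline()
    if not first.startswith(b"#!"):
        return None
    return first.decode("utf-8", errors="replace").strip()


def scripts_without_python3(directory):
    """List (name, shebang) for .py files whose shebang lacks 'python3'.

    Files with no shebang at all are reported with shebang None.
    """
    flagged = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not name.endswith(".py") or not os.path.isfile(path):
            continue
        shebang = read_shebang(path)
        if shebang is None or "python3" not in shebang:
            flagged.append((name, shebang))
    return flagged
```

Calling `scripts_without_python3` on the DESMAN/desman and DESMAN/scripts directories of a checkout would list any script that still relies on a bare `python` interpreter.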

chrisquince commented 5 years ago

Hi,

No, that should be all right. The latest version of GeneAssign.py is Python 3.

Best, Chris