cov-lineages / pangolin

Software package for assigning SARS-CoV-2 genome sequences to global lineages.
GNU General Public License v3.0

iq-tree error #39

Closed varunshamanna closed 4 years ago

varunshamanna commented 4 years ago

pangolin hCoV-19AustraliaNSW022020EPI_ISL_4089762020-01-22.fasta -o out -t 16
Found the snakefile
The query file is /media/crl-kims/Data_Vol_3/Varun/covid-19/ncbi_india/all/hCoV-19AustraliaNSW022020EPI_ISL_4089762020-01-22.fasta
Number of threads is 16
Job counts:
    count  jobs
    1      all
    1      assign_lineages
    1      decrypt_aln
    1      pass_query_hash
    4
Job counts:
    count  jobs
    1      decrypt_aln
    1
Job counts:
    count  jobs
    1      pass_query_hash
    1
2 hashed sequences written
Decrypted 261 sequences
Job counts:
    count  jobs
    1      assign_lineages
    1
Passing 1 into processing pipeline.
snakemake --nolock --snakefile /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_lineage.smk --configfile /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../config.yaml --config query_sequences=tax1tax outdir=out query_fasta=out/temp/query.fasta representative_aln=out/temp/anonymised.aln.fasta guide_tree=/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../data/anonymised.aln.fasta.treefile key=out/temp/query_key.csv --cores 16
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 16
Rules claiming more threads will be scaled down.
Job counts:
    count  jobs
    1      all
    1      assign_lineage
    1      expand_query_fasta
    1      gather_reports
    1      iqtree_with_guide_tree
    1      profile_align_query
    1      to_nexus
    7

[Tue Apr 28 11:19:53 2020]
rule expand_query_fasta:
    input: out/temp/query.fasta
    output: out/temp/expanded_query/tax1tax.fasta
    jobid: 6

Job counts:
    count  jobs
    1      expand_query_fasta
    1
[Tue Apr 28 11:19:54 2020]
Finished job 6.
1 of 7 steps (14%) done

[Tue Apr 28 11:19:54 2020]
rule profile_align_query:
    input: out/temp/anonymised.aln.fasta, out/temp/expanded_query/tax1tax.fasta
    output: out/temp/query_alignments/tax1tax.aln.fasta
    jobid: 5
    wildcards: query=tax1tax

tbitr = 0, tbrweight = 3, tbweight = 0
####### in galn
file1 = out/temp/anonymised.aln.fasta
file2 = out/temp/expanded_query/tax1tax.fasta
generating a scoring matrix for nucleotide (dist=200) ... done
Constructing dendrogram ... done.
262 GroupAglin.. group-to-group 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 / 262 17052203.752760

mafft-profile (nuc) Version 7.464 alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0 1 thread(s)

Removing temporary output file out/temp/expanded_query/tax1tax.fasta.
[Tue Apr 28 11:19:55 2020]
Finished job 5.
2 of 7 steps (29%) done

[Tue Apr 28 11:19:55 2020]
rule iqtree_with_guide_tree:
    input: out/temp/query_alignments/tax1tax.aln.fasta, /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../data/anonymised.aln.fasta.treefile
    output: out/temp/query_alignments/tax1tax.aln.fasta.treefile, out/temp/query_alignments/tax1tax.aln.fasta.parstree, out/temp/query_alignments/tax1tax.aln.fasta.splits.nex, out/temp/query_alignments/tax1tax.aln.fasta.contree, out/temp/query_alignments/tax1tax.aln.fasta.log, out/temp/query_alignments/tax1tax.aln.fasta.ckp.gz, out/temp/query_alignments/tax1tax.aln.fasta.iqtree
    jobid: 4
    wildcards: query=tax1tax

Job counts:
    count  jobs
    1      iqtree_with_guide_tree
    1
For AU test please specify number of bootstrap replicates via -zb option
[Tue Apr 28 11:19:56 2020]
Error in rule iqtree_with_guide_tree:
    jobid: 0
    output: out/temp/query_alignments/tax1tax.aln.fasta.treefile, out/temp/query_alignments/tax1tax.aln.fasta.parstree, out/temp/query_alignments/tax1tax.aln.fasta.splits.nex, out/temp/query_alignments/tax1tax.aln.fasta.contree, out/temp/query_alignments/tax1tax.aln.fasta.log, out/temp/query_alignments/tax1tax.aln.fasta.ckp.gz, out/temp/query_alignments/tax1tax.aln.fasta.iqtree

RuleException:
CalledProcessError in line 50 of /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_lineage.smk:
Command 'set -euo pipefail; iqtree -s out/temp/query_alignments/tax1tax.aln.fasta -bb 1000 -au -alrt 1000 -m HKY -g /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../data/anonymised.aln.fasta.treefile -quiet -o 'outgroup_A'' returned non-zero exit status 2.
  File "/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_lineage.smk", line 50, in __rule_iqtree_with_guide_tree
  File "/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/crl-kims/Data_Vol_3/Varun/covid-19/ncbi_india/all/.snakemake/log/2020-04-28T111953.703347.snakemake.log
[Tue Apr 28 11:19:56 2020]
Error in rule assign_lineages:
    jobid: 0
    output: out/lineage_report.csv

RuleException:
CalledProcessError in line 68 of /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_file.smk:
Command 'set -euo pipefail; snakemake --nolock --snakefile /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_lineage.smk --configfile /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../config.yaml --config query_sequences=tax1tax outdir=out query_fasta=out/temp/query.fasta representative_aln=out/temp/anonymised.aln.fasta guide_tree=/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../data/anonymised.aln.fasta.treefile key=out/temp/query_key.csv --cores 16' returned non-zero exit status 1.
  File "/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_file.smk", line 68, in __rule_assign_lineages
  File "/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message

Please help me with the error

jillianhammond commented 4 years ago

I am running into the same issue :)

aineniamh commented 4 years ago

Okay, that's a new one! Could you try checking iqtree -v and maybe let me know what happens when you run

iqtree -s out/temp/query_alignments/tax1tax.aln.fasta -bb 1000 \
-au -alrt 1000 -m HKY \
-g /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../data/anonymised.aln.fasta.treefile \
-o 'outgroup_A'

I've not been able to replicate the error on my side so a bit more info will help figure this out. Is there anything unusual about the query fasta that has been input? Any chance it contains a header with an empty sequence line?

I'll work today on adding in some more informative error messages.
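The empty-sequence case raised above can be checked before running pangolin by scanning the query FASTA for headers that have no sequence lines after them. A minimal sketch in Python (the language pangolin itself is written in); find_empty_records is a hypothetical helper, not part of pangolin:

```python
from typing import Iterable, List, Optional


def find_empty_records(lines: Iterable[str]) -> List[str]:
    """Return the headers (without '>') of FASTA records that contain no sequence.

    Hypothetical helper for illustration; it is not part of pangolin.
    """
    empty: List[str] = []
    header: Optional[str] = None
    seq_len = 0
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # A new header closes the previous record, so check that record first.
            if header is not None and seq_len == 0:
                empty.append(header)
            header = line[1:]
            seq_len = 0
        elif line:
            # Count sequence characters; blank lines contribute nothing.
            seq_len += len(line)
    # The last record has no following header, so check it after the loop.
    if header is not None and seq_len == 0:
        empty.append(header)
    return empty
```

For example, `with open("query.fasta") as handle: print(find_empty_records(handle))`, where query.fasta stands in for your own input file. Any header it reports is a candidate to fix or drop before rerunning.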

varunshamanna commented 4 years ago

Thank you for the reply,

I have iqtree 2.0.3,

and I ran the command you sent:

iqtree -s out/temp/query_alignments/tax1tax.aln.fasta -bb 1000 -au -alrt 1000 -m HKY -g /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../data/anonymised.aln.fasta.treefile -o 'outgroup_A'

I got this output:

'For AU test please specify the number of bootstrap replicates via -zb option'

So I changed -bb to -zb, and then I got the tree output.

I also tried a different file, but no luck.

Thanking You,

Regards, Varun Shamanna
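For reference, the failing command from the iqtree_with_guide_tree rule can be rebuilt as an argument list, with an option for the -bb to -zb swap Varun describes. A minimal sketch; build_iqtree_args and swap_bb_for_zb are hypothetical names, and the flags are copied from the command shown in the RuleException rather than from any pangolin API:

```python
from typing import List


def build_iqtree_args(alignment: str, guide_tree: str, swap_bb_for_zb: bool = False) -> List[str]:
    """Build the iqtree argument list seen in this thread (hypothetical helper)."""
    # "-bb 1000" is what the rule passes; Varun's workaround on IQ-TREE 2 replaced it
    # with "-zb 1000", the option IQ-TREE 2 asks for when "-au" is given.
    replicate_flag = "-zb" if swap_bb_for_zb else "-bb"
    return [
        "iqtree",
        "-s", alignment,
        replicate_flag, "1000",
        "-au",
        "-alrt", "1000",
        "-m", "HKY",
        "-g", guide_tree,
        "-quiet",
        "-o", "outgroup_A",  # no shell quoting needed inside an argument list
    ]
```

Passing the list to `subprocess.run(args, check=True)` runs it without a shell, so outgroup_A needs no extra quoting. Note that -zb sets the number of RELL replicates for tree-topology tests, not ultrafast bootstrap replicates, so the swapped command would not produce the UFbootstrap support pangolin reports; that is presumably why pinning IQ-TREE 1 was the preferred fix.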


aineniamh commented 4 years ago

Thanks Varun, this seems to be a difference between iqtree 1 and 2 that I didn't realise existed. For the moment, I'll change the environment file to explicitly require iqtree 1 (I'm running 1.6.12). If you've already got the environment set up, running conda uninstall iqtree and then conda install iqtree=1.6.12 should solve this problem.

In the longer term, I'll switch everything over to iqtree2 once I've looked into the differences (last I checked, iqtree2 wasn't on conda, so I've been working from iqtree1).
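If you want to confirm that the pinned release is the one actually being run, a small guard can read IQ-TREE's version banner. A minimal sketch, assuming the banner contains text of the form "version X.Y.Z" (for example "IQ-TREE multicore version 1.6.12 for Linux 64-bit ..."); parse_iqtree_version and is_iqtree1 are hypothetical helpers:

```python
import re
from typing import Optional, Tuple

# Assumption: the banner contains "version X.Y.Z", e.g.
# "IQ-TREE multicore version 1.6.12 for Linux 64-bit built ...".
_VERSION_RE = re.compile(r"version\s+(\d+)\.(\d+)\.(\d+)")


def parse_iqtree_version(banner: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from IQ-TREE banner text, or None if absent."""
    match = _VERSION_RE.search(banner)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_iqtree1(banner: str) -> bool:
    """True when the banner reports an IQ-TREE 1.x release (1.6.12 is the pinned one)."""
    version = parse_iqtree_version(banner)
    return version is not None and version[0] == 1
```

The banner text could be taken, for example, from the top of the .log file IQ-TREE writes alongside its other outputs (tax1tax.aln.fasta.log in the run above), assuming that file starts with the usual header.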

aineniamh commented 4 years ago

Let me know if this solves it and I will close the issue.

varunshamanna commented 4 years ago

Thank you for the update. I downgraded iqtree and it's working perfectly. Thank you for your time.

You can close the issue.

Thanking You,

Regards,

Varun Shamanna Bioinformatician, CRL-KIMS, varunshamanna4@gmail.com +91-8123341361


jillianhammond commented 4 years ago

Hey Aine, thanks for your help. The software runs for me; however, some of the sequences with assigned lineages have a bootstrap value of 0% despite being marked as success. I have copied and pasted some of the output below. Do you have any suggestions?

taxon     lineage  SH-alrt  UFbootstrap  status
nCoV-012  A.1      0        0            success
nCoV-015  A.1      0        0            success
nCoV-027  A.1      0        0            success
nCoV-036  A.1      100      78           success
nCoV-017  A.2      100      98           success
nCoV-018  A.2      100      98           success
nCoV-020  A.2      100      99           success

aineniamh commented 4 years ago

No problem, Jillian. I only added the status column yesterday, and it refers to whether a sequence passed the N-content and length requirements for an assignment attempt. I realise now that this might be slightly confusing, as it doesn't relate to the confidence of the assignment (which the SH-alrt and bootstrap values should indicate), only to whether or not the sequence passed QC. I'll revise this to return something clearer. Perhaps something like this?

Status: Passed qc / Fail
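The QC-only meaning of the status column can be illustrated with a small function that looks only at length and N content. This is a hypothetical sketch, not pangolin's code: MIN_LENGTH and MAX_N_FRACTION are placeholder thresholds, and the returned labels follow the "Passed qc" / "Fail" wording proposed above.

```python
# Hypothetical thresholds for illustration only; pangolin's actual values may differ.
MIN_LENGTH = 10000
MAX_N_FRACTION = 0.5


def qc_status(sequence: str, min_length: int = MIN_LENGTH,
              max_n_fraction: float = MAX_N_FRACTION) -> str:
    """Return 'Passed qc' or 'Fail' from sequence length and N content only.

    The result says nothing about assignment confidence, which the
    SH-alrt and UFbootstrap columns are meant to report.
    """
    seq = sequence.upper().replace("-", "")  # ignore alignment gaps when measuring length
    if not seq or len(seq) < min_length:
        return "Fail"
    n_fraction = seq.count("N") / len(seq)
    return "Passed qc" if n_fraction <= max_n_fraction else "Fail"
```

A sequence can therefore be "Passed qc" and still receive 0 support, because the two checks are independent.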

jillianhammond commented 4 years ago

Thanks again Aine! Yes, I think that would be clearer! :) I have another question: I'm finding that about 50-70% of my sequences return a bootstrap of 0%, and if I rerun pangolin on the same FASTA file, the set of sequences returning 0% changes (the lineages returned are the same each time, however). I'm even getting 0% on the reference genome. The sequences I'm using do not contain any Ns and are close to full length. Is this expected, or perhaps an error on my end? Here is some of the output from three runs of the same sequences. Thank you again for your help!

Run 1:
taxon       lineage  SH-alrt  UFbootstrap  status
nCoV-012_A  A.1      0        0            success
nCoV-015_A  A.1      0        0            success
nCoV-027_A  A.1      0        0            success
nCoV-036_A  A.1      0        0            success
nCoV-017_A  A.2      0        0            success
nCoV-018_A  A.2      0        0            success
nCoV-020_A  A.2      0        0            success
nCoV-022_A  A.2      100      99           success

Run 2:
taxon     lineage  SH-alrt  UFbootstrap  status
nCoV-012  A.1      0        0            success
nCoV-015  A.1      0        0            success
nCoV-027  A.1      0        0            success
nCoV-036  A.1      100      78           success
nCoV-017  A.2      100      98           success
nCoV-018  A.2      100      98           success
nCoV-020  A.2      100      99           success
nCoV-022  A.2      0        0            success

Run 3:
taxon     lineage  SH-alrt  UFbootstrap  status
nCoV-012  A.1      0        0            success
nCoV-015  A.1      100      93           success
nCoV-027  A.1      0        0            success
nCoV-036  A.1      0        0            success
nCoV-017  A.2      100      99           success
nCoV-018  A.2      100      98           success
nCoV-020  A.2      0        0            success
nCoV-022  A.2      100      98           success

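To see which taxa drop to 0/0 support in each run, and which of them change between runs, the report rows can be parsed and compared as sets. A minimal sketch, assuming the column order shown above (taxon, lineage, SH-alrt, UFbootstrap, status); zero_support_taxa and unstable_taxa are hypothetical helpers, and the delimiter would need to be "," for a CSV such as out/lineage_report.csv:

```python
from typing import Iterable, List, Optional, Set


def zero_support_taxa(rows: Iterable[str], delimiter: Optional[str] = None) -> Set[str]:
    """Return taxa whose SH-alrt and UFbootstrap are both 0.

    Assumes columns: taxon, lineage, SH-alrt, UFbootstrap, status.
    delimiter=None splits on whitespace (as in the pasted output); use "," for a CSV.
    """
    flagged: Set[str] = set()
    for row in rows:
        fields = [f.strip() for f in row.strip().split(delimiter)]
        # Skip blank lines, header lines and rows with too few columns.
        if len(fields) < 5 or fields[0] == "taxon":
            continue
        taxon, _lineage, alrt, ufboot = fields[:4]
        if float(alrt) == 0 and float(ufboot) == 0:
            flagged.add(taxon)
    return flagged


def unstable_taxa(runs: Iterable[Set[str]]) -> Set[str]:
    """Taxa flagged as 0/0 in some runs but not in all of them."""
    run_list: List[Set[str]] = list(runs)
    if not run_list:
        return set()
    return set.union(*run_list) - set.intersection(*run_list)
```

On runs 2 and 3 above, for example, nCoV-012 is 0/0 in both, while nCoV-022 and nCoV-036 each fall to 0/0 in only one run, which is the run-to-run instability described in the question.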
aineniamh commented 4 years ago

So this was a bug, thanks for catching it! When I updated the guide tree, I added the SH-alrt statistic to it too. iqtree keeps these legacy alrt and bootstrap values hanging around (apparently it's a "feature"), and I hadn't accounted for that. I've fixed this now, so it should be back to running as normal. Hope this helps, and thanks for that!
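One way to stop stale support values in a guide tree from carrying over is to strip the internal node labels, where IQ-TREE writes SH-aLRT/UFboot pairs such as 100/98, before the tree is reused. This is a hypothetical sketch, not the fix that was actually made to pangolin; strip_internal_labels is an illustrative name, and it assumes a plain Newick string without quoted labels or bracketed comments.

```python
import re

# Matches an internal-node label: the text between a closing parenthesis and the
# next branch length (":"), sibling (","), parent close (")") or terminator (";").
# Assumption: labels are unquoted and contain none of the characters ( ) , : ;
_INTERNAL_LABEL = re.compile(r"\)[^(),:;]+")


def strip_internal_labels(newick: str) -> str:
    """Remove internal-node labels (e.g. IQ-TREE '100/98' supports) from a Newick tree.

    Tip names and branch lengths are left untouched.
    """
    return _INTERNAL_LABEL.sub(")", newick)
```

Tip names follow "(" or "," rather than ")", so only internal labels match; a tree with quoted labels would need a proper Newick parser instead of a regular expression.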