Closed BiodivGenomic closed 5 years ago
Hi,
wgd ksd should take this issue into account. If you run wgd with the -v debug flag, you'll see the following when a family with only two members is encountered:
2019-02-04 17:46:55: DEBUG PhyML breaks with only two genes, do ALC instead.
2019-02-04 17:46:55: DEBUG Distance will be in Ks units!
2019-02-04 17:46:55: DEBUG Performing average linkage clustering on Ks values.
2019-02-04 17:46:55: DEBUG Clustering used for weighting:
[[0. 1. 0. 2.]]
2019-02-04 17:46:55: DEBUG Paralog1 Paralog2 Family ... AlignmentLength AlignmentLengthStripped AlignmentCoverage
AT5G13590.1__AT5G13590.2 AT5G13590.1 AT5G13590.2 GF_000003 ... 3504 3504 1.0
In other words, analyses should not stall when encountering only two family members.
Here is an example data set as an MWE. sample.mcl:
AT5G24210.1 AT5G24230.3 AT4G10955.1
AT1G01080.3 AT5G53680.1
AT5G13590.1 AT5G13590.2
AT1G02610.1
Make sure they are tab-separated (sed -i 's/ /\t/g' sample.mcl is your friend after copying).
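If sed is not at hand, the same clean-up can be sketched in Python. This is only an illustration: the helper names below are made up and are not part of wgd, and the sketch assumes one family per line with gene IDs that contain no whitespace.

```python
def normalize_mcl_line(line: str) -> str:
    """Join the whitespace-separated gene IDs of one family with tabs."""
    return "\t".join(line.split())


def normalize_mcl_text(text: str) -> str:
    """Normalize every non-empty line of pasted MCL output."""
    lines = [normalize_mcl_line(l) for l in text.splitlines() if l.strip()]
    return "\n".join(lines) + "\n"


# The four families from sample.mcl, as they look after copy-pasting
# (spaces instead of tabs).
pasted = (
    "AT5G24210.1 AT5G24230.3 AT4G10955.1\n"
    "AT1G01080.3 AT5G53680.1\n"
    "AT5G13590.1 AT5G13590.2\n"
    "AT1G02610.1\n"
)
fixed = normalize_mcl_text(pasted)
print(fixed, end="")
```

Unlike the sed one-liner, this collapses runs of several spaces into a single tab, which is usually what you want for pasted text.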
sample.fasta:
>AT1G01080.3
ATGGCGGCCTCCTGCTTCGCAATTCCCTTATCTTCTTCTTCTCGATCGTCTCACAATGCAATTCCCAAATACAAAACCCTAATCTCTTCTTCTTCTTACTCTTACTTAGAATCTCTGAAACTTCAATTCTCTTCTTCCAATTCTTTTCATCACTCTTCTCTTTCTCGTCCCTTTGTAGCTCAACCACTTCAAATCAAGGTCTCTTCTTCAGAATTATCAGTTCTCGATGAAGAAAAAGAAGAAGAAGTAGTTAAAGGAGAAGCAGAACCCAATAAAGATAGTGTCGTCTCCAAAGCAGAACCAGTAAAGAAACCGAGACCTTGCGAGCTCTACGTGTGTAATATCCCTAGAAGCTACGACATTGCTCAGCTTCTTGACATGTTTCAGCCTTTTGGAACTGTAATCTCTGTAGAGGTATCGCGAAATCCTCAGACGGGAGAGAGCCGTGGAAGCGGGTACGTGACAATGGGTTCTATAAACTCTGCCAAAATCGCCATTGCTTCTCTTGATGGAACAGAAGTAGGTGGTCGGGAAATGCGGGTTAGGTACTCTGTTGACATGAATCCAGGAACAAGAAGAAACCCTGAAGTCTTGAACTCAACTCCAAAGAAGATTCTGATGTACGAAAGCCAACACAAGGTCTATGTCGGAAATCTCCCTTGGTTCACACAGCCTGATGGTTTGAGAAACCACTTTAGCAAGTTTGGCACAATCGTAAGCACGAGAGTGTTACATGATCGTAAGACCGGGAGAAACAGAGTCTTTGCCTTTCTTTCTTTTACAAGCGGTGAAGAACGTGATGCGGCTTTATCATTCAATGGAACAGTTAAGTTCGTTATCCATAAAAAGAATCTTGCTTGA
>AT1G02610.1
ATGGGAGATGTAGTTTTGTTCATAGATGAAACATATTTGAAATCGAGTTTTAATCGCTGTAGAATCTGTCACGAAGAAGAAGCTGAGAGCTACTTTGAAGCTCCTTGTTCTTGTTCAGGAACCATCAAGTTCGCTCACAGAGATTGCATACAACGATGGTGTGATGAGAAAGGAAACACAATTTGTGAAATTTGTCTCCAGGAGTATAAACCTGGATACACCACAACTTCAAAACCATCTCGATTTATTGAAACAGCAGTCACAATCAGAGATAATTTACACATAATGAGAAGAGAAAATGGAAGAAGAAGAAGAAATAGAAGATTAGTGAATAGAGAAGAATCAGATTTTCAAGAATGCAACTCTGGTGTTGATAGAGGCGCCTCTTGTTGTAGATACTTGGCTCTCATTTTTTCGGTTATTTTGTTGATAAAGCATGCATTTGATGCGGTTTATGGAACTGAAGAGTATCCCTATACAATATTCACGGTCTTAACATTGAAGGCTATAGGCATACTATTACCAATGCTCGTTATCATTCGAACCATCACAGCTATTCAAAGGAGTCTCCGATATCAAATTCTCGAATCAGAAGAAGATACATTGAGCTCTGAAGAAGAAGATCATGGTTTGGAGGAGGAAGAGCAACAACAACATATAGCTTGA
>AT5G24210.1
ATGGGAAACTTAAAAAAATCTACACGCAGTGACGAGTTAAGCCGTTCTGGTCCCCCTCAAATTCCAAATCCTGACTGGAACAATTTGTATCACCGAACCACAGTGGCCTCATGTTTGGTGCAAGGAGTTTACGCAAAGGAAAGAGACAGGGAAAACAACCGAAATGGTTCCGAGTCATTAGCCACACCTTGGTGGAAGAGTTTCAACTTCACTTTAGATGAAAGTGAAATCCTATATGACGCATTTGACGGCTCCATATACGGTGCTGTCTTCCAAAACATGATCAATTATGAGAATACCCCGAACTCGATAGTAGTACCTCCGCGTTACGTGATTGCGTTACGGGGAACTGTCCCAAGTGATGTGAGTGATTGGATACATAACAGCCGTATTGTACTCGAGAAACTCCATGGCGGGGGTAAGCATATGCATGTCATTAGAAAAATCTATTCTTTGGTGGCCAAACACGGAAACACAGCTGTCTGGATCGCTGGACACTCTTTAGGAGCTGGCCTGGCACTACTCGCGGGAAAGGACATGGCCATGTCTGGACTCCCTGTTGAGGCTTACATCTTCAACCCACCTATCTCCTTGATTCCTCTAGAGCAGTGCGGTTACAATCACGAACTTAATTTTGTGTATCGACTCACCAGGGATCTCTTCAAAGCTGGCATAGCCAAAGTCGTAGACCTTGATGAGGGTCAAGAGGGTCCACGATATAAGAACTTAGCTTCTTGGAGACCTCATTTGTTTGTGAACCAATCTGATGTAATATGCTCAGAATATATTGGTTATTTCAATCACGTAGTCACTATGACGGAGGCGGGACTCGGTGAGATTTCGAGGTTGGCTAGTGGATACTCAGTTAGGCGTATGTTATTCGGAGACGGAGAAAATTGGTCCTCGTCTTCTACACCAGATCATCTTCATTTTCTTCCGTCGGCCTTTATGATTGTAAACAAGACTGAAGCGTCGGAGTTTTATAATAAACATGGGATTCATCAATGGTGGAATCATATGCTTAAACAATCTACAACGTTTAGTTCATACTAG
>AT5G53680.1
ATGTCTCACCACCACCAAAACTTCGATACAACATTCACAAAGATATACGTGGGGGGTTTGCCTTGGACAACAAGAAAGGAAGGCTTGATAAACTTCTTCAAACGTTTTGGTGAAATCATCCATGTGAACGTCGTTTGTGATAGAGAAACAGATCGATCACAAGGATATGGCTTCGTCACGTTTAAAGACGCTGAATCTGCAACAAGAGCTTGCAAGGATCCGAATCCGACTATTGAAGGACGAATAACTAATTGCAAACTCGCTTTCGTTGGTGCTAAAGTTAAACCTAACCAATCCCAACCTTCAAATTTGCCTCAATTATTACCTAGGTATGATCCGCAATATAATCCGCGGTATGATCCGATGTCTTACCAACAAAACCGTATGGCTAACAACACCAACAATGGTATTGGCAGCATCCAACTACAAACGTTGTTAACGGAGAATCGAGCAGCTCACAGGCTACGGGAACGCAGCCAGAGTTTCTTTCGACACCGGGATCTTCGCTAA
>AT4G10955.1
ATGATGATTAGTGAAAGAGATGATTTTAGTCTCACTGGACCATTACACTTAACATCTATAGATTGGGCTAATGAACATCATCGACGATCCGTAGCTGGATCTTTGGTTCAAGGAATCTATGTAGCTGAGCGTGACCGTCAGCTACAAAGAGAAGGTCCTGAGTTAGCTTTATCTCCAATATGGTCTGAGTTTTTCCATTTCCGCCTCATTCGTAAGTTTGTCGATGACGCGGATAACTCTATCTTCGGAGGAATCTATGAGTACAAACTGCCGCAACAGCTCTCTCAAACCGTCAAATCAATGGAATTTAGTCCACGTTTTGTGATTGCTTTCAGAGGAACGGTTACAAAAGTGGACTCCATTTCCCGTGACATCGAGCATGACATCCATGTTATTAGAAACGGGCTTCACACGACAACACGGTTTGAGATAGCTATCCAAGCAGTGAGAAACATTGTTGCTTCGGTTGGTGGTTCTAGTGTTTGGCTTGCTGGTCATTCTCTTGGTGCATCTATGGCATTACTTACCGGGAAAACCATCGCTAGAACCGGGTTTTTTCCTGAGTGTTTCGCATTCAATCCGCCGTTTTTGTCTGCCCCTATCGAAAAAATTAAGGATAAGAGGATTAAACATGGGATACGCATTGCAGGCAGTGTGATCACAGCTGGACTTGCTCTAGCCAAAAAAGCCACCCAACACTACAGCCAAAACGACCGTGCATTACCCGCACCTCCTGATCCATTTGAAGCTTTATCCGACTGGTTCCCGCGGCTGTATGTCAACCCTGGTGACCACTTATGCTCAGAGTATGTTGGTTACTTTGAGCACCGAAACAAGATGGAAGAAATCGGGATTGGGTTTGTAGAGCGGGTAGCGACGCAGCACTCGTTGGGCGGTATGCTGTTAGGAGGACAAGAGCCGGTACATCTGATTCCATCTTCGGTTTTGACGGTGAACTTAAGCTCCTCGAGAGATTTCAAACAAGCTCATGGGATTCATCAGTGGTGGAGGGAAGATAACAAGTTTGAGACTAAAGTTTACCAGTACAAATGA
>AT5G13590.1
ATGTCTGGAAGCCAAGAGCCTAGGATCAGACCATCTACATGGAGCTGCAGTGATATTCCAATCAAGAAGAGGAAGTACCTTGTTCAGCCGCAAATGGAAGAAGCTGTCTCCACTCAGATTCCACAACCTAATGAGCAAGGTGATACTAGGAGTGCTCATGCTGACGAAACTCAGAAAATGACTGGTCGAGAACCAACCTCTTCATTACCATCTGTTCCTGTGGGAATTTCTGGTAAAGGGAAGAGCATTGGGAACATAGTTTTTGACCAAACTAGAGTGAAATTTGAGAAGCCAAGTTCTCCAATTCACTCCAGTCCATTGGCAGGCTTTGACATCCCTTCTAGTTCTAACGTACTTGGCAGTTCAATCCATTTTCCTATGGGAAAGCTTCCTGTTGGTGCTGAACATGCTGGTCTTGTTGTCCCCTCAAATCAAACTCGGATGAAAGTAGAAAAAACTGTTCTTAAGACTCATGATATAGTCCGGAAGACAGGTGACAAGGAAACTCTCAGAGGAGAGTGTCAAACAGAAGCATCTTCTGGTGCTAAGACTGTTTCCTTACAGCTAAGTTGTAACACTAAAAACAATTCTCCATATTGGAAGAATGAAGAGCCTACAGAACTGAATTTGTCATTAAGCAAGGGAGTTTGTCCCGCTCATAACACAGATTCTACTTCTACCAAATCTGGCAACAGTGGCCTGAACAGAGAAAATTGGGATTTGAATACTACCATGGATGTTTGGGAAGATGCTCTAGATCGCACAAGTGGTGCATTCTTAAACAGTAACAGAAGTCTTCGTGACATAGAGAGATCAAGTTGTCGTGATACGACTGCTATTACAAAGTCTGTTTCTGAAAGACAGAAGGAAAGTGTAGGATTTAGTTCTCCTAAGGTGACGTTGATGCAGTTTGATAATCATGTTAATCCCACATGCTCACTTAGTCTAGGCCTCAGTTCATATCCTCCTATTGAGAAATCTCCTTCTCTACCAGCTACCACATCAGAGGCAAGAGCTGGGAATGTGTGTTCAGTGAACCTTAGGACTGTGAAGTCAGAAATCATTGAAGAGAGTGTTAGGCAGGCAACAGAGAGTACTCAAGTTTCTCCAATCGGGCTATCTATTAAAGGACTGAAACATGAGGGTATTGGCAGATTCAGCCAAGGAAATAGTCCCTCATTTGGCATTTTGAAGACAGTGGTTCCTATATCAATAAAGGCTGAGCCAAATACCTTCTCTCAATCAGAAGTTTTCAATAGGAAAGATGGAATGTTGAATCATCCCCATACCCCAATAATGCAATCAAATGAGATCCCTGATTTACCTACAAGTTCTACGCCATATCAGAAGGATAAATATTTACCTTGTTCAAATGGTATCAGCAATGCACCAATGCCCTTGAGTGGAATGACAATAATTCCAGGCGTTCAGAGTGATCCTGACTGTACATCAAAAGAAAATTCGGGCCAGAGTAGCAGTTTAGCTAATGGTAAATTACGCGAAGTGCTGAAACATGGTGGAGTTTACACGACTTATTCTGGTCATGGAGATCATAACCTCAATGCTTCAGGTGTGAATGTTACTTCCTTGACTGAAGAGAAAATACTAGATGATTGCAAGCCTTGTATATCGAAAGAACTCCCTTGTAATTCTCGTGGAACTGATGAACTTTCCAGAAATGATGAAGAGAAGATTACTTTACCTGGTAAGGAGCTAGAGGAACAGTTATACAGTTATGGGTTTGAATCAGATCGTGGTTATGATCTATCTAGAGTAATAAAGGAGCAAGTTGGCAAAAGAAATTTGTGCGATGACGGGAAGGTCCAAGGACCAGCTGCCGTTTTCACGGAAAGTAATGAGGTTGCACATCCTGAGTGTGGTGGTTCTGAAACTGAACAAAGGAACATTAATGTTCCATGCCATGTCCACTTTCATAATTCTAACCATGTGGAAGAAAAAGGGAGTCAACCTGCACTTCTTGGTTATACAGGTGAAACTGA
AGGCCGGATAGTTCAGGATGGTGAAGGAACGTCAGGTGTCTCTACAGTGTCAGGCGGCATTGAAAACCCTGAAATAGTAGATAACAGTAGTCCAGTTTCACTCAAGGCAGAAATGTCTACTATTGACAATGATTCTCCTATGGAGTGCAGTGACGGTAGTCAGAGTCGAATTATAAACTTAACTCAGGTTAAATCTCCAGTTAAGGCACTAGATGCTTCAGGCAGCTTTGTGCCACCCCGAATGGAAAGAGATAGATTTCATGATTTCCCACTCGAACCGCGGGAATATACTTTCAGAGGGAGTGATGAATCCTGCAAATTCTCGCGTGAGAGGTACCATGGCAGAATTATGAGAAGCCCAAGGTTAAATTTCATACCTGACAGAAGGAGATTACCTGATAACACAGAAAGCAATCTGCATGACCAGGACACAAAAAAATTTGAGTTTGATAATCATGGAAACACTCGTCGGGGTGGTGCTTTTATGAGTAATTTTCAGAGAGGGAGACGGCCTGCAAATGATGGAGTTACACCATATGCTCACTCCTTTCCGAGAAGATCCCCTAGCTTTTCATATAATAGAGGACCAACAAATAAAGAGGATACATCTGCATTTCACGGATTTAGAGATGGTGAAAAATTCACAAGGGGATTACAATGCAACAACACAGAACCACTGTTTATGAATCACCAACGTCCATATCGAGGTCGGAGTGGTTTTGCTCGAGGACGAACAAAGTTTGTAAACAACCCCAAACGAGATTTTCCTGGATTTCGTTCACGATCTCCAGTTAGATCAAGAGAAAGATCAGATGGTTCATCCTCGTCTTTCAGGAATAGATCACAGGAAGAGTTCAGTGGGCATACAGACTTTTCTCATCGAAGATCACCCTCAGGTTACAAAGTGGAGAGGATGAGCTCGCCTGACCATTCTGGTTATTCAAGAGAAATGGTTGTCAGAAGACACAATTCTCCACCTTTCTCGCATAGACCATCGAATGCTGGAAGGGGCCGGGGTTATGCAAGGGGCCGAGGTTATGTAAGGGGTCGAGGTTATGGAAGAGATGGCAACTCATTTAGGAAACCATCTGATCATGTTGTACATAGAAACCATGGAAACATGAATAACTTGGATCCTCGAGAAAGGGTTGACTATAGTGATGATTTCTTTGAAGGTCAAATTCATTCTGAACGATTTGGTGTTGATGTTAATGCTGAGAGAAGACGATTTGGTTATAGACATGATGGTACCAGCAGCTCTTTTAGACCATCTTTTAACAATGATGGTTGTGCACCTACTAATGTAGAGAATGACCCTGATGCTGTGAGGTTCCAACAAGACCCTCGTATTAAAATTGAAGAACAAGGGAGTTTAATGGAAATTGATGGAGAAAATAAGAACTCAACTGAGAATGCATCTGGAAGAACTAAGAATATGGAAGAGGAAGAAACTTCAAAGAACAGTAAAATTTGGCAACCGGATGAGCTCGGTGGTGATGGTTTTTAA
>AT5G13590.2
ATGTCTGGAAGCCAAGAGCCTAGGATCAGACCATCTACATGGAGCTGCAGTGATATTCCAATCAAGAAGAGGAAGTACCTTGTTCAGCCGCAAATGGAAGAAGCTGTCTCCACTCAGATTCCACAACCTAATGAGCAAGGTGATACTAGGAGTGCTCATGCTGACGAAACTCAGAAAATGACTGGTCGAGAACCAACCTCTTCATTACCATCTGTTCCTGTGGGAATTTCTGGTAAAGGGAAGAGCATTGGGAACATAGTTTTTGACCAAACTAGAGTGAAATTTGAGAAGCCAAGTTCTCCAATTCACTCCAGTCCATTGGCAGGCTTTGACATCCCTTCTAGTTCTAACGTACTTGGCAGTTCAATCCATTTTCCTATGGGAAAGCTTCCTGTTGGTGCTGAACATGCTGGTCTTGTTGTCCCCTCAAATCAAACTCGGATGAAAGTAGAAAAAACTGTTCTTAAGACTCATGATATAGTCCGGAAGACAGGTGACAAGGAAACTCTCAGAGGAGAGTGTCAAACAGAAGCATCTTCTGGTGCTAAGACTGTTTCCTTACAGCTAAGTTGTAACACTAAAAACAATTCTCCATATTGGAAGAATGAAGAGCCTACAGAACTGAATTTGTCATTAAGCAAGGGAGTTTGTCCCGCTCATAACACAGATTCTACTTCTACCAAATCTGGCAACAGTGGCCTGAACAGAGAAAATTGGGATTTGAATACTACCATGGATGTTTGGGAAGATGCTCTAGATCGCACAAGTGGTGCATTCTTAAACAGTAACAGAAGTCTTCGTGACATAGAGAGATCAAGTTGTCGTGATACGACTGCTATTACAAAGTCTGTTTCTGAAAGACAGAAGGAAAGTGTAGGATTTAGTTCTCCTAAGGTGACGTTGATGCAGTTTGATAATCATGTTAATCCCACATGCTCACTTAGTCTAGGCCTCAGTTCATATCCTCCTATTGAGAAATCTCCTTCTCTACCAGCTACCACATCAGAGGCAAGAGCTGGGAATGTGTGTTCAGTGAACCTTAGGACTGTGAAGTCAGAAATCATTGAAGAGAGTGTTAGGCAGGCAACAGAGAGTACTCAAGTTTCTCCAATCGGGCTATCTATTAAAGGACTGAAACATGAGGGTATTGGCAGATTCAGCCAAGGAAATAGTCCCTCATTTGGCATTTTGAAGACAGTGGTTCCTATATCAATAAAGGCTGAGCCAAATACCTTCTCTCAATCAGAAGTTTTCAATAGGAAAGATGGAATGTTGAATCATCCCCATACCCCAATAATGCAATCAAATGAGATCCCTGATTTACCTACAAGTTCTACGCCATATCAGAAGGATAAATATTTACCTTGTTCAAATGGTATCAGCAATGCACCAATGCCCTTGAGTGGAATGACAATAATTCCAGGCGTTCAGAGTGATCCTGACTGTACATCAAAAGAAAATTCGGGCCAGAGTAGCAGTTTAGCTAATGGTAAATTACGCGAAGTGCTGAAACATGGTGGAGTTTACACGACTTATTCTGGTCATGGAGATCATAACCTCAATGCTTCAGGTGTGAATGTTACTTCCTTGACTGAAGAGAAAATACTAGATGATTGCAAGCCTTGTATATCGAAAGAACTCCCTTGTAATTCTCGTGGAACTGATGAACTTTCCAGAAATGATGAAGAGAAGATTACTTTACCTGGTAAGGAGCTAGAGGAACAGTTATACAGTTATGGGTTTGAATCAGATCGTGGTTATGATCTATCTAGAGTAATAAAGGAGCAAGTTGGCAAAAGAAATTTGTGCGATGACGGGAAGGTCCAAGGACCAGCTGCCGTTTTCACGGAAAGTAATGAGGTTGCACATCCTGAGTGTGGTGGTTCTGAAACTGAACAAAGGAACATTAATGTTCCATGCCATGTCCACTTTCATAATTCTAACCATGTGGAAGAAAAAGGGAGTCAACCTGCACTTCTTGGTTATACAGGTGAAACTGA
AGGCCGGATAGTTCAGGATGGTGAAGGAACGTCAGGTGTCTCTACAGTGTCAGGCGGCATTGAAAACCCTGAAATAGTAGATAACAGTAGTCCAGTTTCACTCAAGGCAGAAATGTCTACTATTGACAATGATTCTCCTATGGAGTGCAGTGACGGTAGTCAGAGTCGAATTATAAACTTAACTCAGGTTAAATCTCCAGTTAAGGCACTAGATGCTTCAGGCAGCTTTGTGCCACCCCGAATGGAAAGAGATAGATTTCATGATTTCCCACTCGAACCGCGGGAATATACTTTCAGAGGGAGTGATGAATCCTGCAAATTCTCGCGTGAGAGGTACCATGGCAGAATTATGAGAAGCCCAAGGTTAAATTTCATACCTGACAGAAGGAGATTACCTGATAACACAGAAAGCAATCTGCATGACCAGGACACAAAAAAATTTGAGTTTGATAATCATGGAAACACTCGTCGGGGTGGTGCTTTTATGAGTAATTTTCAGAGAGGGAGACGGCCTGCAAATGATGGAGTTACACCATATGCTCACTCCTTTCCGAGAAGATCCCCTAGCTTTTCATATAATAGAGGACCAACAAATAAAGAGGATACATCTGCATTTCACGGATTTAGAGATGGTGAAAAATTCACAAGGGGATTACAATGCAACAACACAGAACCACTGTTTATGAATCACCAACGTCCATATCGAGGTCGGAGTGGTTTTGCTCGAGGACGAACAAAGTTTGTAAACAACCCCAAACGAGATTTTCCTGGATTTCGTTCACGATCTCCAGTTAGATCAAGAGAAAGATCAGATGGTTCATCCTCGTCTTTCAGGAATAGATCACAGGAAGAGTTCAGTGGGCATACAGACTTTTCTCATCGAAGATCACCCTCAGGTTACAAAGTGGAGAGGATGAGCTCGCCTGACCATTCTGGTTATTCAAGAGAAATGGTTGTCAGAAGACACAATTCTCCACCTTTCTCGCATAGACCATCGAATGCTGGAAGGGGCCGGGGTTATGCAAGGGGCCGAGGTTATGTAAGGGGTCGAGGTTATGGAAGAGATGGCAACTCATTTAGGAAACCATCTGATCATGTTGTACATAGAAACCATGGAAACATGAATAACTTGGATCCTCGAGAAAGGGTTGACTATAGTGATGATTTCTTTGAAGGTCAAATTCATTCTGAACGATTTGGTGTTGATGTTAATGCTGAGAGAAGACGATTTGGTTATAGACATGATGGTACCAGCAGCTCTTTTAGACCATCTTTTAACAATGATGGTTGTGCACCTACTAATGTAGAGAATGACCCTGATGCTGTGAGGTTCCAACAAGACCCTCGTATTAAAATTGAAGAACAAGGGAGTTTAATGGAAATTGATGGAGAAAATAAGAACTCAACTGAGAATGCATCTGGAAGAACTAAGAATATGGAAGAGGAAGAAACTTCAAAGAACAGTAAAATTTGGCAACCGGATGAGCTCGGTGGTGATGGTTTTTAA
>AT5G24230.3
ATGGAAGAAGAAGATGATGAGGTTATGGTCAGAGAGGGGCTAATGGCCTCTCAAAGAGAAATCTTCAGCATTTCTGGTCCAATCCATTTAACTTCCATTGATTGGAATAATTCTTATCATAGAACCTCGGTGGCATCATGTTTGGTACAAGCAGTGTACACATTGGAACGAGACAGACAACAAAACAGGATTGGCCTAAAGTCACAAGCCAATCATTGGTGGGAGTTTTTCAACTTCACTTTAGCCGAAACCCTAATCGACGACTCAGACGGATCTATATACGGCGCCGTTTTCGAATACAAACATTTCTTCTCCTACAATTACCATCACACCCCTCATTCGAAACCACCTCCTCGTCACGTGATTGCTTTCCGTGGCACGATCTTGAAACCGCACTCTCGGTCACGTGACCTTAAGCTCGACCTACGTTGCATCCGAGACTCTCTCCATGATAGCACTCGGTTCGTGCATGCCATTCAGGTTATTCAAAGTGCGGTGGCTAAAACTGGTAATGCAGCCGTGTGGCTCGCCGGACATTCTCTTGGAGCAGCCGTGGCTTTGCTTGCCGGGAAGATTATGACAAGGTCTGGTTTTCCTCTTGAGAGTTACTTATTCAATCCTCCTTTCTCGTCTATTCCGATAGAGAAGCTAGTGAAGAGTGAGAAGCTTAAACATGGGGTTCGATTCGCCGGAAGTCTTGTTAAAGCCGGAGTTGCCATCGCCGTTAAGGGTCGCCACCATAATAAGGGTCAAGAAGACGATTCGTTCATGAAGTTAGCATCATGGATACCATATTTGTATTTGAATCCGTTAGATACAATATGCTCAGAATACATTGGTTACTTCAAGCACAGAAACAAAATGTTTGAGATCGGAGCCGGTAAAATCGAAAGAATTGCTACGAGGAACTCACTTAGGAGTCTGTTGTCAGGAGGAGGAGGAGGAGGTTCATCTTCAGATTCTTCTTCAGAGCCTCTTCATCTTTTACCATCGGCATATATGACGATAAACGCTAGCAAATCGCCGAATTTTAAGAGAGCTCATGGGATTCATCAATGGTGGGATCCCATGTTTAATGGTGAATATGTTTTGCATCAGTTTAATAACTAA
command: wgd -v debug ksd sample.mcl sample.fasta --wm phyml -o sample.out
output:
AlignmentCoverage AlignmentIdentity AlignmentLength AlignmentLengthStripped Distance Family Ka Ks Node Omega Paralog1 Paralog2 WeightOutliersIncluded WeightOutliersExcluded
AT5G13590.1__AT5G13590.2 1.0 1.0 3504.0 3504.0 0.0 GF_000003 0.0 0.0 2.0 0.001 AT5G13590.1 AT5G13590.2 1.0 0.0
AT1G01080.3__AT5G53680.1 0.58537 0.38889 861.0 504.0 190.80626 GF_000002 0.9774 67.4602 2.0 0.0145 AT1G01080.3 AT5G53680.1 1.0 0.0
AT4G10955.1__AT5G24230.3 0.90237 0.577 1137.0 1026.0 1.10504 GF_000001 0.4267 2.1848 3.0 0.1953 AT5G24230.3 AT4G10955.1 1.0 1.0
AT5G24210.1__AT5G24230.3 0.91557 0.6196 1137.0 1041.0 1.31704 GF_000001 0.422 1.1156 4.0 0.3782 AT5G24210.1 AT5G24230.3 0.5 0.5
AT4G10955.1__AT5G24210.1 0.88127 0.48004 1137.0 1002.0 2.04701 GF_000001 0.6653 4.4183 4.0 0.1506 AT5G24210.1 AT4G10955.1 0.5 0.5
Can you verify if this works for you?
Note: the ALC (average linkage clustering) is of course not necessary, but it's a small hack to get the results into the same data structure with no overhead.
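To see why the clustering for a two-gene family reduces to the single row printed in the debug output ([[0. 1. 0. 2.]]), here is a minimal pure-Python sketch of average linkage clustering. It is not wgd's code: the function name is hypothetical, and the row layout (cluster index, cluster index, distance, size of the merged cluster) is assumed to follow the convention of SciPy-style linkage matrices.

```python
def average_linkage(dist):
    """Naive average linkage (UPGMA) on a square distance matrix.

    Returns rows [cluster_a, cluster_b, distance, merged_size]; original
    items are numbered 0..n-1 and each merge creates cluster n, n+1, ...
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    next_id = n
    rows = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Average of all pairwise distances between the two clusters.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        members = clusters.pop(a) + clusters.pop(b)
        rows.append([float(a), float(b), float(d), float(len(members))])
        clusters[next_id] = members
        next_id += 1
    return rows


# Two identical paralogs (Ks = 0), as for AT5G13590.1 / AT5G13590.2.
two_genes = average_linkage([[0.0, 0.0], [0.0, 0.0]])
print(two_genes)

# A three-gene family with made-up Ks values, for comparison.
three_genes = average_linkage([[0.0, 1.0, 4.0], [1.0, 0.0, 2.0], [4.0, 2.0, 0.0]])
print(three_genes)
```

With only two genes there is exactly one merge, so the "tree" is a single row joining gene 0 and gene 1 at their Ks distance.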
Hi, I got an output, but shorter than what you showed...
I am still exploring my issue, and it seems to be more related to families with 3 members where one is filtered out, leaving 2 sequences to be handled by PhyML (so the ALC fallback should then apply)...
For example, below is the part of the debug output concerning a family with 3 members: results_3genes_family.txt
As far as I remember, it was not this family that failed in the previous analysis (maybe the first step does not process families in exactly the same order each time?), so it seems difficult to simply remove the problematic families...
Whoops, you are correct, my output was from a different test, I edited it.
I see you are running the analysis with the --pairwise flag. That's not really necessary and I would suggest not using it (it's slower and does not matter for the results; it's there for compatibility with earlier analyses). I think the bug you found might be alleviated when using the family-wise approach (without specifying the --pairwise flag). Can you test whether it works without the --pairwise flag?
OK, I'm testing without the --pairwise flag.
Hi, dropping the --pairwise flag doesn't seem to solve the issue... I ran the ksd step on different genomes with different characteristics (number of genes, the way the genes are named, etc.), and all 3 runs are stalled, each at a family with 3 genes (using multiprocessing, so maybe the same cause as in the other issue applies...). Two are on my computer and the third is on a remote cluster, so I don't think it's computer-specific. All 3 are stalled at the PhyML step ("DEBUG Running PhyML: phyml -i ..."). The analysis gets faster as it reads through the mcl file (which makes sense, as the families get smaller and smaller), so I don't think it's "normal" for the PhyML step of a single family to run for more than 2 days... I stopped one and ran it again, and it finished (I have already restarted the others several times, without improvement).
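One generic way to tell a hung external step from a slow one is to run it under a timeout. The sketch below is not a wgd feature and does not reproduce the actual PhyML call; the helper name is hypothetical, and the two commands are placeholders that simply start a Python interpreter.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return (finished, returncode). finished is False on timeout."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return False, None
    return True, proc.returncode


# Placeholder commands standing in for a real tree-building call.
quick = [sys.executable, "-c", "pass"]
hung = [sys.executable, "-c", "import time; time.sleep(30)"]

print(run_with_timeout(quick, timeout_s=10))
print(run_with_timeout(hung, timeout_s=1))
```

A family whose step hits the timeout can then be logged and inspected separately instead of blocking the whole run.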
No, that definitely does not sound good, and I haven't had these problems myself before. I will need a small example data set to try to reproduce this issue and do the debugging. Could you provide me with a small set of families + sequences where you have this issue? Thanks
I checked the analysis that is running on my computer, and the last family fully processed (with fasta, fasta.msa and .ks files) is the last family with 3 members in the mcl file. Also, I'm now running them using fasttree instead of phyml, and the first one completed...
Hi, may I ask if you still have issues related to the above? If not I'd like to close this, if you do have issues I'd like to solve them.
closed due to no follow-up
Hi, I would like to know if the smallest multi-copy families can be an issue for wgd... I can imagine phyml might run into trouble building a tree with only 2 sequences, but does wgd deal with these families? I ask because the analyses seem to stall when using phyml for the second step (ksd), each time at a gene family containing 2 members... I would like to rule out this issue before troubleshooting further :-)
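To check how many families of each size an MCL file contains (and so which ones would go through the two-gene code path), something along these lines could help. It is a small stand-alone sketch, not part of wgd; the function name is made up, and the input is the sample.mcl content from this thread.

```python
from collections import defaultdict


def families_by_size(mcl_text):
    """Map family size -> list of families (each a list of gene IDs)."""
    groups = defaultdict(list)
    for line in mcl_text.splitlines():
        genes = line.split()
        if genes:  # skip empty lines
            groups[len(genes)].append(genes)
    return dict(groups)


sample_mcl = (
    "AT5G24210.1\tAT5G24230.3\tAT4G10955.1\n"
    "AT1G01080.3\tAT5G53680.1\n"
    "AT5G13590.1\tAT5G13590.2\n"
    "AT1G02610.1\n"
)
groups = families_by_size(sample_mcl)
for size in sorted(groups):
    print(size, len(groups[size]))
```

Here `groups[2]` holds the two-member families, which can be written to a separate file and run on their own to see whether they are really where the analysis stalls.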