arzwa / wgd

Python package and CLI for whole-genome duplication related analyses. This package is deprecated in favor of https://github.com/heche-psb/wgd.
http://wgd.readthedocs.io/en/latest/
GNU General Public License v3.0

result figures are empty #46

Open shiyi-pan opened 3 years ago

shiyi-pan commented 3 years ago

Hi, I ran wgd as follows; format.SoyC09.CDS.fasta and format.SoyC09.gff are my input files:

wgd mcl -n 8 --cds --mcl -s format.SoyC09.CDS.fasta -o SoyC09.CDS.out
wgd ksd --n_threads 8 --pairwise SoyC09.CDS.out/format.SoyC09.CDS.fasta.blast.tsv.mcl format.SoyC09.CDS.fasta
wgd syn  format.SoyC09.gff format.SoyC09.CDS.fasta
wgd kde wgd_ksd/format.SoyC09.CDS.fasta.ks.tsv
wgd mix wgd_ksd/format.SoyC09.CDS.fasta.ks.tsv

I can't find any errors in the log, but the output figures, such as ks.svg and dotplot.svg, are empty. Could you help me fix this problem? Thank you very much. Here is my log; I deleted some of the repeated INFO lines because it is too big:

2020-10-31 22:45:08: INFO   makeblastdb stdout: makeblastdb: 2.2.26+
Package: blast 2.2.26, build Feb  9 2012 16:01:46
2020-10-31 22:45:08: INFO   makeblastdb stderr: 
2020-10-31 22:45:08: INFO   blastp stdout: blastp: 2.2.26+
Package: blast 2.2.26, build Feb  9 2012 16:01:46
2020-10-31 22:45:08: INFO   blastp stderr: 
2020-10-31 22:45:09: INFO   mcl stdout: mcl 14-137
Copyright (c) 1999-2014, Stijn van Dongen. mcl comes with NO WARRANTY
to the extent permitted by law. You may redistribute copies of mcl under
the terms of the GNU General Public License.
2020-10-31 22:45:09: INFO   mcl stderr: 
2020-10-31 22:45:09: INFO   Output directory: /ds3512/home/panyp/NN1138-2/04.WGD_data/NN_data/SoyC09.CDS.out does not exist, will make it.
2020-10-31 22:45:09: INFO   CDS sequences provided, will first translate.
N/A% (0 of 55927) |                      | Elapsed Time: 0:00:00 ETA:  --:--:--
  0% (94 of 55927) |                     | Elapsed Time: 0:00:00 ETA:   0:00:59
 94% (52991 of 55927) |################# | Elapsed Time: 0:00:31 ETA:   0:00:01
 94% (53096 of 55927) |################# | Elapsed Time: 0:00:31 ETA:   0:00:01
 95% (53334 of 55927) |################# | Elapsed Time: 0:00:31 ETA:   0:00:01
 95% (53589 of 55927) |################# | Elapsed Time: 0:00:31 ETA:   0:00:01
 96% (53754 of 55927) |################# | Elapsed Time: 0:00:31 ETA:   0:00:01
 96% (53945 of 55927) |################# | Elapsed Time: 0:00:31 ETA:   0:00:00
 96% (54206 of 55927) |################# | Elapsed Time: 0:00:31 ETA:   0:00:00
 97% (54402 of 55927) |################# | Elapsed Time: 0:00:32 ETA:   0:00:00
 97% (54606 of 55927) |################# | Elapsed Time: 0:00:32 ETA:   0:00:00
 97% (54805 of 55927) |################# | Elapsed Time: 0:00:32 ETA:   0:00:00
 98% (55019 of 55927) |################# | Elapsed Time: 0:00:32 ETA:   0:00:00
 98% (55213 of 55927) |################# | Elapsed Time: 0:00:32 ETA:   0:00:00
 99% (55420 of 55927) |################# | Elapsed Time: 0:00:32 ETA:   0:00:00
 99% (55623 of 55927) |################# | Elapsed Time: 0:00:32 ETA:   0:00:00
100% (55927 of 55927) |##################| Elapsed Time: 0:00:32 Time:  0:00:32
2020-10-31 22:45:48: WARNING    There were 1 warnings during translation
2020-10-31 22:45:48: INFO   Writing blastdb sequences to db.fasta.
2020-10-31 22:45:48: INFO   Writing query sequences to query.fasta.
2020-10-31 22:45:49: INFO   Performing all-vs.-all Blastp (this might take a while)
2020-10-31 22:45:49: INFO   Making Blastdb

Building a new DB, current time: 10/31/2020 22:45:49
New DB name:   /ds3512/home/panyp/NN1138-2/04.WGD_data/NN_data/SoyC09.CDS.out/38fdb5b02beba6.db.fasta
New DB title:  /ds3512/home/panyp/NN1138-2/04.WGD_data/NN_data/SoyC09.CDS.out/38fdb5b02beba6.db.fasta
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 55926 sequences in 2.31642 seconds.
2020-10-31 22:45:52: INFO   Running Blastp
2020-10-31 22:45:52: INFO   blastp -db /ds3512/home/panyp/NN1138-2/04.WGD_data/NN_data/SoyC09.CDS.out/38fdb5b02beba6.db.fasta -query /ds3512/home/panyp/NN1138-2/04.WGD_data/NN_data/SoyC09.CDS.out/38fdb5b09ed912.query.fasta -evalue 1e-10 -outfmt 6 -num_threads 8 -out /ds3512/home/panyp/NN1138-2/04.WGD_data/NN_data/SoyC09.CDS.out/format.SoyC09.CDS.fasta.blast.tsv
2020-11-01 03:00:51: INFO   All versus all Blastp done
2020-11-01 03:00:51: INFO   Blast done
2020-11-01 03:00:52: INFO   Performing MCL clustering (inflation factor = 2.0)
2020-11-01 03:01:05: INFO   Started MCL clustering (mcl)
2020-11-01 03:01:50: INFO   Done
2020-11-01 03:02:01: INFO   codeml stdout: AAML in paml version 4.9j, February 2020
2020-11-01 03:02:01: INFO   codeml stderr: Error: file name empty..
2020-11-01 03:02:01: INFO   codeml found
2020-11-01 03:02:01: INFO   mafft stdout: 
2020-11-01 03:02:01: INFO   mafft stderr: v7.158b (2014/06/27)
2020-11-01 03:02:02: INFO   FastTree stdout: 
2020-11-01 03:02:02: INFO   FastTree stderr: Unknown or incorrect use of option --version
  FastTree protein_alignment > tree
  FastTree < protein_alignment > tree
  FastTree -out tree protein_alignment
  FastTree -nt nucleotide_alignment > tree
  FastTree -nt -gtr < nucleotide_alignment > tree
  FastTree < nucleotide_alignment > tree
FastTree accepts alignments in fasta or phylip interleaved formats

Common options (must be before the alignment file):
  -quiet to suppress reporting information
  -nopr to suppress progress indicator
  -log logfile -- save intermediate trees, settings, and model details
  -fastest -- speed up the neighbor joining phase & reduce memory usage
        (recommended for >50,000 sequences)
  -n <number> to analyze multiple alignments (phylip format only)
        (use for global bootstrap, with seqboot and CompareToBootstrap.pl)
  -nosupport to not compute support values
  -intree newick_file to set the starting tree(s)
  -intree1 newick_file to use this starting tree for all the alignments
        (for faster global bootstrap on huge alignments)
  -pseudo to use pseudocounts (recommended for highly gapped sequences)
  -gtr -- generalized time-reversible model (nucleotide alignments only)
  -lg -- Le-Gascuel 2008 model (amino acid alignments only)
  -wag -- Whelan-And-Goldman 2001 model (amino acid alignments only)
  -quote -- allow spaces and other restricted characters (but not ' ) in
           sequence names and quote names in the output tree (fasta input only;
           FastTree will not be able to read these trees back in)
  -noml to turn off maximum-likelihood
  -nome to turn off minimum-evolution NNIs and SPRs
        (recommended if running additional ML NNIs with -intree)
  -nome -mllen with -intree to optimize branch lengths for a fixed topology
  -cat # to specify the number of rate categories of sites (default 20)
      or -nocat to use constant rates
  -gamma -- after optimizing the tree under the CAT approximation,
      rescale the lengths to optimize the Gamma20 likelihood
  -constraints constraintAlignment to constrain the topology search
       constraintAlignment should have 1s or 0s to indicates splits
  -expert -- see more options
For more information, see http://www.microbesonline.org/fasttree/
2020-11-01 03:02:02: WARNING    Output directory exists, will possibly overwrite
2020-11-01 03:02:02: INFO   Translating CDS file
N/A% (0 of 55927) |                      | Elapsed Time: 0:00:00 ETA:  --:--:--
  0% (166 of 55927) |                    | Elapsed Time: 0:00:00 ETA:   0:00:33
  0% (351 of 55927) |                    | Elapsed Time: 0:00:00 ETA:   0:00:31
  1% (580 of 55927) |                    | Elapsed Time: 0:00:00 ETA:   0:00:28
  1% (708 of 55927) |                    | Elapsed Time: 0:00:00 ETA:   0:00:28
  1% (970 of 55927) |                    | Elapsed Time: 0:00:00 ETA:   0:00:25
  2% (1235 of 55927) |                   | Elapsed Time: 0:00:00 ETA:   0:00:24
  2% (1416 of 55927) |                   | Elapsed Time: 0:00:00 ETA:   0:00:24
  2% (1668 of 55927) |                   | Elapsed Time: 0:00:00 ETA:   0:00:23
  3% (1853 of 55927) |                   | Elapsed Time: 0:00:00 ETA:   0:00:24
  3% (2014 of 55927) |                   | Elapsed Time: 0:00:00 ETA:   0:00:24
  3% (2124 of 55927) |                   | Elapsed Time: 0:00:00 ETA:   0:00:24
  4% (2322 of 55927) |                   | Elapsed Time: 0:00:01 ETA:   0:00:25
  4% (2485 of 55927) |                   | Elapsed Time: 0:00:01 ETA:   0:00:25
  4% (2648 of 55927) |                   | Elapsed Time: 0:00:01 ETA:   0:00:25
  5% (2832 of 55927) |                   | Elapsed Time: 0:00:01 ETA:   0:00:25
  5% (3023 of 55927) |#                  | Elapsed Time: 0:00:01 ETA:   0:00:25
  5% (3207 of 55927) |#                  | Elapsed Time: 0:00:01 ETA:   0:00:26
  6% (3410 of 55927) |#                  | Elapsed Time: 0:00:01 ETA:   0:00:25
  6% (3540 of 55927) |#                  | Elapsed Time: 0:00:01 ETA:   0:00:25
 95% (53553 of 55927) |################# | Elapsed Time: 0:00:27 ETA:   0:00:01
 96% (53725 of 55927) |################# | Elapsed Time: 0:00:27 ETA:   0:00:01
 96% (53926 of 55927) |################# | Elapsed Time: 0:00:27 ETA:   0:00:01
 96% (54176 of 55927) |################# | Elapsed Time: 0:00:27 ETA:   0:00:00
 97% (54356 of 55927) |################# | Elapsed Time: 0:00:27 ETA:   0:00:00
 97% (54512 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
 97% (54673 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
 98% (54853 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
 98% (55032 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
 98% (55204 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
 99% (55395 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
 99% (55561 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
 99% (55814 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
100% (55927 of 55927) |##################| Elapsed Time: 0:00:28 Time:  0:00:28
2020-11-01 03:02:31: WARNING    There were 1 warnings during translation
2020-11-01 03:02:31: INFO   Started whole paranome Ks analysis
2020-11-01 03:02:31: WARNING    Filtered out the 1 largest gene families because n*(n-1)/2 > `max_pairwise`
2020-11-01 03:02:31: WARNING    If you want to analyse these large families anyhow, please raise the `max_pairwise` parameter. 
2020-11-01 03:02:31: INFO   Started analysis in parallel (n_threads = 8)
2020-11-01 03:02:32: INFO   Performing analysis on gene family GF_000002
2020-11-01 03:02:33: INFO   Performing analysis on gene family GF_000003
2020-11-01 03:02:33: INFO   Performing analysis on gene family GF_000004
2020-11-01 03:02:34: INFO   Performing analysis on gene family GF_000005
2020-11-01 03:02:34: INFO   Performing analysis on gene family GF_000006
2020-11-01 03:02:34: INFO   Performing analysis on gene family GF_000007
2020-11-01 03:02:35: INFO   Performing analysis on gene family GF_000008
2020-11-01 03:02:35: INFO   Performing analysis on gene family GF_000009
2020-11-01 03:45:53: INFO   Performing analysis on gene family GF_000010
2020-11-01 03:49:08: INFO   Performing analysis on gene family GF_000011
2020-11-01 03:58:49: INFO   Performing analysis on gene family GF_000012
2020-11-01 04:01:43: INFO   Performing analysis on gene family GF_000013
2020-11-01 04:09:27: INFO   Performing analysis on gene family GF_000014
2020-11-01 04:12:15: INFO   Performing analysis on gene family GF_000015
2020-11-01 06:35:45: INFO   Performing analysis on gene family GF_000306
2020-11-01 06:36:03: INFO   Performing analysis on gene family GF_000307
2020-11-01 06:36:09: INFO   Performing analysis on gene family GF_000308
2020-11-01 06:36:12: INFO   Performing analysis on gene family GF_000309
2020-11-01 06:36:13: INFO   Performing analysis on gene family GF_000310
2020-11-01 06:36:16: INFO   Performing analysis on gene family GF_000311
2020-11-01 06:36:20: INFO   Performing analysis on gene family GF_000312
2020-11-01 06:36:33: INFO   Performing analysis on gene family GF_000313
2020-11-01 06:36:37: INFO   Performing analysis on gene family GF_000314
2020-11-01 06:36:52: INFO   Performing analysis on gene family GF_000315
2020-11-01 08:03:23: INFO   Performing analysis on gene family GF_011430
2020-11-01 08:03:24: INFO   Performing analysis on gene family GF_011431
2020-11-01 08:03:24: INFO   Performing analysis on gene family GF_011432
2020-11-01 08:03:24: INFO   Performing analysis on gene family GF_011433
2020-11-01 08:03:24: INFO   Performing analysis on gene family GF_011434
2020-11-01 08:03:24: INFO   Performing analysis on gene family GF_011435
2020-11-01 08:03:24: INFO   Performing analysis on gene family GF_011436
2020-11-01 08:03:25: INFO   Performing analysis on gene family GF_011437
2020-11-01 08:03:25: INFO   Performing analysis on gene family GF_011438
2020-11-01 08:03:25: INFO   Performing analysis on gene family GF_011439
2020-11-01 08:03:25: INFO   Performing analysis on gene family GF_011440
2020-11-01 08:03:25: INFO   Performing analysis on gene family GF_011441
2020-11-01 08:03:25: INFO   Performing analysis on gene family GF_011442
2020-11-01 08:03:25: INFO   Performing analysis on gene family GF_011443
2020-11-01 08:03:26: INFO   Performing analysis on gene family GF_011444
2020-11-01 08:03:26: INFO   Performing analysis on gene family GF_011445
2020-11-01 08:03:26: INFO   Performing analysis on gene family GF_011446
2020-11-01 08:03:26: INFO   Performing analysis on gene family GF_011447
2020-11-01 08:03:26: INFO   Performing analysis on gene family GF_011448
2020-11-01 08:03:26: INFO   Performing analysis on gene family GF_011449
2020-11-01 08:03:26: INFO   Performing analysis on gene family GF_011450
2020-11-01 08:03:27: INFO   Performing analysis on gene family GF_011451
2020-11-01 08:03:27: INFO   Performing analysis on gene family GF_011452
2020-11-01 08:03:28: INFO   Analysis done
2020-11-01 08:03:28: INFO   Making results data frame
2020-11-01 08:13:15: INFO   Removing tmp directory
2020-11-01 08:13:34: INFO   Computing weights, outlier cut-off at Ks > 5
2020-11-01 08:13:34: INFO   Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2020-11-01 08:13:34: INFO   NumExpr defaulting to 8 threads.
2020-11-01 08:13:39: INFO   Generating plots
2020-11-01 08:13:39: INFO   Will plot **node-weighted** histograms
2020-11-01 08:13:41: INFO   Done
2020-11-01 08:13:45: INFO   i-adhore stdout: This is i-ADHoRe v3.0.
Copyright (c) 2002-2010, Flanders Interuniversity Institute for Biotechnology, VIB.
Algorithm designed by Klaas Vandepoele, Cedric Simillion, Jan Fostier, Dieter De Witte,
Koen Janssens, Sebastian Proost, Yvan Saeys and Yves Van de Peer.

Process 1/1 is alive on compute-0-1.local.
2020-11-01 08:13:45: INFO   i-adhore stderr: Error opening the settings file: -version
2020-11-01 08:13:45: WARNING    Output directory already exists, will possibly overwrite
2020-11-01 08:13:45: INFO   Parsing GFF file
2020-11-01 08:13:48: INFO   Writing gene lists
2020-11-01 08:13:49: INFO   Writing families file
2020-11-01 08:13:51: INFO   Writing configuration file
2020-11-01 08:13:51: INFO   Running I-ADHoRe 3.0
2020-11-01 08:13:55: WARNING    WARNING: Maximum allowed number of gaps in the alignment not specified.  Setting to cluster_gap.
WARNING: Tandem gap size not correct in settings file. Using default (gap_size / 2)

2020-11-01 08:13:55: INFO   
This is i-ADHoRe v3.0.
Copyright (c) 2002-2010, Flanders Interuniversity Institute for Biotechnology, VIB.
Algorithm designed by Klaas Vandepoele, Cedric Simillion, Jan Fostier, Dieter De Witte,
Koen Janssens, Sebastian Proost, Yvan Saeys and Yves Van de Peer.

Process 1/1 is alive on compute-0-1.local.

************* i-ADHoRe parameters *************
    Number of genelists = 54
    Blast table = ./wgd_syn/families.tsv
    Output path = ./wgd_syn/i-adhore-out/
    Gap size = 30
    Cluster gap size = 35
    Cloud gap size = 0
    Cloud cluster gap size = 0
    Max gaps in alignment = 35
    Tandem gap = 15
    Flush output = 1000
    Q-value = 0.75
    Anchorpoints = 3
    Probability cutoff = 0.01
    Cloud filtering method = Binomial
    Level 2 only = false
    Use family = true
    Write statistics = false
    Alignment method = GreedyGraphbased4
    Multiple hypothesis correction = FDR
    Number of threads = 1
    Compare aligners = false
    Collinear searches only
    Visualize GHM.png = false
    Visualize Alignment = true
    Verbose output = true
************ END i-AdDHoRe parameters *********

Creating dataset...         done. (time: 0.0401988s)
Mapping gene families...        done. (time: 0.424732s)
Remapping tandem duplicates...  done. (time: 0.050889s)
Writing genelists file...       done. (time: 0.0950758s)
Collinear Search
Level 2 multiplicon detection...    done. (time: 3.18828s)
Profile detection...
Flushing output files...Visualize AlignedProfiles
done.
Time for Higher Level Detection: 0.00403285s.

All Done!  Bye...

2020-11-01 08:13:55: INFO   Drawing co-linearity dotplot
2020-11-01 08:13:55: INFO   Done
/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.2-py3.6.egg/wgd/viz.py:223: UserWarning: Attempting to set identical left == right == 0 results in singular transformations; automatically expanding.
/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.2-py3.6.egg/wgd/viz.py:224: UserWarning: Attempting to set identical bottom == top == 0 results in singular transformations; automatically expanding.
2020-11-01 08:14:02: INFO   Preparing data frame
2020-11-01 08:14:03: INFO    .. max_iter = 1000
2020-11-01 08:14:03: INFO    .. n_init   = 1
2020-11-01 08:14:03: INFO   Method is GMM, interpret best model with caution!
2020-11-01 08:14:03: INFO   Fitting GMM with 1 components
2020-11-01 08:14:04: INFO   Component mean, variance, weight: 
2020-11-01 08:14:04: INFO   .. 0.283, 1.347, 1.000
2020-11-01 08:14:04: INFO   Fitting GMM with 2 components
2020-11-01 08:14:04: INFO   Component mean, variance, weight: 
2020-11-01 08:14:04: INFO   .. 0.915, 0.486, 0.390
2020-11-01 08:14:04: INFO   .. 0.134, 0.456, 0.610
2020-11-01 08:14:04: INFO   Fitting GMM with 3 components
2020-11-01 08:14:04: INFO   Component mean, variance, weight: 
2020-11-01 08:14:04: INFO   .. 0.656, 0.142, 0.231
2020-11-01 08:14:04: INFO   .. 0.136, 0.458, 0.631
2020-11-01 08:14:04: INFO   .. 1.948, 0.079, 0.138
2020-11-01 08:14:04: INFO   Fitting GMM with 4 components
2020-11-01 08:14:04: INFO   Component mean, variance, weight: 
2020-11-01 08:14:04: INFO   .. 0.130, 0.165, 0.467
2020-11-01 08:14:04: INFO   .. 1.874, 0.098, 0.147
2020-11-01 08:14:04: INFO   .. 0.065, 0.947, 0.079
2020-11-01 08:14:04: INFO   .. 0.550, 0.206, 0.306
2020-11-01 08:14:04: INFO   
2020-11-01 08:14:04: INFO   AIC assessment:
2020-11-01 08:14:04: INFO   min(AIC) = 97487.41 for model 4
2020-11-01 08:14:04: INFO   Relative probabilities compared to model 4:
2020-11-01 08:14:04: INFO      /                          \
2020-11-01 08:14:04: INFO      |      (min(AIC) - AICi)/2 |
2020-11-01 08:14:04: INFO      | p = e                    |
2020-11-01 08:14:04: INFO      \                          /
2020-11-01 08:14:04: INFO   .. model   1: p = 0.0000
2020-11-01 08:14:04: INFO   .. model   2: p = 0.0000
2020-11-01 08:14:04: INFO   .. model   3: p = 0.0000
2020-11-01 08:14:04: INFO   .. model   4: p = 1.0000
2020-11-01 08:14:04: INFO   
2020-11-01 08:14:04: INFO   
2020-11-01 08:14:04: INFO   Delta BIC assessment: 
2020-11-01 08:14:04: INFO   min(BIC) = 97580.00 for model 4
2020-11-01 08:14:04: INFO   .. model   1: delta(BIC) =  7250.57 (    >10: Very Strong)
2020-11-01 08:14:04: INFO   .. model   2: delta(BIC) =  4139.71 (    >10: Very Strong)
2020-11-01 08:14:04: INFO   .. model   3: delta(BIC) =  2174.50 (    >10: Very Strong)
2020-11-01 08:14:04: INFO   .. model   4: delta(BIC) =     0.00 (0 to  2:   Very weak)
2020-11-01 08:14:04: INFO   
2020-11-01 08:14:04: INFO   Plotting AIC & BIC
2020-11-01 08:14:04: INFO   Plotting mixtures
2020-11-01 08:14:07: INFO   Writing component-wise probabilities to file

arzwa commented 3 years ago

Hi, that is strange. So the .tsv files for the Ks distribution and the anchor pair Ks distribution are non-empty, but the figures are empty? Do you get a plot when using

wgd viz -ks wgd_ksd/format.SoyC09.CDS.fasta.ks.tsv

?
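As a quick way to confirm that the Ks table itself holds plottable values, something like the following sketch could be run before looking at the figures. It is not part of wgd: the function name `summarize_ks` is hypothetical, the `Ks` column name is taken from the file fragment in this thread, and the upper bound of 5 mirrors the "outlier cut-off at Ks > 5" line in the log, so treat both as assumptions.

```python
import csv
import math


def summarize_ks(path, column="Ks", max_ks=5.0):
    """Count all rows and the rows with a usable Ks value in a tab-separated file.

    Hypothetical helper, not part of wgd. A value counts as usable when it is
    a finite number with 0 < Ks <= max_ks.
    """
    total = usable = 0
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            total += 1
            try:
                ks = float(row[column])
            except (KeyError, TypeError, ValueError):
                continue  # column missing, short row, or non-numeric entry
            if math.isfinite(ks) and 0 < ks <= max_ks:
                usable += 1
    return total, usable
```

Calling `summarize_ks("wgd_ksd/format.SoyC09.CDS.fasta.ks.tsv")` returns a `(total, usable)` pair. If `usable` is zero while `total` is large, the problem is more likely in the table than in the plotting step.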

shiyi-pan commented 3 years ago

Thank you for your reply. The file format.SoyC09.CDS.fasta.ks.tsv is 40 MB, and part of it is as follows:

        AlignmentCoverage       AlignmentIdentity       AlignmentLength Distance        Family  Ka      Ks      Node    Omega   PairwiseAlignmentLength Paralog1        Paralog2        WeightOutliersExcluded  WeightOutliersIncluded
SoyC09_02G004800__SoyC09_10G003700      0.95349 0.94146 1290.0  0.08526 GF_006783       0.0419  0.1203  2.0     0.3483  1230.0  SoyC09_02G004800        SoyC09_10G003700        1.0     1.0
SoyC09_02G294600__SoyC09_08G319200      0.93605 0.81884 1032.0  0.28999 GF_002388       0.1261  0.5694  8.0     0.2214  966.0   SoyC09_02G294600        SoyC09_08G319200        0.16667 0.16667
SoyC09_02G294600__SoyC09_14G011500      0.97965 0.96835 1032.0  0.03387 GF_002388       0.0148  0.0906  6.0     0.1638  1011.0  SoyC09_02G294600        SoyC09_14G011500        1.0     1.0
SoyC09_02G294600__SoyC09_16G191200      0.84012 0.90542 1032.0  0.17804 GF_002388       0.0781  0.1935  7.0     0.4036  867.0   SoyC09_02G294600        SoyC09_16G191200        0.5     0.5
SoyC09_02G294600__SoyC09_18G072700      0.93314 0.82139 1032.0  0.28693 GF_002388       0.1259  0.5511  8.0     0.2285  963.0   SoyC09_02G294600        SoyC09_18G072700        0.16667 0.16667
SoyC09_08G319200__SoyC09_14G011500      0.94186 0.82099 1032.0  0.2691  GF_002388       0.1225  0.5687  8.0     0.2154  972.0   SoyC09_08G319200        SoyC09_14G011500        0.16667 0.16667
SoyC09_08G319200__SoyC09_16G191200      0.82267 0.8033  1032.0  0.34811 GF_002388       0.1412  0.6317  8.0     0.2235  849.0   SoyC09_08G319200        SoyC09_16G191200        0.16667 0.16667
SoyC09_08G319200__SoyC09_18G072700      0.95349 0.96138 1032.0  0.04714 GF_002388       0.0251  0.0868  5.0     0.2894  984.0   SoyC09_08G319200        SoyC09_18G072700        1.0     1.0
SoyC09_14G011500__SoyC09_16G191200      0.84302 0.91494 1032.0  0.15715 GF_002388       0.0685  0.1825  7.0     0.3755  870.0   SoyC09_14G011500        SoyC09_16G191200        0.5     0.5
SoyC09_14G011500__SoyC09_18G072700      0.93895 0.8225  1032.0  0.26604 GF_002388       0.1223  0.5586  8.0     0.2189  969.0   SoyC09_14G011500        SoyC09_18G072700        0.16667 0.16667
SoyC09_16G191200__SoyC09_18G072700      0.82267 0.80565 1032.0  0.34505 GF_002388       0.139   0.633   8.0     0.2195  849.0   SoyC09_16G191200        SoyC09_18G072700        0.16667 0.16667
SoyC09_06G212400__SoyC09_07G113300      0.33538 0.75535 1950.0  0.37721 GF_001512       0.1922  0.9434  12.0    0.2037  654.0   SoyC09_06G212400        SoyC09_07G113300        0.08333 0.08333
SoyC09_06G212400__SoyC09_07G155900      0.16    0.74038 1950.0  0.36079 GF_001512       0.2306  1.196   12.0    0.1928  312.0   SoyC09_06G212400        SoyC09_07G155900        0.08333 0.08333
SoyC09_06G212400__SoyC09_12G136800      0.26308 0.7115  1950.0  0.49301 GF_001512       0.2737  0.9403  12.0    0.291   513.0   SoyC09_06G212400        SoyC09_12G136800        0.08333 0.08333
SoyC09_06G212400__SoyC09_12G150100      0.70154 0.94737 1950.0  0.05783 GF_001512       0.0271  0.1412  10.0    0.1917  1368.0  SoyC09_06G212400        SoyC09_12G150100        1.0     1.0
SoyC09_06G212400__SoyC09_12G223100      0.70154 0.7902  1950.0  0.27494 GF_001512       0.1303  0.8582  12.0    0.1518  1368.0  SoyC09_06G212400        SoyC09_12G223100        0.08333 0.08333
SoyC09_06G212400__SoyC09_13G245700      0.70154 0.80263 1950.0  0.26643 GF_001512       0.1307  0.7062  11.0    0.185   1368.0  SoyC09_06G212400        SoyC09_13G245700        0.5     0.5
SoyC09_07G113300__SoyC09_07G155900      0.15846 0.94822 1950.0  0.09738 GF_001512       0.0446  0.079   7.0     0.5652  309.0   SoyC09_07G113300        SoyC09_07G155900        1.0     1.0
SoyC09_07G113300__SoyC09_12G136800      0.24462 0.87421 1950.0  0.23508 GF_001512       0.1435  0.1424  8.0     1.0076  477.0   SoyC09_07G113300        SoyC09_12G136800        0.5     0.5

The figure I got from your command contains four histograms, with Ks, logKs, logKa and logW on the X-axes. It shows only grey bars and nothing else, such as lines. I don't know whether this is normal.

arzwa commented 3 years ago

Strange, for me it works in a fresh environment with the latest version of wgd installed from scratch, also for the fragment you pasted above. It must have something to do with your installation. I recommend using virtualenv to install wgd in a separate environment. In brief, this is what I did to test it:

$ virtualenv venv -p python3
$ source venv/bin/activate
$ git clone https://github.com/arzwa/wgd.git
$ pip install ./wgd
$ wgd viz -ks shiyi-pan.tsv

If you encounter issues in the last step, maybe try instead

$ python3 ./wgd/wgd_cli.py viz -ks shiyi-pan.tsv

(where shiyi-pan.tsv is the fragment you pasted above [note that the file should be tab separated not whitespace separated]). This will create a fresh environment, install the latest wgd version and run wgd viz. If I do this, I get a file wgd_hist.svg with the histogram (note that for the fragment above this is only a couple of bars, but for the full file it should be a nice histogram).
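For anyone reproducing this from the pasted fragment, the tab-separation requirement can be met with a small conversion step. The sketch below is only an illustration: `whitespace_to_tsv` is a hypothetical helper, and it assumes that no field contains internal whitespace and that the first column (the pair identifier) has no header name, as in the fragment above.

```python
def whitespace_to_tsv(lines):
    """Convert whitespace-separated table lines to tab-separated lines.

    Hypothetical helper, not part of wgd. Blank lines are dropped. If the
    header has exactly one field fewer than the first data row (an unnamed
    index column), an empty leading field is added so the columns line up.
    """
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) > 1 and len(rows[0]) == len(rows[1]) - 1:
        rows[0] = [""] + rows[0]
    return ["\t".join(fields) for fields in rows]
```

Reading the pasted text with `open(...).read().splitlines()`, passing it through this function, and writing the joined result to `shiyi-pan.tsv` should give a file that a tab-delimited reader parses with aligned columns.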

shiyi-pan commented 3 years ago

Thank you for your help, and sorry for the late reply. I will try your suggestion. Is the FastTree output in my log normal?

2020-11-01 03:02:02: INFO   FastTree stdout: 
2020-11-01 03:02:02: INFO   FastTree stderr: Unknown or incorrect use of option --version
  FastTree protein_alignment > tree
  FastTree < protein_alignment > tree
  FastTree -out tree protein_alignment
  FastTree -nt nucleotide_alignment > tree
  FastTree -nt -gtr < nucleotide_alignment > tree
  FastTree < nucleotide_alignment > tree
FastTree accepts alignments in fasta or phylip interleaved formats

Common options (must be before the alignment file):
  -quiet to suppress reporting information
  -nopr to suppress progress indicator
  -log logfile -- save intermediate trees, settings, and model details
  -fastest -- speed up the neighbor joining phase & reduce memory usage
        (recommended for >50,000 sequences)
  -n <number> to analyze multiple alignments (phylip format only)
        (use for global bootstrap, with seqboot and CompareToBootstrap.pl)
  -nosupport to not compute support values
  -intree newick_file to set the starting tree(s)
  -intree1 newick_file to use this starting tree for all the alignments
        (for faster global bootstrap on huge alignments)
  -pseudo to use pseudocounts (recommended for highly gapped sequences)
  -gtr -- generalized time-reversible model (nucleotide alignments only)
  -lg -- Le-Gascuel 2008 model (amino acid alignments only)
  -wag -- Whelan-And-Goldman 2001 model (amino acid alignments only)
  -quote -- allow spaces and other restricted characters (but not ' ) in
           sequence names and quote names in the output tree (fasta input only;
           FastTree will not be able to read these trees back in)
  -noml to turn off maximum-likelihood
  -nome to turn off minimum-evolution NNIs and SPRs
        (recommended if running additional ML NNIs with -intree)
  -nome -mllen with -intree to optimize branch lengths for a fixed topology
  -cat # to specify the number of rate categories of sites (default 20)
      or -nocat to use constant rates
  -gamma -- after optimizing the tree under the CAT approximation,
      rescale the lengths to optimize the Gamma20 likelihood
  -constraints constraintAlignment to constrain the topology search
       constraintAlignment should have 1s or 0s to indicates splits
  -expert -- see more options
For more information, see http://www.microbesonline.org/fasttree/

arzwa commented 3 years ago

Yes, this is just the result of a check wgd performs to see whether it can run the FastTree executable. Maybe I should hide that, because it is indeed confusing.
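To illustrate why such a check can fill the log with usage text, here is a minimal sketch of an executable probe. It is not wgd's actual code: `probe_executable` and the `--version` default are assumptions made for illustration. A tool that does not recognise the flag, as FastTree does not in the log above, typically prints its help to stderr, yet the probe still shows that the executable can be run.

```python
import shutil
import subprocess


def probe_executable(name, flag="--version"):
    """Return (found, stdout, stderr) for a quick "can this be run?" check.

    Hypothetical helper, not wgd's implementation. The exit code is ignored on
    purpose: a tool that rejects the flag still proves it is installed.
    """
    if shutil.which(name) is None:
        return False, "", ""
    result = subprocess.run(
        [name, flag],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return True, result.stdout.strip(), result.stderr.strip()
```

Logging the returned stdout and stderr at INFO level would produce entries much like the `FastTree stderr:` lines quoted above, which is why they are harmless.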

shiyi-pan commented 3 years ago

Hi, here is my data. Could you take a look and help me find out what the problem is? Thank you very much.

gff.zip

cds.zip