arzwa / wgd

Python package and CLI for whole-genome duplication related analyses. This package is deprecated in favor of https://github.com/heche-psb/wgd.
http://wgd.readthedocs.io/en/latest/
GNU General Public License v3.0

WGD is not responding on second stage #15

Closed amit4mchiba closed 5 years ago

amit4mchiba commented 5 years ago

Hi,

I am writing here to seek your advice on using wgd for my study.

Based on the paper and manual, I first prepared my genome_cds.fasta and then ran this command:

wgd mcl --cds --mcl -s genome_cds.fasta -o ./ -n 8

This produced the file genome_cds.mcl as output. I used this output file to run the next step as follows:

wgd ksd -o ./ -n32 --pairwise --wm phyml ../genome_cds.mcl ../genome_cds.fa

This is part of the output from the run:

(py3) amit8chiba@amit8chiba-Precision-Tower-7910:/mnt/md0/genome_cds_final/Comparitive_genomics/wgd/run_results/genome_cds_analysis$ wgd ksd -o ./ -n32 --pairwise --wm phyml ../genome_cds.mcl ../genome_cds.fa
2019-03-02 02:37:05: INFO
2019-03-02 02:37:05: INFO       codeml found
2019-03-02 02:37:05: INFO       MUSCLE v3.8.1551 by Robert C. Edgar
2019-03-02 02:37:05: INFO       . Command line: phyml --version

. This is PhyML version 3.3.20180621.
2019-03-02 02:37:05: WARNING    Output directory exists, will possibly overwrite
2019-03-02 02:37:06: INFO       Translating CDS file
100% (32390 of 32390) |##################################################################################################################################| Elapsed Time: 0:00:07 Time:  0:00:07
2019-03-02 02:37:14: WARNING    There were 0 warnings during translation
2019-03-02 02:37:14: INFO       Started whole paranome Ks analysis
2019-03-02 02:37:31: WARNING    Filtered out the 9 largest gene families because n*(n-1)/2 > `max_pairwise`
2019-03-02 02:37:31: WARNING    If you want to analyse these large families anyhow, please raise the `max_pairwise` parameter.
2019-03-02 02:37:31: INFO       Started analysis in parallel (n_threads = 32)
2019-03-02 02:37:31: INFO       Performing analysis on gene family GF_000010
2019-03-02 02:37:31: INFO       Performing analysis on gene family GF_000011
2019-03-02 02:37:31: INFO       Performing analysis on gene family GF_000012
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000013
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000014
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000015
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000016
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000017
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000018
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000019
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000020
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000022
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000021
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000023
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000024
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000025
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000026
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000027
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000028
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000029
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000030
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000031
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000032
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000033
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000034
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000035
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000036
2019-03-02 02:37:35: INFO       Performing analysis on gene family GF_000038
2019-03-02 02:37:35: INFO       Performing analysis on gene family GF_000037
2019-03-02 02:37:35: INFO       Performing analysis on gene family GF_000039
2019-03-02 02:37:35: INFO       Performing analysis on gene family GF_000041
2019-03-02 02:37:35: INFO       Performing analysis on gene family GF_000042

This step produced several files in the temp directory, but after almost 12 hours the output file has still not been generated. The program seems to be stuck: no new files are being generated, yet I cannot see any error. Is this the expected running time? My genome is 400 Mb in size and has 35000 genes.

Please let me know if you need any further information in order to help me out here.

Thank you so much in advance,

With best regards,
Amit

amit4mchiba commented 5 years ago

By the way, these are all the files produced while running the second step. The listing has stayed the same for the last 10 hours, with no new files, no summary, or anything else. I am wondering whether I need to use the .Ks files and create ks.tsv for the next stage. Please advise.

amit8chiba@amit8chiba-Precision-Tower-7910:/mnt/md0/Opu_r1.2_final/Comparitive_genomics/wgd/run_results/Opu_Ksd_analysis/ks_tmp.371e05d6b24ca6$ ls
GF_000010.fasta                            GF_000522.fasta  GF_001046.fasta  GF_001570.fasta         GF_002091.Ks            GF_002607.Ks            GF_003127.fasta      GF_003651.fasta
GF_000010.Ks                               GF_000522.Ks     GF_001046.Ks     GF_001570.Ks            GF_002092.fasta         GF_002608.fasta         GF_003127.Ks         GF_003651.Ks
GF_000011.fasta                            GF_000523.fasta  GF_001047.fasta  GF_001571.fasta         GF_002092.Ks            GF_002608.Ks            GF_003128.fasta      GF_003652.fasta
GF_000011.Ks                               GF_000523.Ks     GF_001047.Ks     GF_001571.Ks            GF_002093.fasta         GF_002609.fasta         GF_003128.Ks         GF_003652.Ks
GF_000012.fasta                            GF_000524.fasta  GF_001048.fasta  GF_001572.fasta         GF_002093.Ks            GF_002609.Ks            GF_003129.fasta      GF_003653.fasta
GF_000012.Ks                               GF_000524.Ks     GF_001048.Ks     GF_001572.Ks            GF_002094.fasta         GF_002610.fasta         GF_003129.Ks         GF_003653.Ks
GF_000013.fasta                            GF_000525.fasta  GF_001049.fasta  GF_001573.fasta         GF_002094.Ks            GF_002610.Ks            GF_003130.fasta      GF_003654.fasta
GF_000013.fasta.msa                        GF_000525.Ks     GF_001049.Ks     GF_001573.Ks            GF_002095.fasta         GF_002611.fasta         GF_003130.Ks         GF_003654.Ks
GF_000013.fasta.msa.phyml                  GF_000526.fasta  GF_001050.fasta  GF_001574.fasta         GF_002095.Ks            GF_002611.Ks            GF_003131.fasta      GF_003655.fasta
GF_000013.fasta.msa.phyml_phyml_stats.txt  GF_000526.Ks     GF_001050.Ks     GF_001574.Ks            GF_002096.fasta         GF_002612.fasta         GF_003131.Ks         GF_003655.Ks
GF_000013.fasta.msa.phyml_phyml_tree.txt   GF_000527.fasta  GF_001051.fasta  GF_001575.fasta         GF_002096.Ks            GF_002612.Ks            GF_003132.fasta      GF_003656.fasta
GF_000014.fasta                            GF_000527.Ks     GF_001051.Ks     GF_001575.Ks            GF_002097.fasta         GF_002613.fasta         GF_003132.Ks         GF_003656.Ks
GF_000014.fasta.msa                        GF_000528.fasta  GF_001052.fasta  GF_001576.fasta         GF_002097.Ks            GF_002613.Ks            GF_003133.fasta      GF_003657.fasta
GF_000014.fasta.msa.phyml                  GF_000528.Ks     GF_001052.Ks     GF_001576.Ks            GF_002098.fasta         GF_002614.fasta         GF_003133.Ks         GF_003657.Ks
GF_000014.fasta.msa.phyml_phyml_stats.txt  GF_000529.fasta  GF_001053.fasta  GF_001577.fasta         GF_002098.Ks            GF_002614.Ks            GF_003134.fasta      GF_003658.fasta
GF_000014.fasta.msa.phyml_phyml_tree.txt   GF_000529.Ks     GF_001053.Ks     GF_001577.Ks            GF_002099.fasta         GF_002615.fasta         GF_003134.Ks         GF_003658.Ks
GF_000015.fasta                            GF_000530.fasta  GF_001054.fasta  GF_001578.fasta         GF_002099.Ks            GF_002615.Ks            GF_003135.fasta      GF_003659.fasta
GF_000015.Ks                               GF_000530.Ks     GF_001054.Ks     GF_001578.Ks            GF_002100.fasta         GF_002616.fasta         GF_003135.Ks         GF_003659.Ks
GF_000016.fasta                            GF_000531.fasta  GF_001055.fasta  GF_001579.fasta         GF_002100.Ks            GF_002616.Ks            GF_003136.fasta      GF_003660.fasta
GF_000016.Ks                               GF_000531.Ks     GF_001055.Ks     GF_001579.Ks            GF_002101.fasta         GF_002617.fasta         GF_003136.Ks         GF_003660.Ks
GF_000017.fasta                            GF_000532.fasta  GF_001056.fasta  GF_001580.fasta         GF_002101.Ks            GF_002617.Ks            GF_003137.fasta      GF_003661.fasta
GF_000017.fasta.msa                        GF_000532.Ks     GF_001056.Ks     GF_001580.Ks            GF_002102.fasta         GF_002618.fasta         GF_003137.Ks         GF_003661.Ks
GF_000017.fasta.msa.phyml                  GF_000533.fasta  GF_001057.fasta  GF_001581.fasta         GF_002102.Ks            GF_002618.Ks            GF_003138.fasta      GF_003662.fasta
GF_000017.fasta.msa.phyml_phyml_stats.txt  GF_000533.Ks     GF_001057.Ks     GF_001581.Ks            GF_002103.fasta         GF_002619.fasta         GF_003138.Ks         GF_003662.Ks
GF_000017.fasta.msa.phyml_phyml_tree.txt   GF_000534.fasta  GF_001058.fasta  GF_001582.fasta         GF_002103.Ks            GF_002619.Ks            GF_003139.fasta      GF_003663.fasta
GF_000018.fasta                            GF_000534.Ks     GF_001058.Ks     GF_001582.Ks            GF_002104.fasta         GF_002620.fasta         GF_003139.Ks         GF_003663.Ks
GF_000018.fasta.msa                        GF_000535.fasta  GF_001059.fasta  GF_001583.fasta         GF_002104.Ks            GF_002620.Ks            GF_003140.fasta      GF_003664.fasta
GF_000018.fasta.msa.phyml                  GF_000535.Ks     GF_001059.Ks     GF_001583.Ks            GF_002105.fasta         GF_002621.fasta         GF_003140.Ks         GF_003664.Ks
GF_000018.fasta.msa.phyml_phyml_stats.txt  GF_000536.fasta  GF_001060.fasta  GF_001584.fasta         GF_002105.Ks            GF_002621.Ks            GF_003141.fasta      GF_003665.fasta
GF_000018.fasta.msa.phyml_phyml_tree.txt   GF_000536.Ks     GF_001060.Ks     GF_001584.Ks            GF_002106.fasta         GF_002622.fasta         GF_003141.Ks         GF_003665.Ks
GF_000019.fasta                            GF_000537.fasta  GF_001061.fasta  GF_001585.fasta         GF_002106.fasta.msa     GF_002622.Ks            GF_003142.fasta      GF_003666.fasta
GF_000019.fasta.msa                        GF_000537.Ks     GF_001061.Ks     GF_001585.Ks            GF_002106.fasta.msa.nw  GF_002623.fasta         GF_003142.Ks         GF_003666.Ks
GF_000019.fasta.msa.phyml                  GF_000538.fasta  GF_001062.fasta  GF_001586.fasta         GF_002107.fasta         GF_002623.fasta.msa     GF_003143.fasta      GF_003667.fasta
GF_000019.fasta.msa.phyml_phyml_stats.txt  GF_000538.Ks     GF_001062.Ks     GF_001586.Ks            GF_002107.Ks            GF_002623.fasta.msa.nw  GF_003143.Ks         GF_003667.Ks
GF_000019.fasta.msa.phyml_phyml_tree.txt   GF_000539.fasta  GF_001063.fasta  GF_001587.fasta         GF_002108.fasta         GF_002624.fasta         GF_003144.fasta      GF_003668.fasta
GF_000020.fasta                            GF_000539.Ks     GF_001063.Ks     GF_001587.Ks            GF_002108.Ks            GF_002624.Ks            GF_003144.Ks         GF_003668.Ks
GF_000020.fasta.msa                        GF_000540.fasta  GF_001064.fasta  GF_001588.fasta         GF_002109.fasta         GF_002625.fasta         GF_003145.fasta      GF_003669.fasta
GF_000020.fasta.msa.phyml                  GF_000540.Ks     GF_001064.Ks     GF_001588.Ks            GF_002109.Ks            GF_002625.Ks            GF_003145.Ks         GF_003669.Ks
GF_000020.fasta.msa.phyml_phyml_stats.txt  GF_000541.fasta  GF_001065.fasta  GF_001589.fasta         GF_002110.fasta         GF_002626.fasta         GF_003146.fasta      GF_003670.fasta
GF_000020.fasta.msa.phyml_phyml_tree.txt   GF_000541.Ks     GF_001065.Ks     GF_001589.Ks            GF_002110.Ks            GF_002626.Ks            GF_003146.Ks         GF_003670.Ks
GF_000021.fasta                            GF_000542.fasta  GF_001066.fasta  GF_001590.fasta         GF_002111.fasta         GF_002627.fasta         GF_003147.fasta      GF_003671.fasta
GF_000021.fasta.msa                        GF_000542.Ks     GF_001066.Ks     GF_001590.Ks            GF_002111.Ks            GF_002627.fasta.msa     GF_003147.Ks         GF_003671.Ks
GF_000021.fasta.msa.phyml                  GF_000543.fasta  GF_001067.fasta  GF_001591.fasta         GF_002112.fasta         GF_002627.fasta.msa.nw  GF_003148.fasta      GF_003672.fasta
GF_000021.fasta.msa.phyml_phyml_stats.txt  GF_000543.Ks     GF_001067.Ks     GF_001591.Ks            GF_002112.Ks            GF_002628.fasta         GF_003148.Ks         GF_003672.Ks
GF_000021.fasta.msa.phyml_phyml_tree.txt   GF_000544.fasta  GF_001068.fasta  GF_001592.fasta         GF_002113.fasta         GF_002628.Ks            GF_003149.fasta      GF_003673.fasta
GF_000022.fasta                            GF_000544.Ks     GF_001068.Ks     GF_001592.Ks            GF_002113.Ks            GF_002629.fasta         GF_003149.Ks         GF_003673.Ks
GF_000022.Ks                               GF_000545.fasta  GF_001069.fasta  GF_001593.fasta         GF_002114.fasta         GF_002629.Ks            GF_003150.fasta      GF_003674.fasta
GF_000023.fasta                            GF_000545.Ks     GF_001069.Ks     GF_001593.Ks            GF_002114.Ks            GF_002630.fasta         GF_003150.Ks         GF_003674.Ks
GF_000023.Ks                               GF_000546.fasta  GF_001070.fasta  GF_001594.fasta         GF_002115.fasta         GF_002630.Ks            GF_003151.fasta      GF_003675.fasta
GF_000024.fasta                            GF_000546.Ks     GF_001070.Ks     GF_001594.Ks            GF_002115.Ks            GF_002631.fasta         GF_003151.Ks         GF_003675.Ks
GF_000024.Ks                               GF_000547.fasta  GF_001071.fasta  GF_001595.fasta         GF_002116.fasta         GF_002631.Ks            GF_003152.fasta      GF_003676.fasta
GF_000025.fasta                            GF_000547.Ks     GF_001071.Ks     GF_001595.Ks            GF_002116.Ks            GF_002632.fasta         GF_003152.Ks         GF_003676.Ks
GF_000025.fasta.msa                        GF_000548.fasta  GF_001072.fasta  GF_001596.fasta         GF_002117.fasta         GF_002632.Ks            GF_003153.fasta      GF_003677.fasta
GF_000025.fasta.msa.phyml                  GF_000548.Ks     GF_001072.Ks     GF_001596.Ks            GF_002117.Ks            GF_002633.fasta         GF_003153.Ks         GF_003677.Ks
GF_000025.fasta.msa.phyml_phyml_stats.txt  GF_000549.fasta  GF_001073.fasta  GF_001597.fasta         GF_002118.fasta         GF_002633.Ks            GF_003154.fasta      GF_003678.fasta
GF_000025.fasta.msa.phyml_phyml_tree.txt   GF_000549.Ks     GF_001073.Ks     GF_001597.Ks            GF_002118.Ks            GF_002634.fasta         GF_003154.Ks         GF_003678.Ks
GF_000026.fasta                            GF_000550.fasta  GF_001074.fasta  GF_001598.fasta         GF_002119.fasta         GF_002634.Ks            GF_003155.fasta      GF_003679.fasta
GF_000026.Ks                               GF_000550.Ks     GF_001074.Ks     GF_001598.Ks            GF_002119.Ks            GF_002635.fasta         GF_003155.Ks         GF_003679.Ks
GF_000027.fasta                            GF_000551.fasta  GF_001075.fasta  GF_001599.fasta         GF_002120.fasta         GF_002635.Ks            GF_003156.fasta      GF_003680.fasta
GF_000027.Ks                               GF_000551.Ks     GF_001075.Ks     GF_001599.Ks            GF_002120.Ks            GF_002636.fasta         GF_003156.Ks         GF_003680.Ks
GF_000028.fasta                            GF_000552.fasta  GF_001076.fasta  GF_001600.fasta         GF_002121.fasta         GF_002636.Ks            GF_003157.fasta      GF_003681.fasta
GF_000028.Ks                               GF_000552.Ks     GF_001076.Ks     GF_001600.Ks            GF_002121.Ks            GF_002637.fasta         GF_003157.Ks         GF_003681.Ks
GF_000029.fasta                            GF_000553.fasta  GF_001077.fasta  GF_001601.fasta         GF_002122.fasta         GF_002637.Ks            GF_003158.fasta      GF_003682.fasta
GF_000029.Ks                               GF_000553.Ks     GF_001077.Ks     GF_001601.Ks            GF_002122.Ks            GF_002638.fasta         GF_003158.Ks         GF_003682.Ks
GF_000030.fasta                            GF_000554.fasta  GF_001078.fasta  GF_001602.fasta         GF_002123.fasta         GF_002638.Ks            GF_003159.fasta      GF_003683.fasta
GF_000030.Ks                               GF_000554.Ks     GF_001078.Ks     GF_001602.Ks            GF_002123.Ks            GF_002639.fasta         GF_003159.Ks         GF_003683.Ks
GF_000031.fasta                            GF_000555.fasta  GF_001079.fasta  GF_001603.fasta         GF_002124.fasta         GF_002639.Ks            GF_003160.fasta      GF_003684.fasta
GF_000031.Ks                               GF_000555.Ks     GF_001079.Ks     GF_001603.Ks            GF_002124.Ks            GF_002640.fasta         GF_003160.Ks         GF_003684.Ks
GF_000032.fasta                            GF_000556.fasta  GF_001080.fasta  GF_001604.fasta         GF_002125.fasta         GF_002640.Ks            GF_003161.fasta      GF_003685.fasta
GF_000032.Ks                               GF_000556.Ks     GF_001080.Ks     GF_001604.Ks            GF_002125.Ks            GF_002641.fasta         GF_003161.Ks         GF_003685.Ks
GF_000033.fasta                            GF_000557.fasta  GF_001081.fasta  GF_001605.fasta         

arzwa commented 5 years ago

Hi, as you might have realized, the gene families for which there is a .Ks file have been analyzed successfully. For the large families the analysis does not seem to have finished; this might simply be due to their size (tree inference and codeml can take a very long time).
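To make the size effect concrete, here is a small sketch of the n*(n-1)/2 rule quoted in the log warning. The limit used below is a placeholder chosen for illustration and is not necessarily the value wgd uses for `max_pairwise`.

```python
def n_pairs(family_size):
    """Number of unordered gene pairs in a family of the given size."""
    return family_size * (family_size - 1) // 2


def exceeds_limit(family_size, max_pairwise):
    """True if n*(n-1)/2 > max_pairwise, the condition quoted in the log warning."""
    return n_pairs(family_size) > max_pairwise


# Placeholder limit for illustration only; check your wgd version for the real value.
HYPOTHETICAL_LIMIT = 10000

for size in (10, 100, 500):
    print(size, n_pairs(size), exceeds_limit(size, HYPOTHETICAL_LIMIT))
```

The number of pairs grows quadratically with family size (45 pairs for 10 genes, 124750 for 500 genes), which is why a handful of large families can dominate the running time.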

I would advise you to run the analysis with FastTree (the default) for tree inference. The trees are only used for weighting the Ks values, so a perfectly accurate tree is not really necessary; FastTree will do a good job for this purpose. Also, the --pairwise flag is not really useful (it is slower and not better).

So my advice would be to just use the following command:

wgd ksd -o ./ -n32 ../genome_cds.mcl ../genome_cds.fa

If you want to re-use the results you already obtained, you can point wgd to the tmp directory using the -tmp option.
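Before re-using a tmp directory, it can help to see which families already have a .Ks file. The following is a minimal sketch, not part of wgd; it assumes the file naming visible in the listing above (`GF_XXXXXX.fasta` for a family's input and `GF_XXXXXX.Ks` once it has been analyzed), and the function name is made up for illustration.

```python
from pathlib import Path


def unfinished_families(tmp_dir):
    """Return family IDs that have a .fasta file but no .Ks file in tmp_dir."""
    tmp = Path(tmp_dir)
    # "GF_*.fasta" matches GF_000013.fasta but not GF_000013.fasta.msa
    started = {path.stem for path in tmp.glob("GF_*.fasta")}
    finished = {path.stem for path in tmp.glob("GF_*.Ks")}
    return sorted(started - finished)
```

Calling `unfinished_families("ks_tmp.371e05d6b24ca6")` on the directory from the listing would return the families that were still running, such as GF_000013, which has alignment and PhyML files but no .Ks file.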

Hope that helps, please let me know if it does.

amit4mchiba commented 5 years ago

Thank you so much for your advice. It worked, but now I am stuck at the next stage. I wanted to run the wgd mix command using the output from wgd ksd.

Here is what I did:

(py3) amit8chiba@amit8chiba-Precision-Tower-7910:$ wgd mix genome_cds.fa.ks.tsv -n 1 5
2019-03-04 02:09:26: INFO       Preparing data frame
Traceback (most recent call last):
  File "/home/amit8chiba/miniconda2/envs/py3/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 2656, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'AlignmentCoverage'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/amit8chiba/miniconda2/envs/py3/bin/wgd", line 11, in <module>
    sys.exit(cli())
  File "/home/amit8chiba/miniconda2/envs/py3/lib/python3.5/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/amit8chiba/miniconda2/envs/py3/lib/python3.5/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/amit8chiba/miniconda2/envs/py3/lib/python3.5/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/amit8chiba/miniconda2/envs/py3/lib/python3.5/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/amit8chiba/miniconda2/envs/py3/lib/python3.5/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/amit8chiba/miniconda2/envs/py3/lib/python3.5/site-packages/wgd_cli.py", line 1018, in mix
    output_dir, gamma, n_init, max_iter
  File "/home/amit8chiba/miniconda2/envs/py3/lib/python3.5/site-packages/wgd_cli.py", line 1060, in mix_
    ks_range[0], ks_range[1])
  File "/home/amit8chiba/miniconda2/envs/py3/lib/python3.5/site-packages/wgd/modeling.py", line 56, in filter_group_data
    df = df[df["AlignmentCoverage"] >= aln_cov]
  File "/home/amit8chiba/miniconda2/envs/py3/lib/python3.5/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/amit8chiba/miniconda2/envs/py3/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 2658, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'AlignmentCoverage'

I am not sure what happened here. I checked, and I think I have all the dependencies. I also checked that pandas is installed (version 0.24.1).

Please advise. I think I am almost there, but I am still stuck.

I have another problem. I also wanted to run the collinearity-based analysis, for which I ran the command as described in the manual. There were no errors, and I got many files and results. But the dot plot is empty, and it seems that no collinearity was identified. This is a little strange, since I can get results using MCScanX. As for the GFF file, I checked the Arabidopsis example GFF file, and mine is in exactly the same format. So I am not sure how I should proceed.

Thank you so much in advance.

arzwa commented 5 years ago

Hi,

I am not sure why the mix command is failing; it should definitely not give an error like that unless your Ks distribution file is incorrectly formatted or empty... Could you send the first 20 lines or so of genome_cds.fa.ks.tsv?
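A quick way to check this locally is to look at the header row of the file. The sketch below is not part of wgd; it assumes the file is tab-separated with a header row, and it only checks for `AlignmentCoverage`, the column named in the traceback. The function name is made up for illustration.

```python
import csv


def missing_columns(tsv_path, required=("AlignmentCoverage",)):
    """Return the required column names that are absent from the TSV header."""
    with open(tsv_path, newline="") as handle:
        # An empty file has no header row, so every required column is missing.
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in required if name not in header]
```

An empty list from `missing_columns("genome_cds.fa.ks.tsv")` means the column is present; a non-empty list points to an empty or incorrectly formatted file.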

For the co-linearity analysis, the issue could be that the --gene_attribute and --feature options are not correctly set. These specify where to look in the GFF for the genes and their names. So for example, if your GFF looks like this:

scaffold_97 JGI v3.3    gene    385 546 .   -   .   ID=Pp3s97_10;pacid=32918799;name=Pp3s97_10V3.1;tid=PAC:32918799
scaffold_97 JGI v3.3    mRNA    385 546 .   -   .   ID=Pp3s97_10V3.1;Parent=Pp3s97_10
scaffold_97 JGI v3.3    exon    385 546 .   -   .   ID=Pp3s97_10V3.1.exon.1;Parent=Pp3s97_10V3.1
scaffold_97 JGI v3.3    CDS 385 546 .   -   .   ID=Pp3s97_10V3.1.CDS.1;Parent=Pp3s97_10V3.1

And your gene IDs in the Ks distribution and CDS fasta files look like Pp3s97_10, then you would need to set --feature gene (the third-column value to use) and --gene_attribute ID (the attribute name in the last column that refers to the correct gene ID) in your command. Alternatively, for this example, you could use --feature mRNA and --gene_attribute Parent. Not sure if that is your problem though...
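The selection described above can be sketched as follows. This is an illustration of the idea, not wgd's actual parser; it assumes a standard nine-column, tab-separated GFF line with `key=value` attributes separated by semicolons, and the function name is made up.

```python
def gene_id_from_gff_line(line, feature, gene_attribute):
    """Return the value of gene_attribute if the line's third column equals feature."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9 or columns[2] != feature:
        return None
    for field in columns[8].split(";"):
        key, sep, value = field.partition("=")
        if sep and key.strip() == gene_attribute:
            return value.strip()
    return None


# The gene and mRNA lines from the example above, rebuilt with tab separators.
gene_line = "\t".join([
    "scaffold_97", "JGI v3.3", "gene", "385", "546", ".", "-", ".",
    "ID=Pp3s97_10;pacid=32918799;name=Pp3s97_10V3.1;tid=PAC:32918799",
])
mrna_line = "\t".join([
    "scaffold_97", "JGI v3.3", "mRNA", "385", "546", ".", "-", ".",
    "ID=Pp3s97_10V3.1;Parent=Pp3s97_10",
])

print(gene_id_from_gff_line(gene_line, "gene", "ID"))      # Pp3s97_10
print(gene_id_from_gff_line(mrna_line, "mRNA", "Parent"))  # Pp3s97_10
print(gene_id_from_gff_line(mrna_line, "mRNA", "ID"))      # Pp3s97_10V3.1
```

Whichever combination is chosen, the extracted ID has to match the IDs used in the CDS fasta and Ks files exactly; otherwise no genes are matched.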

amit4mchiba commented 5 years ago

Thank you so much for your reply.

I really do not know why it did not work last time, but I decided to run the whole thing once again and it worked as expected. So I have no clue what was wrong last time. I was able to get the plots expected from the manual, although I am now trying to understand how to interpret them. Is there a paper you would recommend for understanding them? I can see two peaks in my Ks plot, but I do not know how to interpret them in terms of whether gene duplication happened and, if so, when. I am attaching the plots; do they look normal?

About wgd syn, you were right. The issue was the GFF and a difference in ID naming. I then used mRNA for -f and ID for -a, and it worked. I was able to get the expected plots. genome.mcl.ks_anchors.zip

arzwa commented 5 years ago

Hi, I'm closing this issue and I would prefer to continue discussing results etc. via email, I'd like to keep the GitHub issues for software problems only.