arzwa / wgd

Python package and CLI for whole-genome duplication related analyses. This package is deprecated in favor of https://github.com/heche-psb/wgd.
http://wgd.readthedocs.io/en/latest/
GNU General Public License v3.0
83 stars, 41 forks

Dotplots.svg and ks_anchors.tsv files are empty #78

Open muthu1722 opened 2 years ago

muthu1722 commented 2 years ago

Hi, I have used the supplementary material to run wgd. I am able to run everything up to wgd syn without any issue, but in the wgd_syn results the dotplot and ks_anchors.tsv files are empty. I also tried a sample dataset to check whether my own data were causing the trouble, but I got the same issue with the sample data. Apart from these two files, my histogram and tsv.mcl files look fine for both the sample data and my own data. I am attaching my sample data as well as the output for your reference. It would be really helpful if you could help me resolve this. Thank you in advance. Muthulakshmi

sample.tar.gz

heche-psb commented 2 years ago

Hi Muthulakshmi,

I downloaded your attached file "sample.tar.gz" and reran the collinearity analysis successfully. Here are my steps:

First, I reformatted the gene IDs in the CDS file "sample.fasta" so that they match the GFF file "ath.gff", in a Python session:

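>>> import pandas as pd
>>> # header=None so the first line is treated as data, not as column names
>>> df = pd.read_csv('sample.fasta', header=None)
>>> # Split each line on the ' | ' separator; field 0 is the gene ID on header lines
>>> df2 = df[0].astype(str).str.split(' | ', expand=True)
>>> df2[0].to_csv('sample.fa.cds', sep="\t", header=False, index=False)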

This gave me the reformatted CDS file "sample.fa.cds" for the next step.
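For reference, the same header clean-up can be sketched without pandas. This is a minimal sketch, not part of wgd: the function name `clean_fasta_headers` and the example header are made up for illustration, and it assumes the gene ID is the first whitespace-delimited token of each header line.

```python
def clean_fasta_headers(lines):
    """Yield FASTA lines with each header reduced to its first token.

    Hypothetical helper: assumes the gene ID is the first
    whitespace-delimited token after '>' on every header line.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            # Keep only the ID; everything after the first space is dropped
            yield ">" + (fields[0] if fields else "")
        else:
            # Sequence lines are passed through unchanged
            yield line


# Made-up record in the style of a header with ' | ' separated annotations
example = [">GENE0001.1 | some description | chr1:1-9\n", "ATGAAATAA\n"]
for cleaned in clean_fasta_headers(example):
    print(cleaned)
```

To apply it to a file, the function can be fed an open file handle for "sample.fasta" and its output written line by line to "sample.fa.cds".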

To infer paralogous gene families, I used the command:

$ wgd dmd sample.fa.cds
2022-09-16 13:28:04: ERROR  Translation error (First codon 'TCA' is not a start codon) in seq AT1G03325.1
2022-09-16 13:28:04: ERROR  Translation error (First codon 'AGG' is not a start codon) in seq AT1G04105.1
2022-09-16 13:28:05: ERROR  Translation error (First codon 'AAT' is not a start codon) in seq AT1G13805.1
2022-09-16 13:28:05: ERROR  Translation error (Sequence length 2351 is not a multiple of three) in seq AT1G17000.1
[...]
2022-09-16 13:28:13: ERROR  Translation error (First codon 'ACG' is not a start codon) in seq ATMG01275.1
2022-09-16 13:28:13: ERROR  Translation error (Sequence length 922 is not a multiple of three) in seq ATMG01320.1
2022-09-16 13:28:14: INFO   One CDS file: will compute paranome

Note that quite a few genes could not be translated correctly.
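Sequences like these can be spotted before running wgd dmd. The sketch below is a hypothetical stand-alone check, not wgd's own code. It assumes ATG is the only accepted start codon (the tool itself may accept others) and only covers the two error types seen in the log above. The names `read_fasta` and `check_cds` and the example records are made up.

```python
def read_fasta(lines):
    """Yield (sequence_id, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def check_cds(seq, start_codons=("ATG",)):
    """Return a description of the problem, or None if the CDS looks fine.

    Assumption: only the codons in `start_codons` count as valid starts.
    """
    seq = seq.upper()
    if seq[:3] not in start_codons:
        return f"First codon '{seq[:3]}' is not a start codon"
    if len(seq) % 3 != 0:
        return f"Sequence length {len(seq)} is not a multiple of three"
    return None


# Made-up records: one valid CDS and two problematic ones
records = [">geneA", "ATGAAATAA", ">geneB", "TCAAAATAA", ">geneC", "ATGAA"]
for name, seq in read_fasta(records):
    problem = check_cds(seq)
    if problem:
        print(name, problem)
```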

This produced two files in the "wgd_dmd" directory: "sample.fa.cds.mcl" and "sample.fa.cds_sample.fa.cds.tsv".

Next, I inferred collinearity from the newly obtained gene family information (assuming I was still in the directory containing the original data):

$ wgd syn -f mRNA -a ID ath.gff wgd_dmd/sample.fa.cds.mcl
2022-09-16 13:34:28: INFO   i-adhore stdout: This is i-ADHoRe v3.0.
Copyright (c) 2002-2010, Flanders Interuniversity Institute for Biotechnology, VIB.
Algorithm designed by Klaas Vandepoele, Cedric Simillion, Jan Fostier, Dieter De Witte,
Koen Janssens, Sebastian Proost, Yvan Saeys and Yves Van de Peer.

Process 1/1 is alive on localhost.
2022-09-16 13:34:28: INFO   i-adhore stderr: Error opening the settings file: -version
2022-09-16 13:34:28: INFO   Made output directory ./wgd_syn
2022-09-16 13:34:28: INFO   Parsing GFF file
2022-09-16 13:34:30: INFO   Writing gene lists
2022-09-16 13:34:30: INFO   Writing families file
2022-09-16 13:34:30: INFO   Writing configuration file
2022-09-16 13:34:30: INFO   Running I-ADHoRe 3.0
2022-09-16 13:35:03: WARNING    WARNING: Maximum allowed number of gaps in the alignment not specified.  Setting to cluster_gap.
WARNING: Tandem gap size not correct in settings file. Using default (gap_size / 2)

2022-09-16 13:35:03: INFO   
This is i-ADHoRe v3.0.
Copyright (c) 2002-2010, Flanders Interuniversity Institute for Biotechnology, VIB.
Algorithm designed by Klaas Vandepoele, Cedric Simillion, Jan Fostier, Dieter De Witte,
Koen Janssens, Sebastian Proost, Yvan Saeys and Yves Van de Peer.

Process 1/1 is alive on localhost.

************* i-ADHoRe parameters *************
    Number of genelists = 7
    Blast table = ./wgd_syn/families.tsv
    Output path = ./wgd_syn/i-adhore-out/
    Gap size = 30
    Cluster gap size = 35
    Cloud gap size = 0
    Cloud cluster gap size = 0
    Max gaps in alignment = 35
    Tandem gap = 15
    Flush output = 1000
    Q-value = 0.75
    Anchorpoints = 3
    Probability cutoff = 0.01
    Cloud filtering method = Binomial
    Level 2 only = false
    Use family = true
    Write statistics = false
    Alignment method = GreedyGraphbased4
    Multiple hypothesis correction = FDR
    Number of threads = 1
    Compare aligners = false
    Collinear searches only
    Visualize GHM.png = false
    Visualize Alignment = true
    Verbose output = true
************ END i-AdDHoRe parameters *********

Creating dataset...         done. (time: 0.022624s)
Mapping gene families...        done. (time: 0.031497s)
Remapping tandem duplicates...  done. (time: 0.0159261s)
Writing genelists file...       done. (time: 0.113681s)
Collinear Search
Level 2 multiplicon detection...    done. (time: 1.72764s)
Profile detection...
433 multiplicons to evaluate - evaluating level 2 multiplicon... 25 new multiplicons found.
[...]
2 multiplicons to evaluate - evaluating level 2 multiplicon... 0 new multiplicons found.
1 multiplicons to evaluate - evaluating level 2 multiplicon... level-2 multiplicon is redundant
Flushing output files...Visualize AlignedProfiles
badprofile (all boxes will be black! => segmentlength differs among the segs of alignment)
badprofile (all boxes will be black! => segmentlength differs among the segs of alignment)
done.
Time for Higher Level Detection: 30.8817s.

All Done!  Bye...

2022-09-16 13:35:03: INFO   Drawing co-linearity dotplot
2022-09-16 13:35:11: INFO   Done

The resulting file "sample.fa.cds.mcl.dotplot.svg" in the "wgd_syn" directory is the inferred intraspecific collinearity dotplot, shown below.

[Image: sample.fa.cds.mcl dotplot]

Given that "wgd syn" ran successfully without Ks data, I expect a run with Ks data to work as well. It is odd that your result file "sample.fasta.blast.tsv.mcl.dotplot" was empty. Do you have the log file of that failed run? We could diagnose the problem in more detail from that log.

muthu1722 commented 1 year ago

Hi, thank you very much for your reply. I'm really sorry it took me so long to respond; we had system issues. I followed your instructions and tried again with the same sample file, but it again ended with an empty dotplot and an empty ks_anchors.tsv. I have attached my output as well as the log file of the wgd_syn run for your reference. Could you please help me sort this out? Thanks in advance.

sample.tar.gz

lizhao007 commented 1 year ago

Hi, I have the same problem when running wgd syn. The .tsv and .mcl files seem to be right, but the Dotplots.svg and ks_anchors.tsv files are empty. No errors are reported in the log file. I get the same problem with sample.fa, ath.fa and my own data. Below is the result for sample.fa after running wgd syn. Could you please help me sort this out? Thanks.

[Images: sample.mcl dotplot; sample.mcl ks_anchors]

lizhao007 commented 1 year ago

I also ran the command above, wgd syn -f mRNA -a ID ath.gff wgd_dmd/sample.fa.cds.mcl, but got the same empty Dotplots.svg.

heche-psb commented 1 year ago

Could you please provide the version of wgd you were using, and the versions of the other Python packages? Thanks.
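One way to collect these is from within Python. This is a minimal sketch, assuming it is run with the interpreter of the environment where wgd is installed; the package list is only an example, and `report_versions` is a made-up helper name.

```python
import sys

try:
    # Standard library from Python 3.8 onwards
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Backport package for older interpreters such as Python 3.7
    from importlib_metadata import PackageNotFoundError, version


def report_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


print("python", sys.version.split()[0])
# Example selection of packages; adjust the list as needed
for name, ver in report_versions(["wgd", "numpy", "pandas", "matplotlib"]).items():
    print(name, ver if ver is not None else "not installed")
```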

lizhao007 commented 1 year ago

Thanks for your quick answer. The version of wgd is v1.1. I think Python is v3.7, because I created a new environment with Conda and installed Python 3.7 to run wgd, but Python 2.7 may also be present in this environment.

[Image]

lizhao007 commented 1 year ago

(wgd) [zhaoli@mn02 example]$ conda list
# packages in environment at /public/home/zhaoli/software/anaconda3/envs/wgd:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
biopython 1.79 pypi_0 pypi
blast 2.13.0 hf3cf87c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
bokeh 1.4.0 pypi_0 pypi
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f98852_0 conda-forge
ca-certificates 2022.9.24 ha878542_0 conda-forge
certifi 2022.9.24 pyhd8ed1ab_0 conda-forge
click 8.1.3 pypi_0 pypi
cmake 3.24.2 h5432695_0 conda-forge
coloredlogs 15.0.1 pypi_0 pypi
curl 7.85.0 h7bff187_0 conda-forge
cycler 0.11.0 pypi_0 pypi
entrez-direct 16.2 he881be0_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
ete3 3.1.2 pypi_0 pypi
expat 2.4.9 h27087fc_0 conda-forge
fastcluster 1.1.25 pypi_0 pypi
fasttree 2.1.11 hec16e2b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
fonttools 4.37.4 pypi_0 pypi
gettext 0.21.1 h27087fc_0 conda-forge
humanfriendly 10.0 pypi_0 pypi
importlib-metadata 5.0.0 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
joblib 0.11 pypi_0 pypi
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.4 pypi_0 pypi
krb5 1.19.3 h3790be6_0 conda-forge
ld_impl_linux-64 2.39 hc81fddc_0 conda-forge
libcurl 7.85.0 h7bff187_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-ng 12.2.0 h65d4601_18 conda-forge
libgomp 12.2.0 h65d4601_18 conda-forge
libidn2 2.3.3 h166bdaf_0 conda-forge
libnghttp2 1.47.0 hdcd2b5c_1 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libpng 1.6.38 h753d276_0 conda-forge
libsqlite 3.39.4 h753d276_0 conda-forge
libssh2 1.10.0 haa6b8db_3 conda-forge
libstdcxx-ng 12.2.0 h46fd767_18 conda-forge
libunistring 0.9.10 h7f98852_0 conda-forge
libuv 1.44.2 h166bdaf_0 conda-forge
libzlib 1.2.13 h166bdaf_4 conda-forge
mafft 7.508 hec16e2b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
markupsafe 2.1.1 pypi_0 pypi
matplotlib 3.5.3 pypi_0 pypi
mcl 14.137 pl5321hec16e2b_8 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
mpi 1.0 mpich conda-forge
muscle 5.1 h9f5acd7_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
ncurses 6.3 h27087fc_1 conda-forge
numpy 1.21.6 pypi_0 pypi
openssl 1.1.1q h166bdaf_0 conda-forge
packaging 21.3 pypi_0 pypi
paml 4.9 hec16e2b_7 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
pandas 1.2.0 pypi_0 pypi
pcre 8.45 h9c3ff4c_0 conda-forge
perl 5.32.1 2_h7f98852_perl5 conda-forge
perl-archive-tar 2.40 pl5321hdfd78af_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-carp 1.38 pl5321hdfd78af_4 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-common-sense 3.75 pl5321hdfd78af_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-compress-raw-bzip2 2.201 pl5321h87f3376_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-compress-raw-zlib 2.105 pl5321h87f3376_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-encode 3.19 pl5321hec16e2b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-exporter 5.72 pl5321hdfd78af_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-exporter-tiny 1.002002 pl5321hdfd78af_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-extutils-makemaker 7.64 pl5321hd8ed1ab_0 conda-forge
perl-io-compress 2.201 pl5321h87f3376_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-io-zlib 1.11 pl5321hdfd78af_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-json 4.10 pl5321hdfd78af_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-json-xs 2.34 pl5321h9f5acd7_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-list-moreutils 0.430 pl5321hdfd78af_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-list-moreutils-xs 0.430 pl5321hec16e2b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-parent 0.236 pl5321hdfd78af_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-pathtools 3.75 pl5321hec16e2b_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-scalar-list-utils 1.62 pl5321hec16e2b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
perl-types-serialiser 1.01 pl5321hdfd78af_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
pillow 9.2.0 pypi_0 pypi
pip 22.3 pyhd8ed1ab_0 conda-forge
plumbum 1.8.0 pypi_0 pypi
prank v.170427 h9f5acd7_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
progressbar2 4.1.1 pypi_0 pypi
pyparsing 3.0.9 pypi_0 pypi
python 3.7.13 haa1d7c7_1 defaults
python-dateutil 2.8.2 pypi_0 pypi
python-utils 3.3.3 pypi_0 pypi
pytz 2022.5 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 8.1.2 h0f457ee_0 conda-forge
rhash 1.4.3 h166bdaf_0 conda-forge
scikit-learn 1.0.2 pypi_0 pypi
scipy 1.7.3 pypi_0 pypi
seaborn 0.12.1 pypi_0 pypi
setuptools 65.5.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pypi_0 pypi
sklearn 0.0 pypi_0 pypi
sqlite 3.39.4 h4ff8645_0 conda-forge
threadpoolctl 3.1.0 pypi_0 pypi
tk 8.6.12 h27826a3_0 conda-forge
tornado 6.2 pypi_0 pypi
typing-extensions 4.4.0 pypi_0 pypi
wgd 1.2 pypi_0 pypi
wget 1.20.3 ha56f1ee_1 conda-forge
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zipp 3.9.0 pypi_0 pypi
zlib 1.2.13 h166bdaf_4 conda-forge
zstd 1.5.2 h6239696_4 conda-forge

heche-psb commented 1 year ago

Hi, no multiplicons were found in your run. I suspect a gene ID extraction issue. Could you make sure that the feature and attribute you provide are correct, i.e. consistent with the GFF3 and family files?
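One way to test this is to compare the IDs that the chosen feature and attribute yield from the GFF3 file with the IDs in the gene family file. The sketch below is not wgd code. The function names are hypothetical, and it assumes a standard 9-column GFF3 file with `key=value` attributes and a family file with one family per line and whitespace-separated gene IDs. The example records are made up.

```python
def gff_ids(lines, feature="mRNA", attribute="ID"):
    """Collect attribute values from GFF3 rows whose type column equals `feature`."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Column 3 holds the feature type, column 9 the attributes
        if len(cols) < 9 or cols[2] != feature:
            continue
        for pair in cols[8].split(";"):
            key, sep, value = pair.strip().partition("=")
            if sep and key == attribute:
                ids.add(value)
    return ids


def family_ids(lines):
    """Collect every gene ID listed in a one-family-per-line file."""
    ids = set()
    for line in lines:
        ids.update(line.split())
    return ids


# Made-up records standing in for ath.gff and the .mcl family file
gff = [
    "##gff-version 3\n",
    "Chr1\tsrc\tgene\t1\t900\t.\t+\t.\tID=GENE1\n",
    "Chr1\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=GENE1.1;Parent=GENE1\n",
]
families = ["GENE1.1\tGENE2.1\n"]

shared = gff_ids(gff, feature="mRNA", attribute="ID") & family_ids(families)
print(len(shared), "gene IDs occur in both files")
```

If the two sets share no IDs, or only a few, no anchor pairs can be placed on the genome, which would be consistent with an empty dotplot.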


heche-psb commented 1 year ago

I think you need to check whether any multiplicons were found in the first place. The log provided by @muthu1722 shows that i-ADHoRe found no multiplicons.
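A quick way to check this from a saved log is to look for the "multiplicons to evaluate" lines that i-ADHoRe prints during profile detection, as in the successful run earlier in this thread. This is a minimal sketch, assuming the log uses the same wording as that run and that a run without multiplicons either lacks such a line or reports zero. The function name is made up.

```python
import re

# Matches lines such as "433 multiplicons to evaluate - evaluating ..."
TO_EVALUATE = re.compile(r"(\d+) multiplicons to evaluate")


def first_multiplicon_count(log_lines):
    """Return the first 'N multiplicons to evaluate' count, or 0 if none is logged."""
    for line in log_lines:
        match = TO_EVALUATE.search(line)
        if match:
            return int(match.group(1))
    return 0


# Line copied from the successful run shown earlier in this thread
log = [
    "Profile detection...",
    "433 multiplicons to evaluate - evaluating level 2 multiplicon... 25 new multiplicons found.",
]
count = first_multiplicon_count(log)
print("multiplicons to evaluate:", count)
```

For a real run, `log_lines` can be an open file handle on the saved wgd syn log.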
