arzwa / wgd

Python package and CLI for whole-genome duplication related analyses. This package is deprecated in favor of https://github.com/heche-psb/wgd.
http://wgd.readthedocs.io/en/latest/
GNU General Public License v3.0

multiplicon #39

Closed alslonik closed 4 years ago

alslonik commented 4 years ago

I am running wgd syn on a whole plant genome of 300 Mb and getting a "No multiplicons" warning. MCScan finds many collinearity blocks in my dataset, so something I am doing is clearly not correct. I tried changing the i-ADHoRe parameters (gap size etc.), but there was still no change. Do you have any idea what is wrong?

This is i-ADHoRe v3.0.
Copyright (c) 2002-2010, Flanders Interuniversity Institute for Biotechnology, VIB.
Algorithm designed by Klaas Vandepoele, Cedric Simillion, Jan Fostier, Dieter De Witte,
Koen Janssens, Sebastian Proost, Yvan Saeys and Yves Van de Peer.

Process 1/1 is alive on purple.

************* i-ADHoRe parameters *************
    Number of genelists = 9
    Blast table = ./wgd_syn/families.tsv
    Output path = ./wgd_syn/i-adhore-out/
    Gap size = 30
    Cluster gap size = 35
    Cloud gap size = 0
    Cloud cluster gap size = 0
    Max gaps in alignment = 35
    Tandem gap = 15
    Flush output = 1000
    Q-value = 0.75
    Anchorpoints = 3
    Probability cutoff = 0.01
    Cloud filtering method = Binomial
    Level 2 only = false
    Use family = true
    Write statistics = false
    Alignment method = GreedyGraphbased4
    Multiple hypothesis correction = FDR
    Number of threads = 1
    Compare aligners = false
    Collinear searches only
    Visualize GHM.png = false
    Visualize Alignment = true
    Verbose output = true
************ END i-AdDHoRe parameters *********

Creating dataset...         done. (time: 0.017545s)
Mapping gene families...        done. (time: 0.061868s)
Remapping tandem duplicates...  done. (time: 0.023679s)
Writing genelists file...       done. (time: 0.10255s)
Collinear Search
Level 2 multiplicon detection...    done. (time: 0.244359s)
Profile detection...
Flushing output files...Visualize AlignedProfiles
done.
Time for Higher Level Detection: 0.00391197s.

All Done!  Bye...

2020-06-29 15:04:07: INFO   Drawing co-linearity dotplot
/home/alex/wgd_venv/lib/python3.7/site-packages/wgd/viz.py:223: UserWarning: Attempting to set identical left == right == 0 results in singular transformations; automatically expanding.
  ax.set_xlim(0, max(x))
/home/alex/wgd_venv/lib/python3.7/site-packages/wgd/viz.py:224: UserWarning: Attempting to set identical bottom == top == 0 results in singular transformations; automatically expanding.
  ax.set_ylim(0, max(x))
2020-06-29 15:04:07: INFO   Constructing Ks distribution for anchors
2020-06-29 15:04:08: INFO   Generating Ks colored (median Ks) dotplot
2020-06-29 15:04:08: WARNING    No multiplicons found!
2020-06-29 15:04:08: INFO   Generating histogram
2020-06-29 15:04:08: INFO   Will plot **node-averaged** histograms
2020-06-29 15:04:08: INFO   Will plot **node-averaged** histograms
2020-06-29 15:04:11: INFO   Done

arzwa commented 4 years ago

Hi, this often happens because the --feature and --attribute options are not set correctly for the particular GFF and gene families files. For instance, let's say this is your gene families (.mcl) file:

UGI.Scf00122.20973.1    UGI.Scf00581.21718.1    UGI.Scf00274.13098.1    UGI.Scf01119.20275.1

and you have a GFF file with entries like this:

Scf00581    CoGe v4 mRNA    28544   32368   .   +   .   ID=UGI.Scf00581.21718.1;Parent=UGI.Scf00581.21718;Name=UGI.Scf00581.21718;gene_id=UGI.Scf00581.21718

you would need --feature mRNA and --attribute ID to indicate to the program how the gene identifiers in the gene families correspond to the entries in the gff file.
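A quick way to check a given --feature/--attribute pair before rerunning is to count how many gene family identifiers actually appear in the GFF under that combination. The following is only a minimal sketch and not part of wgd: the function names are hypothetical, and it assumes a tab-separated GFF with `key=value;` attributes and a whitespace-separated families file with one family per line.

```python
def gff_ids(gff_lines, feature, attribute):
    """Collect the values of `attribute` from GFF rows whose type column is `feature`."""
    ids = set()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue  # malformed row or a different feature type
        for field in cols[8].split(";"):
            key, sep, value = field.strip().partition("=")
            if sep and key == attribute:
                ids.add(value)
    return ids


def family_ids(family_lines):
    """Collect every gene identifier listed in the gene families file."""
    ids = set()
    for line in family_lines:
        ids.update(line.split())
    return ids


def overlap_fraction(gff_lines, family_lines, feature, attribute):
    """Fraction of family gene IDs found in the GFF for this feature/attribute pair."""
    fam = family_ids(family_lines)
    if not fam:
        return 0.0
    return len(fam & gff_ids(gff_lines, feature, attribute)) / len(fam)


# Example data taken from the entries quoted above (columns joined with tabs).
FAMILIES = [
    "UGI.Scf00122.20973.1\tUGI.Scf00581.21718.1\t"
    "UGI.Scf00274.13098.1\tUGI.Scf01119.20275.1\n"
]
GFF = [
    "\t".join([
        "Scf00581", "CoGe v4", "mRNA", "28544", "32368", ".", "+", ".",
        "ID=UGI.Scf00581.21718.1;Parent=UGI.Scf00581.21718;"
        "Name=UGI.Scf00581.21718;gene_id=UGI.Scf00581.21718",
    ]) + "\n"
]

if __name__ == "__main__":
    for feature, attribute in [("mRNA", "ID"), ("mRNA", "Name"), ("gene", "ID")]:
        frac = overlap_fraction(GFF, FAMILIES, feature, attribute)
        print(f"--feature {feature} --attribute {attribute}: {frac:.2f}")
```

With real files you would pass open file handles instead of the lists. A fraction near zero for the pair you are using suggests the identifiers are not being matched, which would be consistent with the "No multiplicons found!" warning.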

If that does not solve the issue, I will need some more information, ideally a minimal working example that reproduces it.

alslonik commented 4 years ago

You are right, I missed that tiny detail in the GFF. Thanks, it worked out beautifully now, exactly the way I expected!

arzwa commented 4 years ago

great!