arzwa / wgd

Python package and CLI for whole-genome duplication related analyses. This package is deprecated in favor of https://github.com/heche-psb/wgd.
http://wgd.readthedocs.io/en/latest/
GNU General Public License v3.0

errors in ksd step with the example dataset, newick error #44

Closed. gainett closed this issue 3 years ago.

gainett commented 3 years ago

Hello,

I ran into the following errors when trying to run the command 'wgd ksd sample.mcl sample.fasta' on the Arabidopsis samples provided. It goes through a series of "Performing analysis on gene family GF_00000#" messages, but then suddenly prints the following (complete error attached):

##
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 350, in __call__
    return self.func(*args, **kwargs)
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/wgd/ks_distribution.py", line 305, in analyse_family
    results_dict, msa=msa_path_protein, method=method)
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/wgd/ks_distribution.py", line 98, in _weighting
    tree_path, pairwise_estimates['Ks'])
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/wgd/phy.py", line 123, in phylogenetic_tree_to_cluster_format
    t = Tree(tree)
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/ete3/coretype/tree.py", line 213, in __init__
    quoted_names=quoted_node_names)
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/ete3/parser/newick.py", line 264, in read_newick
    raise NewickError('Unexisting tree file or Malformed newick tree structure.')
ete3.parser.newick.NewickError: Unexisting tree file or Malformed newick tree structure.
You may want to check other newick loading flags like 'format' or 'quoted_node_names'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/BIOTECH/psharma/miniconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 359, in __call__
    raise TransportableException(text, e_type)
joblib.my_exceptions.TransportableException: TransportableException
___________________________________________________________________________
NewickError                                        Fri Oct 23 13:45:53 2020
PID: 18908        Python 3.7.7: /home/BIOTECH/psharma/miniconda3/bin/python

###

It goes on to run parallel.py and distribution.py, but this newick error appears several times: NewickError: Unexisting tree file or Malformed newick tree structure.

The wgd_ksd is empty after the run is done. Do you know what the problem may be? Thank you for your help! Best, Guilherme

arzwa commented 3 years ago

hmm, is there anything in the file /home/BIOTECH/psharma/Gainett/Ks_analysis/example_test/ks_tmp.38f794a55cd974/GF_000001.fasta.msa.nw? Does it work when you use another method to weigh Ks estimates like --wm alc?

gainett commented 3 years ago

Hello,

I tried deleting the temp directory and running again. For some reason it stopped when processing GF_000016; here are the contents of the temp directory:

(base) [psharma@brc3 example_test]$ ls ks_tmp.38f7a143a23e06
GF_000001.codeml         GF_000003.fasta.msa      GF_000006.codeml         GF_000008.fasta.msa      GF_000011.codeml         GF_000013.fasta.msa      GF_000016.codeml
GF_000001.fasta          GF_000003.fasta.msa.nuc  GF_000006.fasta          GF_000008.fasta.msa.nuc  GF_000011.fasta          GF_000013.fasta.msa.nuc  GF_000016.fasta
GF_000001.fasta.msa      GF_000004.codeml         GF_000006.fasta.msa      GF_000009.codeml         GF_000011.fasta.msa      GF_000014.codeml         GF_000016.fasta.msa
GF_000001.fasta.msa.nuc  GF_000004.fasta          GF_000006.fasta.msa.nuc  GF_000009.fasta          GF_000011.fasta.msa.nuc  GF_000014.fasta          GF_000016.fasta.msa.nuc
GF_000002.codeml         GF_000004.fasta.msa      GF_000007.codeml         GF_000009.fasta.msa      GF_000012.codeml         GF_000014.fasta.msa
GF_000002.fasta          GF_000004.fasta.msa.nuc  GF_000007.fasta          GF_000009.fasta.msa.nuc  GF_000012.fasta          GF_000014.fasta.msa.nuc
GF_000002.fasta.msa      GF_000005.codeml         GF_000007.fasta.msa      GF_000010.codeml         GF_000012.fasta.msa      GF_000015.codeml
GF_000002.fasta.msa.nuc  GF_000005.fasta          GF_000007.fasta.msa.nuc  GF_000010.fasta          GF_000012.fasta.msa.nuc  GF_000015.fasta
GF_000003.codeml         GF_000005.fasta.msa      GF_000008.codeml         GF_000010.fasta.msa      GF_000013.codeml         GF_000015.fasta.msa
GF_000003.fasta          GF_000005.fasta.msa.nuc  GF_000008.fasta          GF_000010.fasta.msa.nuc  GF_000013.fasta          GF_000015.fasta.msa.nuc

Thank you!

gainett commented 3 years ago

So it is not generating the .nw file.

(FastTree is installed. When I type "fasttree", the program runs.)
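A quick way to confirm which gene families are missing a tree is to scan the temp directory for alignments without a matching tree file. This is only a minimal sketch: it assumes the naming seen in this thread (`GF_000001.fasta.msa` for the alignment, `GF_000001.fasta.msa.nw` for the tree), and the function name `families_missing_tree` is made up for illustration, not part of wgd.

```python
from pathlib import Path


def families_missing_tree(tmp_dir, msa_suffix=".fasta.msa", tree_suffix=".fasta.msa.nw"):
    """Return family IDs that have an alignment but no non-empty tree file.

    The suffixes are assumptions based on the file names mentioned in this
    thread; adjust them if your temp directory uses different names.
    """
    tmp = Path(tmp_dir)
    missing = []
    # The glob only matches names that END in msa_suffix, so files such as
    # GF_000001.fasta.msa.nuc are skipped.
    for msa in sorted(tmp.glob(f"*{msa_suffix}")):
        family = msa.name[: -len(msa_suffix)]
        tree = tmp / f"{family}{tree_suffix}"
        # Treat an absent or zero-byte file as "no tree".
        if not tree.is_file() or tree.stat().st_size == 0:
            missing.append(family)
    return missing
```

Calling `families_missing_tree("ks_tmp.38f7a143a23e06")` on the directory listed above should return every family, since no `.nw` file was written there.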

gainett commented 3 years ago

Oh, it does work when using --wm alc!

Hm, but I did not understand the error with the default parameters. Could you please clarify this point? Thanks for the swift reply!

arzwa commented 3 years ago

OK, then the issue is definitely somewhere with FastTree or with the handling of the trees in wgd... (the alc setting uses average linkage clustering of Ks values to obtain a tree, see https://wgd.readthedocs.io/en/latest/methods.html#calculating-ks-estimates-for-duplication-events). Can you confirm that running FastTree on, for instance, the alignment in GF_000001.fasta.msa works?
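To illustrate why `--wm alc` sidesteps FastTree entirely, here is a rough pure-Python sketch of average linkage clustering (UPGMA) that turns pairwise distances into a Newick string. It is not wgd's implementation: the function name, the input layout (a dict keyed by `frozenset` pairs), and the use of raw Ks values as distances are all assumptions made for the example.

```python
from itertools import combinations


def average_linkage_newick(names, dist):
    """Build a Newick string by average linkage clustering (UPGMA).

    names: list of leaf labels
    dist:  dict mapping frozenset({a, b}) -> pairwise distance (e.g. Ks)
    Hypothetical helper for illustration only, not part of wgd.
    """
    # Map each cluster's Newick label to (member leaves, node height).
    clusters = {name: ([name], 0.0) for name in names}

    def cluster_distance(a, b):
        # Average of all leaf-to-leaf distances between the two clusters.
        pairs = [(x, y) for x in clusters[a][0] for y in clusters[b][0]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters; the new node sits at half their distance.
        a, b = min(combinations(clusters, 2), key=lambda ab: cluster_distance(*ab))
        height = cluster_distance(a, b) / 2
        (members_a, h_a), (members_b, h_b) = clusters.pop(a), clusters.pop(b)
        label = f"({a}:{height - h_a:.3f},{b}:{height - h_b:.3f})"
        clusters[label] = (members_a + members_b, height)

    return next(iter(clusters)) + ";"


# Toy example with three made-up gene IDs and Ks values.
toy_ks = {
    frozenset({"g1", "g2"}): 0.2,
    frozenset({"g1", "g3"}): 1.0,
    frozenset({"g2", "g3"}): 1.0,
}
print(average_linkage_newick(["g1", "g2", "g3"], toy_ks))
```

Because the tree is derived from the Ks estimates themselves, no alignment-based tree file is needed, which is consistent with the run succeeding under `--wm alc`.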

gainett commented 3 years ago

Hello,

Yes, the following command works:

fasttree GF_000001.fasta.msa > GF_000001.fasta.msa.tre

it outputs a normal newick tree:

(AT4G19191.1:0.74187,(AT3G20730.1:0.72377,(AT1G06140.1:0.64391,AT1G28690.1:0.71609)0.884:0.09883)0.892:0.08577,((AT4G02750.1:0.57897,AT5G40410.1:0.71834)0.927:0.11772,((AT2G33680.2:0.67094,AT3G49740.1:0.87485)0.690:0.06623,(AT5G47460.1:0.83432,(AT4G04790.3:1.55929,((AT3G22670.1:1.19789,((AT5G12100.1:0.91733,(AT4G31850.1:0.82156,(AT1G62670.1:0.69670,AT1G09900.1:0.80328)0.245:0.05790)0.966:0.15262)0.121:0.04701,AT4G21170.1:1.68418)0.879:0.10188)0.785:0.11785,AT5G14350.1:1.19640)0.770:0.07394)1.000:0.42524)0.840:0.08021)0.851:0.05705)0.921:0.08708);

Thank you for your help!

I will keep digging here
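Since the traceback shows the error is raised when ete3 reads the tree, a cheap pre-check on each tree file can help separate "file missing or empty" from "file present but malformed". The sketch below is a deliberately rough structural test (non-empty, ends with `;`, balanced parentheses); it is not a full Newick parser, it ignores quoted labels, and the function name `looks_like_newick` is hypothetical.

```python
def looks_like_newick(text):
    """Cheap structural check: non-empty, ends with ';', balanced parentheses.

    This only catches gross problems such as an empty or truncated file;
    a real parser (wgd uses ete3, per the traceback) is still the authority.
    """
    text = text.strip()
    if not text.endswith(";"):
        return False
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A closing parenthesis appeared before its opening one.
                return False
    return depth == 0
```

Reading a file such as `GF_000001.fasta.msa.tre` and passing its contents to this function should return `True` for a tree like the one pasted above, and `False` for an empty file.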
