arzwa / wgd

Python package and CLI for whole-genome duplication related analyses. This package is deprecated in favor of https://github.com/heche-psb/wgd.
http://wgd.readthedocs.io/en/latest/
GNU General Public License v3.0
81 stars 40 forks source link

wgd ksd python error #55

Closed zxclovezby closed 3 years ago

zxclovezby commented 3 years ago

hello,I had a problem using the software,The first step is to generate mcl It's normal,but when run wgd ksd sample.blast.tsv.mcl genome.cds.fa -n 60 It reported a mistake,Here is the error log:

/home/zhangxc/miniconda3/envs/wgd/lib/python3.6/site-packages/joblib/parallel.py in __call__(self=Parallel(n_jobs=60), iterable=<generator object ks_analysis_paranome.<locals>.<genexpr>>)
    784             if pre_dispatch == "all" or n_jobs == 1:
    785                 # The iterable was consumed all at once by the above for loop.
    786                 # No need to wait for async callbacks to trigger to
    787                 # consumption.
    788                 self._iterating = False
--> 789             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=60)>
    790             # Make sure that we get a last message telling us we are done
    791             elapsed_time = time.time() - self._start_time
    792             self._print('Done %3i out of %3i | elapsed: %s finished',
    793                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
IndexError                                         Thu Jan 28 17:52:58 2021
PID: 91670   Python 3.6.10: /home/zhangxc/miniconda3/envs/wgd/bin/python3.6
...........................................................................
/home/zhangxc/miniconda3/envs/wgd/lib/python3.6/site-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function analyse_family>, ('GF_000240', {'evm.model.CTG_1185.7': 'MMKNSAVVRNVPHSWNMSAWGGNLILNNPEKHEVIFLYKCLYWLGLLK...IRPERGPFDKPEFEFTACGILNYKRSNMLAVIGALLTYTIIVFNSKDSS', 'evm.model.CTG_1357.17': 'MDFLLCFYKFIGLVDKSERNRPVNIGSLMFLIYLVLLFLDNVMMVYMN...NPVVSTTTTYTFEDTKVATTTAAPITKAPSPALTTAAAKKGKNRGNKKQ', 'evm.model.CTG_1405.53': 'MQPGSVTSYSGFITVNKEYNSNTFFWFFPAMNDNKKAPIILWLQGGPG...DKTDVAGYVRNYGNLFGILVRNAGHMVPHDQPEAALDLITRFIKDIPYA', 'evm.model.CTG_246.62': 'MHINTLQMDNTKSWICKTDFLISFLKLAGLFENSGTNQCTNIGFKIFL...ERLEMLLCEKTDMVLTGGNVIHFRRSLILTFLGTVLTYTFLLMNTNCTK', 'evm.model.CTG_247.55_evm.model.CTG_247.56_evm.model.CTG_247.57': 'MTFLKSFIFVLSCLSITFATHPDEVQDLPGLSFGLNYKHYSGYLNATS...KFNDQIAGFVKSYKNLSYLTIKGSGHMVPQDKPGPAFKMIDSFLNNKPY', 'evm.model.CTG_450.26': 'MNSTFVVLLCSLLALAIVSVSSGPVEKKEAWGYVNVRKDAYMFWWLYY...ASSSTQTGGYVKSFKQFRLFWILKAGHMIPADAPEAALQMLDMILSKKK', 'evm.model.CTG_482.10': 'MVRKAIHVGNLAFNDISETANHYLKEDPVVSTSTAQVATLANNYKVLI...DKNDVAGYVRNYGNLFGILVRNAGHLVPYDQPEAALDLITRFVKDIPYA', 'evm.model.CTG_482.8': 'MVLLFTILCVVLIKLAVAEDMSSPTGKPLFLTPYIENGDVQKGRLLSR...DKNDVAGYVRNYGNLFGILVRNAGHLVPYDQPEAALDLITRFIKDIPYA', 'evm.model.CTG_591.1': 'MHPILNQLVTLLFFRLCHQCCENLRKITNDIHNCPVLNFTHSVQIEMF...EPKEHSYYIDYVSTEMVRKAIHVGNLAFNDLSETANHYLKVDPVVSTST'}, {'evm.model.CTG_1.1': 'ATGAAGGGTGGGGGTACTGAGGGGATGCGCGACGTGCGCGTGGACTTG...ACCACTTTTATCAGCGTTATTATTTGCAAAAATCTTGTTATCCATCTAA', 'evm.model.CTG_1.100': 'ATGCCTGCTAGTCTGATAAGCGAAATGGATCGGAGTAGTCGGGTGTTG...CGTGTACTGCATGCAACCGGCCGCCCAGGCCATCCGCATCTACAACTAG', 'evm.model.CTG_1.102': 'ATGGGGAATAAGGTGGTTACGTTTACTGAAGAACAGCTAGAAGACTAT...CTACGCGTTCAGGGTGTACGATTACGATGGGACAAATTCATCGCGCTGA', 'evm.model.CTG_1.103': 'ATGGATGCAAAGGAAATTAAATCGAATCTTAAGCAAGCTAGAGAAGCT...AATGCATGTAAAGACTGATGAACGTGAGAAATGGACTACAAGTTGGTGA', 'evm.model.CTG_1.104': 'ATGTATCGCCAACATTTAAAAATGGTTAATAAAGAGTTGGAAAGAAAT...TCTTGTATTATTTAATGTAACTCAAAACCAAAGTATAGCTGTAACGTAA', 'evm.model.CTG_1.105': 'ATGATGATAGAGGTGCATGCAAGATTACCCCATGGCTCGGTTTTCCTT...AGTGGCTCATGGCTTACCTTCTAAACTGGATCATATGATAACAATTTAA', 'evm.model.CTG_1.106': 'ATGTCTTACATGTTGGAGCACCTTCACAACGGATGGCAAGTCGACCGA...GGGTTTGGTTATTTCTCCCAAAGATTATTCCACAAAATACAGATATTAA', 'evm.model.CTG_1.107': 'ATGGGAGCTCTTGGATCAAAACGAAGGGAATACAATATCAATAGTGAT...GTGTGTCAATCTTGGAGAAACCATACTTAAGAGTCAGATTGATTCTTAA', 'evm.model.CTG_1.108': 'ATGACTCGATTAAGAAACGATTTTTTCTCCCTACTCATCTTTGGATTA...ACATAGAGACCAAAAAGGGAAAGGACAAGACGTTAGGCAAAGAAAATGA', 'evm.model.CTG_1.109': 'ATGAAACTGGAACAAGAGAGAGAAAAATGTAACACATTTAAAAAGGAT...TGGGGAAAACGAACTACAAGAGGAGGATTTTGAATTGAATATTGTTTAA', ...}, '/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac', 'codeml', True, 1, 100, 'fasttree', 'mafft', '/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/wgd_ksd'), {})]
    132
    133     def __len__(self):
    134         return self._size
    135

...........................................................................
/home/zhangxc/miniconda3/envs/wgd/lib/python3.6/site-packages/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function analyse_family>
        args = ('GF_000240', {'evm.model.CTG_1185.7': 'MMKNSAVVRNVPHSWNMSAWGGNLILNNPEKHEVIFLYKCLYWLGLLK...IRPERGPFDKPEFEFTACGILNYKRSNMLAVIGALLTYTIIVFNSKDSS', 'evm.model.CTG_1357.17': 'MDFLLCFYKFIGLVDKSERNRPVNIGSLMFLIYLVLLFLDNVMMVYMN...NPVVSTTTTYTFEDTKVATTTAAPITKAPSPALTTAAAKKGKNRGNKKQ', 'evm.model.CTG_1405.53': 'MQPGSVTSYSGFITVNKEYNSNTFFWFFPAMNDNKKAPIILWLQGGPG...DKTDVAGYVRNYGNLFGILVRNAGHMVPHDQPEAALDLITRFIKDIPYA', 'evm.model.CTG_246.62': 'MHINTLQMDNTKSWICKTDFLISFLKLAGLFENSGTNQCTNIGFKIFL...ERLEMLLCEKTDMVLTGGNVIHFRRSLILTFLGTVLTYTFLLMNTNCTK', 'evm.model.CTG_247.55_evm.model.CTG_247.56_evm.model.CTG_247.57': 'MTFLKSFIFVLSCLSITFATHPDEVQDLPGLSFGLNYKHYSGYLNATS...KFNDQIAGFVKSYKNLSYLTIKGSGHMVPQDKPGPAFKMIDSFLNNKPY', 'evm.model.CTG_450.26': 'MNSTFVVLLCSLLALAIVSVSSGPVEKKEAWGYVNVRKDAYMFWWLYY...ASSSTQTGGYVKSFKQFRLFWILKAGHMIPADAPEAALQMLDMILSKKK', 'evm.model.CTG_482.10': 'MVRKAIHVGNLAFNDISETANHYLKEDPVVSTSTAQVATLANNYKVLI...DKNDVAGYVRNYGNLFGILVRNAGHLVPYDQPEAALDLITRFVKDIPYA', 'evm.model.CTG_482.8': 'MVLLFTILCVVLIKLAVAEDMSSPTGKPLFLTPYIENGDVQKGRLLSR...DKNDVAGYVRNYGNLFGILVRNAGHLVPYDQPEAALDLITRFIKDIPYA', 'evm.model.CTG_591.1': 'MHPILNQLVTLLFFRLCHQCCENLRKITNDIHNCPVLNFTHSVQIEMF...EPKEHSYYIDYVSTEMVRKAIHVGNLAFNDLSETANHYLKVDPVVSTST'}, {'evm.model.CTG_1.1': 'ATGAAGGGTGGGGGTACTGAGGGGATGCGCGACGTGCGCGTGGACTTG...ACCACTTTTATCAGCGTTATTATTTGCAAAAATCTTGTTATCCATCTAA', 'evm.model.CTG_1.100': 'ATGCCTGCTAGTCTGATAAGCGAAATGGATCGGAGTAGTCGGGTGTTG...CGTGTACTGCATGCAACCGGCCGCCCAGGCCATCCGCATCTACAACTAG', 'evm.model.CTG_1.102': 'ATGGGGAATAAGGTGGTTACGTTTACTGAAGAACAGCTAGAAGACTAT...CTACGCGTTCAGGGTGTACGATTACGATGGGACAAATTCATCGCGCTGA', 'evm.model.CTG_1.103': 'ATGGATGCAAAGGAAATTAAATCGAATCTTAAGCAAGCTAGAGAAGCT...AATGCATGTAAAGACTGATGAACGTGAGAAATGGACTACAAGTTGGTGA', 'evm.model.CTG_1.104': 'ATGTATCGCCAACATTTAAAAATGGTTAATAAAGAGTTGGAAAGAAAT...TCTTGTATTATTTAATGTAACTCAAAACCAAAGTATAGCTGTAACGTAA', 'evm.model.CTG_1.105': 'ATGATGATAGAGGTGCATGCAAGATTACCCCATGGCTCGGTTTTCCTT...AGTGGCTCATGGCTTACCTTCTAAACTGGATCATATGATAACAATTTAA', 'evm.model.CTG_1.106': 'ATGTCTTACATGTTGGAGCACCTTCACAACGGATGGCAAGTCGACCGA...GGGTTTGGTTATTTCTCCCAAAGATTATTCCACAAAATACAGATATTAA', 'evm.model.CTG_1.107': 'ATGGGAGCTCTTGGATCAAAACGAAGGGAATACAATATCAATAGTGAT...GTGTGTCAATCTTGGAGAAACCATACTTAAGAGTCAGATTGATTCTTAA', 'evm.model.CTG_1.108': 'ATGACTCGATTAAGAAACGATTTTTTCTCCCTACTCATCTTTGGATTA...ACATAGAGACCAAAAAGGGAAAGGACAAGACGTTAGGCAAAGAAAATGA', 'evm.model.CTG_1.109': 'ATGAAACTGGAACAAGAGAGAGAAAAATGTAACACATTTAAAAAGGAT...TGGGGAAAACGAACTACAAGAGGAGGATTTTGAATTGAATATTGTTTAA', ...}, '/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac', 'codeml', True, 1, 100, 'fasttree', 'mafft', '/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/wgd_ksd')
        kwargs = {}
    132
    133     def __len__(self):
    134         return self._size
    135

...........................................................................
/home/zhangxc/miniconda3/envs/wgd/lib/python3.6/site-packages/wgd/ks_distribution.py in analyse_family(family_id='GF_000240', family={'evm.model.CTG_1185.7': 'MMKNSAVVRNVPHSWNMSAWGGNLILNNPEKHEVIFLYKCLYWLGLLK...IRPERGPFDKPEFEFTACGILNYKRSNMLAVIGALLTYTIIVFNSKDSS', 'evm.model.CTG_1357.17': 'MDFLLCFYKFIGLVDKSERNRPVNIGSLMFLIYLVLLFLDNVMMVYMN...NPVVSTTTTYTFEDTKVATTTAAPITKAPSPALTTAAAKKGKNRGNKKQ', 'evm.model.CTG_1405.53': 'MQPGSVTSYSGFITVNKEYNSNTFFWFFPAMNDNKKAPIILWLQGGPG...DKTDVAGYVRNYGNLFGILVRNAGHMVPHDQPEAALDLITRFIKDIPYA', 'evm.model.CTG_246.62': 'MHINTLQMDNTKSWICKTDFLISFLKLAGLFENSGTNQCTNIGFKIFL...ERLEMLLCEKTDMVLTGGNVIHFRRSLILTFLGTVLTYTFLLMNTNCTK', 'evm.model.CTG_247.55_evm.model.CTG_247.56_evm.model.CTG_247.57': 'MTFLKSFIFVLSCLSITFATHPDEVQDLPGLSFGLNYKHYSGYLNATS...KFNDQIAGFVKSYKNLSYLTIKGSGHMVPQDKPGPAFKMIDSFLNNKPY', 'evm.model.CTG_450.26': 'MNSTFVVLLCSLLALAIVSVSSGPVEKKEAWGYVNVRKDAYMFWWLYY...ASSSTQTGGYVKSFKQFRLFWILKAGHMIPADAPEAALQMLDMILSKKK', 'evm.model.CTG_482.10': 'MVRKAIHVGNLAFNDISETANHYLKEDPVVSTSTAQVATLANNYKVLI...DKNDVAGYVRNYGNLFGILVRNAGHLVPYDQPEAALDLITRFVKDIPYA', 'evm.model.CTG_482.8': 'MVLLFTILCVVLIKLAVAEDMSSPTGKPLFLTPYIENGDVQKGRLLSR...DKNDVAGYVRNYGNLFGILVRNAGHLVPYDQPEAALDLITRFIKDIPYA', 'evm.model.CTG_591.1': 'MHPILNQLVTLLFFRLCHQCCENLRKITNDIHNCPVLNFTHSVQIEMF...EPKEHSYYIDYVSTEMVRKAIHVGNLAFNDLSETANHYLKVDPVVSTST'}, nucleotide={'evm.model.CTG_1.1': 'ATGAAGGGTGGGGGTACTGAGGGGATGCGCGACGTGCGCGTGGACTTG...ACCACTTTTATCAGCGTTATTATTTGCAAAAATCTTGTTATCCATCTAA', 'evm.model.CTG_1.100': 'ATGCCTGCTAGTCTGATAAGCGAAATGGATCGGAGTAGTCGGGTGTTG...CGTGTACTGCATGCAACCGGCCGCCCAGGCCATCCGCATCTACAACTAG', 'evm.model.CTG_1.102': 'ATGGGGAATAAGGTGGTTACGTTTACTGAAGAACAGCTAGAAGACTAT...CTACGCGTTCAGGGTGTACGATTACGATGGGACAAATTCATCGCGCTGA', 'evm.model.CTG_1.103': 'ATGGATGCAAAGGAAATTAAATCGAATCTTAAGCAAGCTAGAGAAGCT...AATGCATGTAAAGACTGATGAACGTGAGAAATGGACTACAAGTTGGTGA', 'evm.model.CTG_1.104': 'ATGTATCGCCAACATTTAAAAATGGTTAATAAAGAGTTGGAAAGAAAT...TCTTGTATTATTTAATGTAACTCAAAACCAAAGTATAGCTGTAACGTAA', 'evm.model.CTG_1.105': 'ATGATGATAGAGGTGCATGCAAGATTACCCCATGGCTCGGTTTTCCTT...AGTGGCTCATGGCTTACCTTCTAAACTGGATCATATGATAACAATTTAA', 'evm.model.CTG_1.106': 'ATGTCTTACATGTTGGAGCACCTTCACAACGGATGGCAAGTCGACCGA...GGGTTTGGTTATTTCTCCCAAAGATTATTCCACAAAATACAGATATTAA', 'evm.model.CTG_1.107': 'ATGGGAGCTCTTGGATCAAAACGAAGGGAATACAATATCAATAGTGAT...GTGTGTCAATCTTGGAGAAACCATACTTAAGAGTCAGATTGATTCTTAA', 'evm.model.CTG_1.108': 'ATGACTCGATTAAGAAACGATTTTTTCTCCCTACTCATCTTTGGATTA...ACATAGAGACCAAAAAGGGAAAGGACAAGACGTTAGGCAAAGAAAATGA', 'evm.model.CTG_1.109': 'ATGAAACTGGAACAAGAGAGAGAAAAATGTAACACATTTAAAAAGGAT...TGGGGAAAACGAACTACAAGAGGAGGATTTTGAATTGAATATTGTTTAA', ...}, tmp='/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac', codeml=<wgd.codeml.Codeml object>, preserve=True, times=1, min_length=100, method='fasttree', aligner='mafft', output_dir='/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/wgd_ksd', **kwargs={})
    284
    285     # Calculate Ks values (codeml) ---------------------------------------------
    286     codeml = Codeml(codeml=codeml, tmp=tmp, id=family_id, **kwargs)
    287     logging.debug('Performing codeml analysis on {}'.format(family_id))
    288     results_dict, codeml_out = codeml.run_codeml(
--> 289             os.path.basename(msa_path), preserve=preserve, times=times)
        msa_path = '/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac/GF_000240.fasta.msa.nuc'
        preserve = True
        times = 1
    290     if not results_dict:
    291         logging.warning('No codeml results for {}!'.format(family_id))
    292         return
    293

...........................................................................
/home/zhangxc/miniconda3/envs/wgd/lib/python3.6/site-packages/wgd/codeml.py in run_codeml(self=<wgd.codeml.Codeml object>, msa='GF_000240.fasta.msa.nuc', raw=False, preserve=True, times=1)
    311             if not os.path.isfile(self.out_file):
    312                 logging.warning(
    313                         'Codeml output file {} not found'.format(self.out_file))
    314                 return None
    315
--> 316             d, likelihood = _parse_codeml_out(self.out_file)
        d = undefined
        likelihood = undefined
        self.out_file = '/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac/GF_000240.codeml'
    317             output.append(d)
    318             if not best or likelihood > best:
    319                 best = likelihood
    320                 best_index = i

...........................................................................
/home/zhangxc/miniconda3/envs/wgd/lib/python3.6/site-packages/wgd/codeml.py in _parse_codeml_out(codeml_out='/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac/GF_000240.codeml')
     63     likelihood = re.compile('lnL\s*=(\D*\d+.\d+)')
     64
     65     # read codeml output file
     66     with open(codeml_out, 'r') as f:
     67         fcont = f.read()
---> 68     n = int(fcont.split("\n")[3].split()[2].strip())
        n = undefined
        fcont.split.split.strip = undefined
     69     codeml_results = fcont.split('pairwise comparison')[-1].split("\n\n\n")[1:]
     70     if len(codeml_results) != n*(n-1)/2:
     71         logging.error("Not all gene pairs present in {}".format(codeml_out))
     72         return None, None

IndexError: list index out of range

look forward to your reply!!

arzwa commented 3 years ago

What does the file /data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac/GF_000240.codeml look like? What version of paml are you using?

zxclovezby commented 3 years ago

hello,Thank you very much for your reply,/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac/GF_000240.codeml The file is empty,paml version is paml=4.9,this is my conda yaml: `$ cat wgd.yaml name: wgd channels:

arzwa commented 3 years ago

So it appears something is causing the codeml output to be an empty file. Could you have a look at the other files for GF_000240 ( /data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac/GF_000240*) ? Perhaps there is something unexpected there.

zxclovezby commented 3 years ago

图片 Some files are available, and some are empty. Could you tell me the reason for this? Thank you very much

arzwa commented 3 years ago

I can't tell from this. Please try to provide some more information, do the other files have output? What does the alignment look like for gene family GF_000240 etc.?

zxclovezby commented 3 years ago

OK, other alignment results exist.GF_000240.fasta,GF_000240.fasta.msa,GF_000240.fasta.msa.nuc,GF_000240.ctrl Only the content of codeml file is empty. It should be my sequence itself. Thank you very much for your reply. wish you a happy life.

arzwa commented 3 years ago

And what is the content of these other .codeml files? Can you try running codeml with the GF_000240.ctrl file?

zxclovezby commented 3 years ago
$ codeml GF_000240.ctrl
CODONML in paml version 4.9, March 2015

Error in sequence data file: L at 1 seq 1.
Make sure to separate the sequence from its name by 2 or more spaces.

This is the result of the operation

arzwa commented 3 years ago

Could you please show the contents or attach all GF_000240.* files? I can't tell much from this information.

zxclovezby commented 3 years ago

ok, These are the generated files: GF_000240.ctrl.gz GF_000240.fasta.gz GF_000240.fasta.msa.gz GF_000240.fasta.msa.nuc.gz

arzwa commented 3 years ago

For me it works both with the latest paml version 4.9j (2020) as well as paml 4.9f (2017). Your paml version seems outdated, please install 4.9j from here. Something like this should work:

wget http://abacus.gene.ucl.ac.uk/software/paml4.9j.tgz
tar -xzf paml4.9j.tgz
pushd paml4.9j/src && make -f Makefile && popd 
export PATH=$PATH:$PWD/paml4.9j/src/
zxclovezby commented 3 years ago

Thank you very much for your patience. The problem has been solved. Thank you again.It turned out to be a version problem.Thank you again!

arzwa commented 3 years ago

No problem, I should have updated the docs. I now added this explicitly in the README.