arzwa / wgd

Python package and CLI for whole-genome duplication related analyses. This package is deprecated in favor of https://github.com/heche-psb/wgd.
http://wgd.readthedocs.io/en/latest/
GNU General Public License v3.0

encoding error #43

Closed shiyi-pan closed 3 years ago

shiyi-pan commented 3 years ago

Hi, I used wgd with the command:

 ksd --n_threads 8   $DIR/final_cds.out/format.final.cds.fa.blast.tsv.mcl    $DIR/format.final.cds.fa

and got an error like this:
2020-10-08 10:07:41: INFO   Performing analysis on gene family GF_000280
--- Logging error ---
Traceback (most recent call last):
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/logging/__init__.py", line 994, in emit
    stream.write(msg)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/encodings/iso8859_15.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03c9' in position 32: character maps to <undefined>
Call stack:
  File "/ds3512/home/panyp/ruanjian/python3/bin/wgd", line 11, in <module>
    load_entry_point('wgd==1.1', 'console_scripts', 'wgd')()
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.1-py3.6.egg/wgd_cli.py", line 632, in ksd
    max_pairwise=max_pairwise
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.1-py3.6.egg/wgd_cli.py", line 773, in ksd_
    max_pairwise=max_pairwise,
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.1-py3.6.egg/wgd/ks_distribution.py", line 645, in ks_analysis_paranome
    ) for family in sorted_families)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/parallel.py", line 749, in __call__
    n_jobs = self._initialize_backend()
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/parallel.py", line 547, in _initialize_backend
    **self._backend_args)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 317, in configure
    self._pool = MemmapingPool(n_jobs, **backend_args)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/pool.py", line 600, in __init__
    super(MemmapingPool, self).__init__(**poolargs)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/pool.py", line 420, in __init__
    super(PicklingPool, self).__init__(**poolargs)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/pool.py", line 174, in __init__
    self._repopulate_pool()
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/pool.py", line 239, in _repopulate_pool
    w.start()
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/popen_fork.py", line 73, in _launch
    code = process_obj._bootstrap()
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 350, in __call__
    return self.func(*args, **kwargs)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.1-py3.6.egg/wgd/ks_distribution.py", line 289, in analyse_family
    os.path.basename(msa_path), preserve=preserve, times=times)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.1-py3.6.egg/wgd/codeml.py", line 312, in run_codeml
    d, likelihood = _parse_codeml_out(self.out_file)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.1-py3.6.egg/wgd/codeml.py", line 127, in _parse_codeml_out
    logging.warning("No \u03c9 value for {0} - {1}!".format(gene_1, gene_2))
Message: 'No \u03c9 value for Maker00014056 - Maker00020689!'

My input files are a FASTA file and a GFF file like these:

>Maker00026579
ATGGCCACAGGAAAGCGGAAACTCACATTCATAGCCAACGATTCTCAAAG
AAAAACAGTATGCAAGAAAAGGAAGCAGTCACTGCTGAAGAAAACGGAGG
AACTCAGCACCCTTTGTGGCGTTGAAGCATGTGCTATAGTTTATGGCCCC
AATGATCATCGGCCAGAGATCTGGCCATCTGAATCGGGTGTCAAAAATGT
ACTGGGAAAGTTCATGAACAAGCCACAATGGGAGCAAAGCAAAAAGATGA
TGAACCAAGAGAGTTTCATTGCACAAAGTATCATGAAGAGTAAAGACAAG
TTACAGAAAGTTGTGAAGGAAAACAAGGAGATTGAAATGTCCTTGTTCAT
GGCTCAGTGCTTTCAGACAGGTATGTTTCAGCCTGATATCAATATGACCG
CAGCTGATATGAATGTTCTTTCATCGGAGATTGAACAGAACCTGAAGGAC
ATTGATAAAAGGATGGAAATGCTGAAAGCCAACCAGGTGACACCAAACCA
ACCCGATATTGAATCGTCAACATTCCAACCCCAGATAATGCAAACATCAG
CATTCCAACCCCAGATTCAAATACCAGCATTCGAAACCCAGATCCAAACA
CAAACATACCAATCCCAGATGGAAACACCAACATTTCAACCCCAGATGCA
ATCACCAGCATTATTCCAACCCCAGATACAAACTGCATCATACCAACCCC
ATATGCAAACACAGTCATACCATCCCCATATGCAAGCACCATCATTCCCA

and

 HiC_scaffold_3  maker   mRNA    52904539        52906776        .       -       .       ID=Maker00000001;_AED=0.83;_eAED=0.85;_QI=0|0|0|0.5|0|0|2|0|95;
HiC_scaffold_3  maker   CDS     52906635        52906776        .       -       0       Parent=Maker00000001;
HiC_scaffold_3  maker   CDS     52904539        52904684        .       -       2       Parent=Maker00000001;
HiC_scaffold_3  maker   mRNA    52889299        52891610        .       -       .       ID=Maker00000002;_AED=0.81;_eAED=1.00;_QI=0|0|0|0.5|0|0.5|2|0|82;
HiC_scaffold_3  maker   CDS     52891518        52891610        .       -       0       Parent=Maker00000002;
HiC_scaffold_3  maker   CDS     52889299        52889454        .       -       0       Parent=Maker00000002;
HiC_scaffold_3  maker   mRNA    52850941        52853577        .       -       .       ID=Maker00000003;_AED=0.56;_eAED=0.66;_QI=0|0|0|1|0|0|2|0|128;
HiC_scaffold_3  maker   CDS     52853434        52853577        .       -       0       Parent=Maker00000003;
HiC_scaffold_3  maker   CDS     52850941        52851183        .       -       0       Parent=Maker00000003;
HiC_scaffold_3  maker   mRNA    52876283        52881803        .       +       .       ID=Maker00000004;_AED=0.87;_eAED=1.00;_QI=0|0|0|0.5|0|0|2|453|51;

Could you help me fix it? Thank you very much.

arzwa commented 3 years ago

Hi, this seems to be a character encoding error: wgd tries to print an ω to the stdout or stderr stream, but the encoding used on your system cannot represent it. I should probably refactor the code to use only ASCII characters. In the meantime, you could try the solutions listed here.

shiyi-pan commented 3 years ago

Thank you for your reply. I have set PYTHONIOENCODING=windows-1252 in my shell, but it doesn't work.

arzwa commented 3 years ago

Hi, I think you may have misinterpreted that Stack Overflow answer. You might have to set the following environment variables to allow Unicode characters:

set PYTHONIOENCODING=utf-8
set PYTHONLEGACYWINDOWSSTDIO=utf-8

But I'm not sure.
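
For illustration, here is a minimal Python sketch of what the traceback above reports and of one possible workaround. It is not code from wgd: the helper name `make_demo_logger` is hypothetical, and the in-memory stream only stands in for a terminal whose encoding is ISO-8859-15 (the codec named in the traceback). It shows that ω cannot be encoded by that codec, and that a stream opened with `errors="backslashreplace"` writes an escape sequence instead of raising. Note also that `set` is Windows `cmd` syntax; in a Linux shell such as bash the equivalent would be `export PYTHONIOENCODING=utf-8`, and `PYTHONLEGACYWINDOWSSTDIO` only has an effect on Windows.

```python
import io
import logging

# The failure itself: ISO-8859-15 has no code point for the Greek letter omega.
try:
    "\u03c9".encode("iso8859_15")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True


def make_demo_logger(encoding="iso8859_15"):
    """Hypothetical helper: a logger writing to an in-memory 'terminal'.

    The stream uses the narrow encoding, but unencodable characters are
    replaced by backslash escapes instead of raising UnicodeEncodeError.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(
        raw,
        encoding=encoding,
        errors="backslashreplace",
        newline="\n",
        write_through=True,
    )
    logger = logging.getLogger("omega_demo")
    logger.handlers = [logging.StreamHandler(stream)]
    logger.propagate = False
    return logger, raw


logger, raw = make_demo_logger()
logger.warning("No \u03c9 value for {0} - {1}!".format("Maker00014056", "Maker00020689"))
output = raw.getvalue().decode("iso8859_15")
print(output, end="")
```

Setting `PYTHONIOENCODING=utf-8` takes the other route: it changes the encoding of the standard streams so that ω can be written directly, which only displays correctly if the terminal itself interprets the bytes as UTF-8.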

arzwa commented 3 years ago

Hi, I changed the code (see here). If you pull the latest wgd version from GitHub (master branch), remove your old installation, and reinstall the new version, I believe you should no longer have this issue.

shiyi-pan commented 3 years ago

Thank you for your help. It works. I get another error when running the syn module:

2020-10-11 21:07:05: INFO i-adhore stderr: Error opening the settings file: -version
2020-10-11 21:07:05: WARNING Output directory already exists, will possibly overwrite
2020-10-11 21:07:05: INFO Parsing GFF file
2020-10-11 21:07:06: INFO Writing gene lists
2020-10-11 21:07:06: INFO Writing families file
2020-10-11 21:07:06: INFO Writing configuration file
2020-10-11 21:07:06: INFO Running I-ADHoRe 3.0
2020-10-11 21:07:06: WARNING ERROR: Genelist files not found in settings file

2020-10-11 21:07:06: INFO This is i-ADHoRe v3.0. Copyright (c) 2002-2010, Flanders Interuniversity Institute for Biotechnology, VIB. Algorithm designed by Klaas Vandepoele, Cedric Simillion, Jan Fostier, Dieter De Witte, Koen Janssens, Sebastian Proost, Yvan Saeys and Yves Van de Peer.

Process 1/1 is alive on compute-0-0.local.

Traceback (most recent call last):
  File "/ds3512/home/panyp/ruanjian/python3/bin/wgd", line 11, in <module>
    load_entry_point('wgd==1.2', 'console_scripts', 'wgd')()
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.2-py3.6.egg/wgd_cli.py", line 857, in syn
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.2-py3.6.egg/wgd_cli.py", line 944, in syn
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/pandas-0.24.1-py3.6-linux-x86_64.egg/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/pandas-0.24.1-py3.6-linux-x86_64.egg/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/pandas-0.24.1-py3.6-linux-x86_64.egg/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/pandas-0.24.1-py3.6-linux-x86_64.egg/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/pandas-0.24.1-py3.6-linux-x86_64.egg/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'./wgd_syn/i-adhore-out/multiplicons.txt' does not exist: b'./wgd_syn/i-adhore-out/multiplicons.txt'

pengshf commented 2 years ago

@shiyi-pan, can you tell me how you solved this problem? I am running into it too. Thank you very much.

arzwa commented 2 years ago

It seems that the output expected from I-ADHoRe is not found, because of an error in the I-ADHoRe configuration. Make sure that you set the --feature (e.g. CDS) and --attribute (e.g. ID) options in the wgd command such that the relevant features from the GFF file match the gene IDs in the gene families file.
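
One way to check this before running the syn module is to test, for a given feature/attribute pair, how many IDs from the GFF file also occur in the gene families file. The sketch below is only an illustration and not part of wgd: the function names `gff_ids` and `family_ids` are hypothetical, it assumes a tab-separated GFF with `key=value;` attributes in the ninth column, and it assumes a families file in MCL style with one family of whitespace-separated gene IDs per line. The sample lines are abbreviated from the GFF excerpt earlier in this thread.

```python
def gff_ids(gff_lines, feature, attribute):
    """Collect the values of `attribute` on all GFF rows of type `feature`."""
    ids = set()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        for field in cols[8].split(";"):
            key, _, value = field.strip().partition("=")
            if key == attribute and value:
                ids.add(value)
    return ids


def family_ids(family_lines):
    """Collect all gene IDs from an MCL-style families file (assumed format)."""
    return {gene for line in family_lines for gene in line.split()}


gff = [
    "HiC_scaffold_3\tmaker\tmRNA\t52904539\t52906776\t.\t-\t.\tID=Maker00000001;_AED=0.83;",
    "HiC_scaffold_3\tmaker\tCDS\t52906635\t52906776\t.\t-\t0\tParent=Maker00000001;",
    "HiC_scaffold_3\tmaker\tCDS\t52904539\t52904684\t.\t-\t2\tParent=Maker00000001;",
]
families = ["Maker00000001\tMaker00000002"]

genes = family_ids(families)
for feature, attribute in [("CDS", "ID"), ("mRNA", "ID"), ("CDS", "Parent")]:
    shared = gff_ids(gff, feature, attribute) & genes
    print(feature, attribute, len(shared))
```

A pair that yields no shared IDs would leave the gene lists empty. In the excerpt shown in this thread, the CDS rows carry only a `Parent` attribute, so a combination such as mRNA with ID may be the one that matches the family IDs there.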