CarineRey / pcoc

Convergent substitution detection tool based on the PCOC model
GNU General Public License v3.0

New version: pandas.io.common.EmptyDataError: No columns to parse from file #28

Open mariemorel opened 2 years ago

mariemorel commented 2 years ago

Dear Carine,

I recently updated pcoc to the new version, and since then I can no longer run some of my datasets. The sedges C3/C4 dataset from Besnard et al. still runs, but a rhodopsin dataset that ran with the previous version now fails with the following command error:

  Rate 1
  Distribution...........................: Gamma
  Number of classes......................: 4
  WARNING!!! Parameter Gamma.alpha not specified. Default used instead: 1
  Parameter found........................: Gamma.alpha=1
  - Category 0 (Pr = 0.25) rate..........: 0.136954
  - Category 1 (Pr = 0.25) rate..........: 0.476752
  - Category 2 (Pr = 0.25) rate..........: 1
  - Category 3 (Pr = 0.25) rate..........: 2.38629

  Model 1
  Transition model.......................: LGL08_CAT
  Substitution model.....................: LGL08_CAT_C2
  External model frequencies init........: None

  Model 2
  Transition model.......................: LGL08_CAT
  Substitution model.....................: LGL08_CAT_C10
  External model frequencies init........: None

  Root Frequencies Set 1

  Process 1
  Process type...........................: NonHomogeneous
   Model number1 associated to...........: 1978 node(s).
   Model number2 associated to...........: 2114 node(s).
   Tree number...........................: 1
   Rate number...........................: 1
   Root frequencies number...............: 1

  Phylolikelihood 1
   Data used ............................: 1
   Process ..............................: 1

  Result Phylolikelihood
   Result................................: phylo1

  Killed
  2022-02-03 14:19:03,893 - ERROR - No site retained for lessgappy.align_reroot.fa (too many gaps), you can use the "--max_gap_allowed" option.
  Traceback (most recent call last):
    File "/usr/local/bin/pcoc_det.py", line 703, in <module>
      mk_detect(tree_filename, ali_basename, OutDirName)
    File "/usr/local/bin/pcoc_det.py", line 512, in mk_detect
      df_res_l = pool.map(make_estim, set_e1e2)
    File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
      return self.map_async(func, iterable, chunksize).get()
    File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
      raise self._value
  pandas.io.common.EmptyDataError: No columns to parse from file
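
For context, pandas raises `EmptyDataError: No columns to parse from file` when it is asked to parse a file that contains nothing. The `Killed` line just above the traceback suggests, though the log does not confirm it, that an earlier step was terminated before it wrote its output. Below is a minimal sketch of a guard that checks an intermediate file before parsing it. The helper name `has_content` and the file names are hypothetical and are not part of pcoc.

```python
import os
import tempfile


def has_content(path):
    """Return True if path is an existing file with at least one non-blank line."""
    if not os.path.isfile(path):
        return False
    with open(path) as handle:
        return any(line.strip() for line in handle)


# Small demonstration on two throwaway files (names are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    empty_path = os.path.join(tmp, "empty.tsv")
    open(empty_path, "w").close()  # zero-byte file, as left by a killed step

    filled_path = os.path.join(tmp, "filled.tsv")
    with open(filled_path, "w") as handle:
        handle.write("site\tscore\n1\t0.9\n")

    for path in (empty_path, filled_path):
        status = "ok to parse" if has_content(path) else "empty or missing"
        print(os.path.basename(path), "->", status)
```

A check like this before the parsing call would turn the generic pandas exception into a message that names the file that was not produced.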

My command:

S=$(cat scenario_pcoc_noacr.txt)
pcoc_det.py -t tree-rerooted.nhx.txt -aa lessgappy.align_reroot.fa.txt -cpu 4 -o output_pcoc -m "$S" -f 0.8 -f_oc 0.5 --max_gap_allowed 20 -est_profiles C10

I have tried to discard as many gaps as possible, but I don't think that is the issue here. I have checked my FASTA file and the tree, and I really cannot see the problem.
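
One way to test the "too many gaps" message independently of pcoc is to count how many alignment columns stay under a gap threshold. The sketch below assumes that `--max_gap_allowed` is a per-site percentage of gapped sequences; that reading, the function names, and the toy alignment are assumptions for illustration, and the filter pcoc actually applies may differ.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict mapping name -> sequence."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def retained_sites(seqs, max_gap_percent):
    """Return 0-based column indices whose gap percentage is <= max_gap_percent."""
    rows = list(seqs.values())
    if not rows:
        return []
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("sequences differ in length; input is not an alignment")
    kept = []
    for col in range(length):
        gaps = sum(1 for row in rows if row[col] == "-")
        if 100.0 * gaps / len(rows) <= max_gap_percent:
            kept.append(col)
    return kept


# Toy alignment (made up): columns 1 and 2 are 50% gaps.
toy = """>a
MK-L
>b
M--L
>c
MKAL
>d
M-AL
"""
alignment = read_fasta(toy)
print(retained_sites(alignment, 20))  # [0, 3]
print(retained_sites(alignment, 50))  # [0, 1, 2, 3]
```

Reading the real alignment file into `read_fasta` and calling `retained_sites(..., 20)` would show whether any site survives the threshold used in the command above. If many sites survive, the gap message is probably a side effect of the earlier failure rather than its cause.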

In the meantime, I am using the old version.

Thank you very much for your answer and your work!

lessgappy.align_reroot.fa.txt scenario_pcoc_noacr.txt tree_rerooted.nhx.txt