SunPengChuan / wgdi

WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes
https://wgdi.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License

KeyError: "None of [Int64Index ... in wgdi -c #43

Open amvarani opened 9 months ago

amvarani commented 9 months ago

Hi there, I'm facing the error below when running the "wgdi -c" command. I'm using Ptrichocarpa from Phytozome.

blockinfo = Ptrichocarpa_Ptrichocarpa.blockinfo.csv
lens1 = Ptrichocarpa.lens
lens2 = Ptrichocarpa.lens
tandem = false
tandem_length = 200
pvalue = 0.2
block_length = 5
tandem_ratio = 0.5
multiple = 1
homo = -1,1
savefile = Ptrichocarpa_Ptrichocarpa.blockinfo.new.csv
Traceback (most recent call last):
  File "/home/amvarani/.local/bin/wgdi", line 8, in <module>
    sys.exit(main())
  File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/run.py", line 163, in main
    module_to_run(arg, value)
  File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/run.py", line 122, in module_to_run
    run_subprogram(program, conf, name)
  File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/run.py", line 87, in run_subprogram
    r.run()
  File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/block_correspondence.py", line 47, in run
    arr = self.collinearity_region(cor, bkinfo, lens1)
  File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/block_correspondence.py", line 70, in collinearity_region
    df1[[int(k) for k in b1]] += 1
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/series.py", line 1007, in __getitem__
    return self._get_with(key)
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/series.py", line 1042, in _get_with
    return self.loc[key]
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1073, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1301, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1239, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis)
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1432, in _get_listlike_indexer
    keyarr, indexer = ax._get_indexer_strict(key, axis_name)
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6113, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6173, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([8462, 8465, 8469, 8477, 8481, 8484, 8502, 8503, 8508, 8517, 8520,\n 8534, 8537, 8541, 8545, 8551, 8554, 8558, 8572, 8585, 8591],\n dtype='int64')] are in the [index]"

SunPengChuan commented 9 months ago

The problem is most likely in how your gff and lens files were processed: the position of some gene probably exceeds the range of its chromosome.
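One way to test this is to compare every gene in the gff against the lens file. The sketch below is only illustrative. It assumes the tab-separated layouts described in the WGDI documentation (gff: chromosome, gene id, start, end, strand, order, original id; lens: chromosome, length in bp, gene count). The function names `read_lens` and `find_out_of_range` are hypothetical helpers, not part of WGDI, and the sample data is made up.

```python
import csv
import io


def read_lens(handle):
    """Return {chromosome: (length_bp, gene_count)} from a lens-style table."""
    lens = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 3:
            lens[row[0]] = (int(row[1]), int(row[2]))
    return lens


def find_out_of_range(gff_handle, lens):
    """List (gene_id, reason) for genes that fall outside their chromosome."""
    problems = []
    for row in csv.reader(gff_handle, delimiter="\t"):
        if len(row) < 6:
            continue  # skip lines that do not match the assumed layout
        chrom, gene, end, order = row[0], row[1], int(row[3]), int(row[5])
        if chrom not in lens:
            continue  # chromosomes absent from the lens file are ignored here
        length_bp, gene_count = lens[chrom]
        if end > length_bp:
            problems.append((gene, "end beyond chromosome length"))
        if order < 1 or order > gene_count:
            problems.append((gene, "order outside 1..gene count"))
    return problems


# Made-up example: g3 has order 3 on a chromosome declared to hold 2 genes.
lens_text = "Chr01\t1000\t2\n"
gff_text = (
    "Chr01\tg1\t10\t90\t+\t1\tg1\n"
    "Chr01\tg2\t200\t300\t-\t2\tg2\n"
    "Chr01\tg3\t400\t500\t+\t3\tg3\n"
)
lens = read_lens(io.StringIO(lens_text))
problems = find_out_of_range(io.StringIO(gff_text), lens)
print(problems)
```

Any gene reported here would be a candidate for the kind of out-of-range lookup seen in the traceback.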

SunPengChuan commented 9 months ago

You can upload your dataset and I can check it for you.

amvarani commented 9 months ago

Dear @SunPengChuan, thanks for the quick answer. Here are the files that I've used:

Ptrichocarpa_input_file.zip blastp.zip

And the Phytozome files:

Ptrichocarpa_533_v4.1.cds_primaryTranscriptOnly.fa.gz Ptrichocarpa_533_v4.1.gene.gff3.gz Ptrichocarpa_533_v4.1.protein_primaryTranscriptOnly.fa.gz

SunPengChuan commented 9 months ago

In a GFF file, the fourth column, which represents the start of each chromosome, always begins with the number 1. Your GFF file contains numerous alternatively spliced gene models, which need to be removed.

amvarani commented 9 months ago

Hi @SunPengChuan Thanks! I've identified the issue in my GFF file. For those interested in converting Phytozome GFF3 files, here is a simple AWK script that can accomplish this:

zcat $genome.gff3.gz | awk '{if ($3 == "gene" ) print $1,$4,$5,$7,$9}' | cut -f 1 -d";" | sort -V | sed 's#\.#_#g' | sed 's#_v4_1##g' | awk '{split($5, a, "[=.]"); if (last != $1) {counter = 1; last = $1} else {counter++} print $1 " " a[2] " " $2 " " $3 " " $4 " " counter " ID" a[2]}'

kashiff007 commented 9 months ago

Hi @SunPengChuan

I am getting a similar error while running BlockInfo. Error:

(base) [nawazk@login509-02-l W6-48549-006]$ wgdi -bi bi_total.conf
blast  =  SG_A_vs_SB_B.blast
gff1  =  SG_A.gff
gff2  =  SG_B.gff
lens1  =  SG_A.lens
lens2  =  SG_B.lens
collinearity  =  SG_A_vs_SB_B.list
score  =  100
evalue  =  1e-5
repeat_number  =  20
position  =  order
ks  =  ks file
ks_col  =  ks_NG86
savefile  =  block information (*.csv)
/home/nawazk/.conda/envs/mamba/lib/python3.10/site-packages/wgdi/block_info.py:74: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
  index = [group.sort_values(by=11, ascending=False)[:repeat_number].index.tolist()
Traceback (most recent call last):
  File "/home/nawazk/.conda/envs/mamba/bin/wgdi", line 10, in <module>
    sys.exit(main())
  File "/home/nawazk/.conda/envs/mamba/lib/python3.10/site-packages/wgdi/run.py", line 163, in main
    module_to_run(arg, value)
  File "/home/nawazk/.conda/envs/mamba/lib/python3.10/site-packages/wgdi/run.py", line 122, in module_to_run
    run_subprogram(program, conf, name)
  File "/home/nawazk/.conda/envs/mamba/lib/python3.10/site-packages/wgdi/run.py", line 87, in run_subprogram
    r.run()
  File "/home/nawazk/.conda/envs/mamba/lib/python3.10/site-packages/wgdi/block_info.py", line 121, in run
    collinearity = self.auto_file(gff1, gff2)
  File "/home/nawazk/.conda/envs/mamba/lib/python3.10/site-packages/wgdi/block_info.py", line 164, in auto_file
    return collinearity
UnboundLocalError: local variable 'collinearity' referenced before assignment

Can you check the files attached? SG_A_vs_SB_B.blast.txt SG_A_vs_SB_B.collinearit_pair.txt SG_A.gff.txt SG_A.lens.txt SG_B.gff.txt SG_B.lens.txt

SunPengChuan commented 9 months ago

collinearity = SG_A_vs_SB_B.list: this file should be the result of the -c subprogram of WGDI, or the output of MCScanX or JCVI. It is not a list of gene pairs.

savefile = block information (*.csv): the savefile value has not been modified from the template. It should be the name of the output file.
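A small pre-flight check can catch values that were left at the template text before running the subprogram. The sketch below is a rough heuristic, not something WGDI provides: the function name `find_placeholder_values` is hypothetical, and it simply assumes that a real file name contains neither spaces nor an asterisk.

```python
def find_placeholder_values(conf_text, keys=("collinearity", "ks", "savefile")):
    """Return {key: value} for entries that still look like template text."""
    suspicious = {}
    for line in conf_text.splitlines():
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Assumption: real file names here have no spaces and no "*".
        if key in keys and (" " in value or "*" in value):
            suspicious[key] = value
    return suspicious


# Entries copied from the configuration echoed in the log above.
conf_text = """
collinearity  =  SG_A_vs_SB_B.list
ks  =  ks file
ks_col  =  ks_NG86
savefile  =  block information (*.csv)
"""
suspicious = find_placeholder_values(conf_text)
print(suspicious)
```

On the configuration shown in this report it would flag `ks` and `savefile`, both of which still hold the template wording.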