edgraham / GhostKoalaParser

Parser for Ghost Koala

python pandas [KeyError: 3] #9

Open sihellem opened 4 years ago

sihellem commented 4 years ago

Dear Elaina,

I was following your tutorial on how to import KO annotations into anvi'o (http://merenlab.org/2018/01/17/importing-ghostkoala-annotations/).

I generated user_ko.txt from GhostKOALA, but I am now stuck at the KEGG-to-anvio script, which fails with a pandas error that I do not know how to solve. Here are my commands and the resulting traceback:

$ KEGG_DB=/home/s/Programs/GhostKoalaParser/KeggOrthology_Table1.txt
$ KEGG=/home/s/Programs/GhostKoalaParser/KEGG-to-anvio
$ KO=/work/anvio/user_ko.txt
$ python $KEGG --KeggDB $KEGG_DB -i $KO -o KeggAnnotations-AnviImportable.txt
Traceback (most recent call last):
  File "/home/s/Programs/GhostKoalaParser/KEGG-to-anvio", line 24, in <module>
    y =pd.DataFrame(x[3].str.split(' ',1).tolist(),columns=['accession','description'])
  File "/apps/free72/python/2.7.10/lib/python2.7/site-packages/pandas/core/frame.py", line 1997, in __getitem__
    return self._getitem_column(key)
  File "/apps/free72/python/2.7.10/lib/python2.7/site-packages/pandas/core/frame.py", line 2004, in _getitem_column
    return self._get_item_cache(key)
  File "/apps/free72/python/2.7.10/lib/python2.7/site-packages/pandas/core/generic.py", line 1350, in _get_item_cache
    values = self._data.get(item)
  File "/apps/free72/python/2.7.10/lib/python2.7/site-packages/pandas/core/internals.py", line 3290, in get
    loc = self.items.get_loc(item)
  File "/apps/free72/python/2.7.10/lib/python2.7/site-packages/pandas/indexes/base.py", line 1947, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
  File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
  File "pandas/hashtable.pyx", line 303, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6610)
  File "pandas/hashtable.pyx", line 309, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6554)
KeyError: 3

Thank you in advance for your reply and for your help, Simon

edgraham commented 4 years ago

Hi Simon,

I will take a look at this. I suspect, though, that I need to update the parser for Python 3, as it was never properly tested in that environment. Try running it with python2.7 and see if that works.

-Elaina


sihellem commented 4 years ago

Hey Elaina,

Thanks a lot for your answer! I just tried it again using Python 2.7.0, but unfortunately I get a similar error:

$ python $KEGG --KeggDB $KEGG_DB -i $KO -o KeggAnnotations-AnviImportable.txt
/home/s/Programs/GhostKoalaParser/KEGG-to-anvio:23: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
  x= pd.read_table(keggortho_database,header=None)
Traceback (most recent call last):
  File "/home/s/Programs/GhostKoalaParser/KEGG-to-anvio", line 24, in <module>
    y =pd.DataFrame(x[3].str.split(' ',1).tolist(),columns=['accession','description'])
  File "/home/s/pyenv/versions/2.7.0/lib/python2.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/s/pyenv/versions/2.7.0/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 987, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 993, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 3

Best regards, Simon
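
A possible reading of both tracebacks: `x[3]` fails at line 24 under either interpreter, and the `Int64HashTable` frames indicate integer column labels, as produced by `header=None`. That would mean the table read from `--KeggDB` was parsed into fewer than four tab-separated columns, which points at the file's contents or delimiter rather than at the Python version. The sketch below is a stdlib-only check under that assumption; `field_counts` and `has_column` are hypothetical helper names, not part of GhostKoalaParser.

```python
from collections import Counter


def field_counts(path, delimiter="\t", max_lines=1000):
    """Tally how many delimiter-separated fields each line has.

    Hypothetical helper for inspecting the file passed as --KeggDB.
    Only the first `max_lines` lines are examined.
    """
    counts = Counter()
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number >= max_lines:
                break
            fields = line.rstrip("\r\n").split(delimiter)
            counts[len(fields)] += 1
    return counts


def has_column(path, index=3, delimiter="\t"):
    """Return True if the first line has a field at the 0-based `index`.

    With header=None, pandas labels columns 0, 1, 2, ... so x[3]
    needs at least four fields to exist.
    """
    with open(path) as handle:
        first_line = handle.readline().rstrip("\r\n")
    return len(first_line.split(delimiter)) > index
```

Calling `has_column("/home/s/Programs/GhostKoalaParser/KeggOrthology_Table1.txt")` would show whether a fourth column exists at all, and `field_counts` on the same path shows how many fields each line carries. If the first returns `False`, the file is probably not the four-column, tab-delimited table that line 24 of the script expects.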