churchmanlab / genewalk

GeneWalk identifies relevant gene functions for a biological context using network representation learning
https://churchman.med.harvard.edu/genewalk
BSD 2-Clause "Simplified" License

KeyError: 'ensembl_id' #8

Closed chrarnold closed 4 years ago

chrarnold commented 4 years ago

Hi, I tried running the newest version with Ensembl IDs, and after about 1 hour of run time on 20 cores I got the following error, which looks like a bug to me:

```
...
INFO: [2019-09-23 17:22:32] gensim.models.base_any2vec - worker thread finished; awaiting finish of 2 more threads
INFO: [2019-09-23 17:22:32] gensim.models.base_any2vec - worker thread finished; awaiting finish of 1 more threads
INFO: [2019-09-23 17:22:32] gensim.models.base_any2vec - worker thread finished; awaiting finish of 0 more threads
INFO: [2019-09-23 17:22:32] gensim.models.base_any2vec - EPOCH - 5 : training on 164576000 raw words (164576000 effective words) took 119.9s, 1372717 effective words/s
INFO: [2019-09-23 17:22:32] gensim.models.base_any2vec - training on a 822880000 raw words (822880000 effective words) took 566.7s, 1452084 effective words/s
INFO: [2019-09-23 17:22:32] genewalk.deepwalk - Generating node vectors done in 610.30s
INFO: [2019-09-23 17:22:33] genewalk.cli - Saving into /home/carnold/genewalk/cll_test/deepwalk_node_vectors_rand_3.pkl...
INFO: [2019-09-23 17:22:41] genewalk.cli - Saving into /home/carnold/genewalk/cll_test/genewalk_rand_simdists.pkl...
INFO: [2019-09-23 17:22:41] genewalk.cli - Loading /home/carnold/genewalk/cll_test/multi_graph.pkl...
INFO: [2019-09-23 17:22:41] genewalk.cli - Loading /home/carnold/genewalk/cll_test/genes.pkl...
INFO: [2019-09-23 17:22:41] genewalk.cli - Loading /home/carnold/genewalk/cll_test/deepwalk_node_vectors_1.pkl...
INFO: [2019-09-23 17:22:42] genewalk.cli - Loading /home/carnold/genewalk/cll_test/deepwalk_node_vectors_2.pkl...
INFO: [2019-09-23 17:22:42] genewalk.cli - Loading /home/carnold/genewalk/cll_test/deepwalk_node_vectors_3.pkl...
INFO: [2019-09-23 17:22:42] genewalk.cli - Loading /home/carnold/genewalk/cll_test/genewalk_rand_simdists.pkl...
Traceback (most recent call last):
  File "bla/TOOLS/miniconda/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ensembl_id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bla/TOOLS/miniconda/bin/genewalk", line 10, in <module>
    sys.exit(main())
  File "bla/TOOLS/miniconda/lib/python3.7/site-packages/genewalk/cli.py", line 203, in main
    base_id_type=args.id_type)
  File "bla/TOOLS/miniconda/lib/python3.7/site-packages/genewalk/perform_statistics.py", line 178, in generate_output
    df[base_id_type] = df[base_id_type].astype('category')
  File "bla/TOOLS/miniconda/lib/python3.7/site-packages/pandas/core/frame.py", line 2980, in __getitem__
    indexer = self.columns.get_loc(key)
  File "bla/TOOLS/miniconda/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ensembl_id'
```
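The second traceback points at `perform_statistics.py`, line 178: `df[base_id_type]` is looked up on a DataFrame whose columns do not include `ensembl_id`, so pandas raises `KeyError` from `DataFrame.__getitem__`. Below is a minimal sketch of that failure mode plus a defensive check. The toy table, its `hgnc_id` and `go_id` columns, and the `categorize_id_column` helper are all hypothetical; they are not GeneWalk's code and not the actual fix.

```python
import pandas as pd


def categorize_id_column(df: pd.DataFrame, base_id_type: str) -> pd.DataFrame:
    """Return a copy of `df` with the gene-ID column cast to 'category'.

    Hypothetical helper: `base_id_type` mirrors the argument in the traceback,
    but this is not GeneWalk's code. It fails with a descriptive ValueError
    instead of a bare KeyError when the requested ID column is missing.
    """
    if base_id_type not in df.columns:
        raise ValueError(
            f"No {base_id_type!r} column in results; available columns: "
            f"{list(df.columns)}"
        )
    out = df.copy()
    out[base_id_type] = out[base_id_type].astype('category')
    return out


# Assumed toy results table: it has an HGNC ID column but no 'ensembl_id' column.
df = pd.DataFrame({
    'hgnc_id': ['HGNC:5', 'HGNC:5'],
    'go_id': ['GO:0005576', 'GO:0005615'],
})

# Same access pattern as the failing line: a missing column raises KeyError.
try:
    df['ensembl_id'].astype('category')
except KeyError as err:
    print('KeyError:', err)  # prints: KeyError: 'ensembl_id'

# The guarded helper works for a column that does exist.
print(categorize_id_column(df, 'hgnc_id')['hgnc_id'].dtype)  # prints: category
```

Failing early with the list of available columns makes it obvious when the results table was built under a different ID type than the one requested on the command line.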

ri23 commented 4 years ago

Yes, that is a bug; we'll fix it. Thanks.

churchmanlab commented 4 years ago

Hi @chrarnold, you can test the fix quickly by running genewalk with the argument `--stage statistics`. The run time will only be a few minutes, since the previous stages already completed successfully in your last run.
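Rerunning only the statistics stage is cheap because the earlier stages persist their results to disk: the log in the original report shows `Saving into ...` for the node vectors and null distributions, and the statistics step then `Loading` the `.pkl` files. The sketch below illustrates that resume-from-cache pattern with stdlib `argparse` and `pickle`. It is a hypothetical illustration only; the stage names, the `--project_dir` flag, and the file names are assumptions, not GeneWalk's actual implementation.

```python
import argparse
import pickle
import tempfile
from pathlib import Path

# Hypothetical stage names, loosely modeled on the log; not GeneWalk's code.
STAGES = ['node_vectors', 'null_distribution', 'statistics']


def save(path: Path, obj) -> None:
    with open(path, 'wb') as fh:
        pickle.dump(obj, fh)


def load(path: Path):
    with open(path, 'rb') as fh:
        return pickle.load(fh)


def run(project_dir: Path, stage: str) -> dict:
    """Run one stage, reusing pickled outputs of earlier stages in project_dir."""
    vectors_pkl = project_dir / 'node_vectors.pkl'
    null_pkl = project_dir / 'null_dists.pkl'
    if stage == 'node_vectors':
        save(vectors_pkl, {'GENE1': [0.1, 0.2]})  # stand-in for expensive training
        return {}
    if stage == 'null_distribution':
        save(null_pkl, [0.0, 0.05, 0.1])  # stand-in for random-graph simulations
        return {}
    # 'statistics' only reads cached artifacts, so it is fast to rerun after a fix.
    vectors = load(vectors_pkl)
    null = load(null_pkl)
    return {'n_genes': len(vectors), 'n_null': len(null)}


def main(argv=None) -> dict:
    parser = argparse.ArgumentParser()
    parser.add_argument('--project_dir', required=True)  # hypothetical flag
    parser.add_argument('--stage', choices=STAGES, required=True)
    args = parser.parse_args(argv)
    return run(Path(args.project_dir), args.stage)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        main(['--project_dir', tmp, '--stage', 'node_vectors'])
        main(['--project_dir', tmp, '--stage', 'null_distribution'])
        # After patching the statistics code, only this call needs repeating.
        print(main(['--project_dir', tmp, '--stage', 'statistics']))
        # prints: {'n_genes': 1, 'n_null': 3}
```

Keeping each stage's output on disk means a bug in the last stage costs minutes to retest rather than the full hour of embedding training.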

chrarnold commented 4 years ago

Yes, this worked! I applied the change to the modified file manually on my machine, and it ran through just fine. I think this can and should be merged, and this issue can be closed.