Closed chrarnold closed 4 years ago
Yes, that is a bug; we'll fix it. Thanks.
Hi @chrarnold, the fix can be tested quickly by running genewalk with the argument --stage statistics. Run time will be only a few minutes, as the previous stages already completed fine in your last run.
Yes, this worked! I changed the modified file manually on my machine, and it ran through just fine. I think this can and should be merged, and this issue can be closed.
Hi, I tried running the newest version with Ensembl IDs, and after around 1 hour of running time using 20 cores this is what I get, which looks like a bug to me:
` ...
INFO: [2019-09-23 17:22:32] gensim.models.base_any2vec - worker thread finished; awaiting finish of 2 more threads
INFO: [2019-09-23 17:22:32] gensim.models.base_any2vec - worker thread finished; awaiting finish of 1 more threads
INFO: [2019-09-23 17:22:32] gensim.models.base_any2vec - worker thread finished; awaiting finish of 0 more threads
INFO: [2019-09-23 17:22:32] gensim.models.base_any2vec - EPOCH - 5 : training on 164576000 raw words (164576000 effective words) took 119.9s, 1372717 effective words/s
INFO: [2019-09-23 17:22:32] gensim.models.base_any2vec - training on a 822880000 raw words (822880000 effective words) took 566.7s, 1452084 effective words/s
INFO: [2019-09-23 17:22:32] genewalk.deepwalk - Generating node vectors done in 610.30s
INFO: [2019-09-23 17:22:33] genewalk.cli - Saving into /home/carnold/genewalk/cll_test/deepwalk_node_vectors_rand_3.pkl...
INFO: [2019-09-23 17:22:41] genewalk.cli - Saving into /home/carnold/genewalk/cll_test/genewalk_rand_simdists.pkl...
INFO: [2019-09-23 17:22:41] genewalk.cli - Loading /home/carnold/genewalk/cll_test/multi_graph.pkl...
INFO: [2019-09-23 17:22:41] genewalk.cli - Loading /home/carnold/genewalk/cll_test/genes.pkl...
INFO: [2019-09-23 17:22:41] genewalk.cli - Loading /home/carnold/genewalk/cll_test/deepwalk_node_vectors_1.pkl...
INFO: [2019-09-23 17:22:42] genewalk.cli - Loading /home/carnold/genewalk/cll_test/deepwalk_node_vectors_2.pkl...
INFO: [2019-09-23 17:22:42] genewalk.cli - Loading /home/carnold/genewalk/cll_test/deepwalk_node_vectors_3.pkl...
INFO: [2019-09-23 17:22:42] genewalk.cli - Loading /home/carnold/genewalk/cll_test/genewalk_rand_simdists.pkl...
Traceback (most recent call last):
  File "bla/TOOLS/miniconda/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ensembl_id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bla/TOOLS/miniconda/bin/genewalk", line 10, in <module>
    sys.exit(main())
  File "bla/TOOLS/miniconda/lib/python3.7/site-packages/genewalk/cli.py", line 203, in main
    base_id_type=args.id_type)
  File "bla/TOOLS/miniconda/lib/python3.7/site-packages/genewalk/perform_statistics.py", line 178, in generate_output
    df[base_id_type] = df[base_id_type].astype('category')
  File "bla/TOOLS/miniconda/lib/python3.7/site-packages/pandas/core/frame.py", line 2980, in __getitem__
    indexer = self.columns.get_loc(key)
  File "bla/TOOLS/miniconda/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ensembl_id'
`
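For anyone reading the traceback later: the failing line is `df[base_id_type].astype('category')`, and the `KeyError` means the results table had no column named `ensembl_id` at that point. Below is a minimal sketch of that failure mode in plain pandas, plus one possible guard. It is not the actual genewalk fix; the helper `to_category`, the column names other than `ensembl_id`, and the table contents are all made up for illustration.

```python
import pandas as pd


def to_category(df, column):
    """Hypothetical helper: cast `column` to category dtype,
    raising a descriptive error if the column is absent."""
    if column not in df.columns:
        raise ValueError(
            f"column {column!r} not found; "
            f"available columns: {list(df.columns)}"
        )
    out = df.copy()
    out[column] = out[column].astype('category')
    return out


# Made-up results table that lacks an 'ensembl_id' column.
df = pd.DataFrame({
    'hgnc_symbol': ['GENE_A', 'GENE_B'],
    'hgnc_id': ['ID_A', 'ID_B'],
})

# Reproduces the error from the traceback: indexing a missing column.
try:
    df['ensembl_id'].astype('category')
    missing = None
except KeyError as err:
    missing = err.args[0]  # name of the column that was not found

# The guarded version works for an existing column ...
converted = to_category(df, 'hgnc_symbol')

# ... and reports the available columns for a missing one.
try:
    to_category(df, 'ensembl_id')
    guard_message = None
except ValueError as err:
    guard_message = str(err)
```

The guard only makes the error easier to read; the underlying problem is still that the column for the requested ID type was never added to the table.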