churchmanlab / genewalk

GeneWalk identifies relevant gene functions for a biological context using network representation learning
https://churchman.med.harvard.edu/genewalk
BSD 2-Clause "Simplified" License

Installation failed. #28

Closed alhafidzhamdan closed 3 years ago

alhafidzhamdan commented 3 years ago

Hi there,

I created a fresh conda environment with conda create -n genewalk python=3.5

and installed genewalk using pip install git+https://github.com/churchmanlab/genewalk.git

but genewalk -h gave me this error:

Traceback (most recent call last):
  File "/exports/igmm/eddie/Glioblastoma-WGS/anaconda/envs/genewalk/bin/genewalk", line 5, in <module>
    from genewalk.cli import main
  File "/exports/igmm/eddie/Glioblastoma-WGS/anaconda/envs/genewalk/lib/python3.5/site-packages/genewalk/cli.py", line 11, in <module>
    from genewalk.nx_mg_assembler import load_network
  File "/exports/igmm/eddie/Glioblastoma-WGS/anaconda/envs/genewalk/lib/python3.5/site-packages/genewalk/nx_mg_assembler.py", line 6, in <module>
    from indra.databases import go_client
  File "/exports/igmm/eddie/Glioblastoma-WGS/anaconda/envs/genewalk/lib/python3.5/site-packages/indra/databases/__init__.py", line 7, in <module>
    from .identifiers import get_identifiers_url, parse_identifiers_url, \
  File "/exports/igmm/eddie/Glioblastoma-WGS/anaconda/envs/genewalk/lib/python3.5/site-packages/indra/databases/identifiers.py", line 302
    if not db_id.startswith(f'{db_ns}{colon}'):
                                            ^
SyntaxError: invalid syntax

Could you help me troubleshoot please?

A

bgyori commented 3 years ago

From the error message, it looks like you are using Python 3.5. GeneWalk works with Python 3.6+ so you'll have to get a newer version of Python.
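For anyone hitting the same SyntaxError: the failing line in the traceback is an f-string, which Python only understands from 3.6 onwards. A minimal sketch of a version check you could run inside the environment before installing is below. The helper name `python_is_supported` and the `MIN_VERSION` constant are made up for illustration and are not part of GeneWalk.

```python
import sys

# Minimum version taken from the comment above ("GeneWalk works with Python 3.6+").
MIN_VERSION = (3, 6)


def python_is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`.

    `version_info` defaults to the running interpreter; passing a tuple
    such as (3, 5) makes the check easy to try out by hand.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version looks fine for GeneWalk.")
else:
    print("Python %d.%d is too old; 3.6 or newer is needed." % tuple(sys.version_info[:2]))
```

If the check fails, recreating the conda environment with a newer Python and reinstalling is the simplest route.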

alhafidzhamdan commented 3 years ago

Thanks, that's running now. FYI, I tried random_seed 9999 and that caused the run to stall, so I've removed it and it now seems to be running OK.

ri23 commented 3 years ago

Thank you @alhafidzhamdan for notifying us about the random seed 9999. Does GeneWalk work for you without issues when inputting another random seed value or is this error specific to value 9999?

alhafidzhamdan commented 3 years ago

Hi @ri23,

I've re-run with seed 1234 and it worked. Then I ran it again with seed 9999 and it worked again. Not sure what I did differently the first time; it might have been something unrelated.

Btw, can you advise me on how to recreate the gene-to-GO-term network figure? And is there a way to plot multiple genes connected to a single GO term?

image

Many thanks in advance. A

ri23 commented 3 years ago

Hi @alhafidzhamdan

Thanks for the clarification on the random seeds. Below is a Python script to recreate the subgraph around three chosen genes (Mal, Pllp and Plp1). The script should be relatively easy to adjust if you instead want to choose a GO term (use its go_id from the genewalk_results.csv file) and visualize the genes connected to it, rather than the subnetwork around the genes chosen above. See the "Data preprocessing" section of the script below.

Best, Robert

#!/usr/bin/env python
# coding: utf-8

# # GeneWalk network visualization

import os
import re
import copy
import pickle as pkl
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
plt.rcParams['pdf.fonttype'] = 42

# ### Load GeneWalk multigraph and results

path = '/home/genewalk/qki/'   

filename = 'multi_graph.pkl'
with open(os.path.join(path,filename), 'rb') as f:
    MG = pkl.load(f)

filename = 'genewalk_results.csv'
GW = pd.read_csv(os.path.join(path,filename)) 

# ## QKI subnetwork visualization

# Data preprocessing

# Choose genes of interest
GENES=['MAL','PLLP','PLP1']
labels = {}   
for node in GENES:
    labels[node] = node

# Genes and their neighbors (NB = neighbors)
GENES_NB=copy.deepcopy(GENES)
for source in GENES:
    GENES_NB.extend(list(MG.neighbors(source)))
    print(source, len(GENES_NB))
GENES_NB=sorted(list(set(GENES_NB)))

#Subset of Neighbors that are genes
GENES_only_NB=copy.deepcopy(GENES_NB)
for gene in GENES_NB:
    if re.search('GO:',gene):
        GENES_only_NB.remove(gene)

#Enumerate GO annotations of Mal according to GeneWalk ranking
gene = 'MAL'
MAL_GO_NB = list(GW[GW['hgnc_symbol']==gene].sort_values(by='global_padj')['go_id'])
labels_MAL_GO_NB = dict()
for i in range(len(MAL_GO_NB)):
    labels_MAL_GO_NB[MAL_GO_NB[i]] = str(i+1)
MAL_GO_NB_edges = [(gene,gonode) for gonode in MAL_GO_NB]

# ### Generate SubGraph (for plotting)
SG = MG.subgraph(GENES_NB)

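# nx.OrderedGraph was removed in networkx 3.0; a plain nx.Graph keeps
# insertion order on Python 3.7+, so it is used here instead.
SGplot = nx.Graph()
SGplot.add_nodes_from(GENES_NB)
SGplot.add_edges_from((u, v) for (u, v) in SG.edges() if u in SGplot and v in SGplot)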

# ### Generate plot
plt.figure(figsize=(4,4))#units: inch

pos = nx.circular_layout(SGplot)

nx.draw(SGplot, pos=pos, node_color='white', with_labels=False, alpha=0.1, node_size=150)
nx.draw_networkx_nodes(SGplot, pos, nodelist=GENES_NB, node_size=150, node_color='#B82225', alpha=1)
nx.draw_networkx_nodes(SGplot, pos, nodelist=GENES_only_NB, node_size=150, node_color='#007EC3', alpha=1)
nx.draw_networkx_edges(SGplot, pos, edgelist=MAL_GO_NB_edges, width=2.0)#draw thick edges for Mal GO annotations

scale_factor = 1.05
posl = copy.deepcopy(pos)
for node in posl:
    posl[node] = scale_factor * pos[node]

#Add labels to the nodes you require
lab = nx.draw_networkx_labels(SGplot, pos=posl, labels=labels, font_size=8,font_weight='bold')
lab = nx.draw_networkx_labels(SGplot, pos=pos, labels=labels_MAL_GO_NB, font_size=10, 
                              font_color='white',font_weight='bold')

filename = 'subnetwork_circular'
plt.savefig(os.path.join(path, filename + '.pdf'),bbox_inches="tight",transparent=True)
plt.savefig(os.path.join(path, filename + '.png'),bbox_inches="tight",transparent=True)
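
For the GO-term-centred variant mentioned above, a rough sketch of the node selection could look like the following. It assumes, as in the script, that GO term nodes are named with a 'GO:' prefix and that everything else is a gene. The helper `genes_linked_to_go_term` is hypothetical (not part of GeneWalk), and the toy adjacency uses placeholder GO IDs and made-up edges purely so the snippet runs on its own.

```python
def genes_linked_to_go_term(neighbors, go_id):
    """Return the sorted gene nodes directly connected to one GO term.

    `neighbors` is any callable that maps a node to its adjacent nodes,
    for example the `MG.neighbors` method used in the script above.
    Nodes whose name starts with 'GO:' are treated as GO terms and skipped.
    """
    return sorted({n for n in neighbors(go_id) if not str(n).startswith('GO:')})


# Toy adjacency standing in for the GeneWalk multigraph.
# The GO IDs are placeholders and the edges are not real annotations.
toy_adjacency = {
    'GO:0000001': ['MAL', 'PLLP', 'PLP1', 'GO:0000002'],
    'GO:0000002': ['GO:0000001'],
    'MAL': ['GO:0000001', 'PLLP'],
    'PLLP': ['GO:0000001', 'MAL'],
    'PLP1': ['GO:0000001'],
}

go_id = 'GO:0000001'
genes = genes_linked_to_go_term(toy_adjacency.__getitem__, go_id)

# Candidate replacement for GENES_NB: the chosen GO term plus its genes.
nodes_to_plot = [go_id] + genes
labels = {node: node for node in nodes_to_plot}
print(nodes_to_plot)
```

With the real data you would pass `MG.neighbors` instead of the toy lookup, take `go_id` from genewalk_results.csv, and use `nodes_to_plot` in place of GENES_NB when building the subgraph and the plot.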

alhafidzhamdan commented 3 years ago

Fantastic thanks!