churchmanlab / genewalk

GeneWalk identifies relevant gene functions for a biological context using network representation learning
https://churchman.med.harvard.edu/genewalk
BSD 2-Clause "Simplified" License
127 stars, 14 forks

network source file #30

Closed amitpande74 closed 3 years ago

amitpande74 commented 3 years ago

Hi, I ran GeneWalk using the following command: `genewalk --project context1 --genes /home/amit/genewalk/gene_list_DE_ER_UBT.txt --id_type hgnc_id --stage all --base_folder /home/amit/genewalk/chigozie/ --network_source /home/amit/genewalk/chigozie/resources/PathwayCommons12.All.hgnc_current.sif --nproc 6`

but it gave me an error: genewalk: error: argument --network_source: invalid choice: '/home/amit/genewalk/chigozie/resources/PathwayCommons12.All.hgnc_current.sif' (choose from 'pc', 'indra', 'edge_list', 'sif')

I looked into the command-line arguments and found these:

--network_source {pc,indra,edge_list,sif}
The source of the network to be used. Possible values are: pc, indra, edge_list, and sif. In case of indra, edge_list, and sif, the network_file argument must be specified. Default: pc

--network_file NETWORK_FILE
If network_source is indra, this argument points to a Python pickle file in which a list of INDRA Statements constituting the network is contained. In case network_source is edge_list or sif, the network_file argument points to a text file representing the network.

Could you kindly explain where these files come from, or whether the user has to supply them?

regards, Amit.

bgyori commented 3 years ago

Hi @amitpande74, if you are using the PathwayCommons network, you should either omit the --network_source argument, since that is the one used by default, or use --network_source pc. If you want to supply a custom SIF file as input, use --network_source sif --network_file my_network.sif.

amitpande74 commented 3 years ago

Dear @bgyori, Thank you so much for the information. One more thing: the description on GitHub of how to handle the multi_graph.pkl file is sparse. Could you kindly elaborate on this? The tutorial page says:

TIP (optional): to perform a connectivity analysis as described in our publication programmatically: the GeneWalk network (networkx format) itself is also output as a multi_graph.pkl file (pickle binary format), which can be loaded into Python for further analysis. More details on the output can be found on our GitHub page.

Kindly help, I am new to network analysis. Regards.

ri23 commented 3 years ago

Hi @amitpande74, regarding multi_graph.pkl: you can load it into Python as follows:

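import os
import pickle as pkl

import networkx as nx  # networkx must be installed for the pickled graph to load

folder = '/home/amit/genewalk/chigozie/context1/'
fmg = 'multi_graph.pkl'
# Pickle files are binary, so open in 'rb' mode
with open(os.path.join(folder, fmg), 'rb') as f:
    MG = pkl.load(f)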

The GeneWalk network, loaded as MG above, is a networkx.MultiGraph object, documented here: https://networkx.org/documentation/stable/reference/classes/multigraph.html

Intuitively, a networkx.MultiGraph behaves a little like a Python dictionary. The genes and GO terms are the nodes in the network. You can list all the nodes by running MG.nodes

If you have a gene of interest from your input gene list, say 'MYC' (genes are encoded as HGNC gene symbols in the network), or a GO term ID ('GO:1234567'), then you can see all of its connected neighboring nodes through: MG['MYC'] and MG['GO:1234567']

The edge labels tell you the type of each edge in the network, as retrieved from Pathway Commons and the GO ontology/annotation files.

Hope this helps! Robert
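As a rough illustration of the dictionary-style access described above, here is a minimal sketch that runs on a small toy MultiGraph rather than on the real multi_graph.pkl. The helper names (split_neighbors, edge_labels), the toy edges, and the edge attribute key "label" are assumptions made for this example and are not taken from the GeneWalk code; inspect a few edges of your own MG to see which attribute keys it actually uses.

```python
import networkx as nx


def split_neighbors(mg, node):
    """Return (gene_neighbors, go_neighbors) of `node`, each sorted."""
    neighbors = set(mg[node])  # mg[node] maps each neighbor to its parallel edges
    go_terms = sorted(n for n in neighbors if str(n).startswith("GO:"))
    genes = sorted(n for n in neighbors if not str(n).startswith("GO:"))
    return genes, go_terms


def edge_labels(mg, u, v, attr="label"):
    """Collect the `attr` value of every parallel edge between u and v.

    The attribute key "label" is an assumption for this toy graph.
    """
    edge_data = mg.get_edge_data(u, v) or {}  # None when no such edge exists
    return sorted(str(d.get(attr, "")) for d in edge_data.values())


# Toy stand-in for MG; node names and edge labels are made up for illustration.
toy = nx.MultiGraph()
toy.add_edge("MYC", "MAX", label="in-complex-with")
toy.add_edge("MYC", "MAX", label="controls-expression-of")
toy.add_edge("MYC", "GO:1234567", label="GO:annotation")
toy.add_edge("MAX", "GO:1234567", label="GO:annotation")

genes, go_terms = split_neighbors(toy, "MYC")
print("gene neighbors:", genes)
print("GO neighbors:", go_terms)
print("MYC-MAX edge labels:", edge_labels(toy, "MYC", "MAX"))
print("MYC degree (parallel edges counted):", toy.degree("MYC"))
```

With the real network, the same helpers could be called as split_neighbors(MG, 'MYC') once MG has been loaded from the pickle file.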

amitpande74 commented 3 years ago

Dear @ri23 ,

Thank you so much. I will try it and get back to you if I need further help. Regards.