churchmanlab / genewalk

GeneWalk identifies relevant gene functions for a biological context using network representation learning
https://churchman.med.harvard.edu/genewalk
BSD 2-Clause "Simplified" License
127 stars, 14 forks

Enable GeneWalk to run on non-human-mapped gene IDs and networks #41

Closed (bgyori closed 3 years ago)

bgyori commented 3 years ago

This PR adds a new value, custom, for the --id_type option. It allows the gene list input file to contain IDs or names from an arbitrary namespace. In this case, GeneWalk does not attempt to map the input genes to corresponding human genes. Example gene list:

ABC
XYZ
...

When using --id_type custom, the user must also pass either --network_source sif_annot or --network_source sif_full and supply an appropriate network via the --network_file argument. With sif_annot, the network file must contain gene-gene edges that represent relations between the input genes, plus gene-GO edges that give the GO annotations for those genes. Example SIF input:

ABC,rel,XYZ
ABC,annot,GO:0000009
ABC,annot,GO:2001310
XYZ,annot,GO:0003406
...
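As a rough illustration of the row layout above, the following Python sketch splits comma-separated SIF rows into gene-gene edges and gene-GO annotation edges. It is not part of GeneWalk: the function name split_sif_rows is hypothetical, and the rule "a target starting with GO: is an annotation" is an assumption based only on the example rows. A row that does not split into exactly three fields is rejected, which can help catch a mistyped separator before running GeneWalk.

```python
import csv
import io


def split_sif_rows(sif_text):
    """Split comma-separated SIF rows into gene-gene and gene-GO edges.

    Hypothetical helper for illustration only; not a GeneWalk API.
    Assumes each row is source,relation,target and that GO terms
    are written with a "GO:" prefix, as in the example above.
    """
    gene_edges = []
    go_annotations = []
    for line_no, row in enumerate(csv.reader(io.StringIO(sif_text)), start=1):
        if not row:
            # Skip blank lines.
            continue
        if len(row) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 comma-separated fields, "
                f"got {len(row)}: {row}"
            )
        source, relation, target = (field.strip() for field in row)
        if target.startswith("GO:"):
            go_annotations.append((source, relation, target))
        else:
            gene_edges.append((source, relation, target))
    return gene_edges, go_annotations


if __name__ == "__main__":
    example = "ABC,rel,XYZ\nABC,annot,GO:0000009\nABC,annot,GO:2001310\n"
    edges, annotations = split_sif_rows(example)
    print("gene-gene edges:", edges)
    print("gene-GO edges:", annotations)
```

Note that this sketch targets sif_annot-style input; in a sif_full file, GO-GO rows would also land in the annotation list because their target starts with GO:.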

In the case of the sif_full option, the file passed via --network_file must also contain all GO-GO edges (i.e., the basic structure of GO).

Overall, this enables using GeneWalk on a list of genes from any organism, as long as the user also provides an input network of gene-gene relations and GO annotations for those genes. This information can be collected from organism-specific resources that are outside the scope of GeneWalk.
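Since the gene list and the network both come from the user in this mode, a quick consistency check before running may be useful. The Python sketch below lists genes from the gene list that never appear as a source or target in the SIF network. The function name genes_missing_from_network is hypothetical, and the PR does not state how GeneWalk itself treats such genes; this is only a sanity check under the assumption that rows are comma-separated source,relation,target triples.

```python
def genes_missing_from_network(gene_list_text, sif_text):
    """Return genes from the list that appear in no SIF row.

    Hypothetical helper for illustration only; not a GeneWalk API.
    Rows that do not have exactly three comma-separated fields are ignored.
    """
    genes = [line.strip() for line in gene_list_text.splitlines() if line.strip()]
    nodes = set()
    for line in sif_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == 3:
            # Record both endpoints of the edge as known nodes.
            nodes.update((fields[0], fields[2]))
    # Preserve the order of the input gene list.
    return [gene for gene in genes if gene not in nodes]


if __name__ == "__main__":
    gene_list = "ABC\nXYZ\nQRS\n"
    network = "ABC,rel,XYZ\nABC,annot,GO:0000009\n"
    print(genes_missing_from_network(gene_list, network))
```

Here QRS is a made-up gene name added to show a gene with no edges in the network.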