churchmanlab / genewalk

GeneWalk identifies relevant gene functions for a biological context using network representation learning
https://churchman.med.harvard.edu/genewalk
BSD 2-Clause "Simplified" License

GO Terms #45

Closed: npokorzynski closed this issue 3 years ago

npokorzynski commented 3 years ago

Hi,

Not an issue, more of a question: is it possible to restrict the GO terms used in GeneWalk to those of a specific category (e.g. biological process)? I'd like to exclude GO terms I'm not particularly interested in (cellular component, for example) and obtain more meaningful, significant GO term associations for identified regulators. I looked through the options in genewalk --help, but none of them seemed to modify which GO terms are used.

Thanks, Nick

ri23 commented 3 years ago

Hi @npokorzynski, thanks for reaching out again. The simplest solution is to run GeneWalk as usual. Then, in the genewalk_results.csv file, delete all rows corresponding to GO terms you don't want; for instance, filter out cellular component terms using the go_domain column. Save that genewalk_results.csv file and rerun GeneWalk with only --stage visual. The bar plots and the regulator / moonlighting gene scatter plots are then generated from the gene-GO term pairs that remain in the results file.
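The row filtering described above could be scripted rather than done by hand. This is only a minimal sketch using Python's standard csv module. The go_domain column name comes from the results file mentioned above, but the helper name drop_go_domains, the file paths, and the exact domain label "cellular component" are assumptions, so check them against your own genewalk_results.csv first.

```python
import csv


def drop_go_domains(in_path, out_path, excluded_domains, domain_column="go_domain"):
    """Copy a results CSV to out_path, skipping rows whose GO domain is excluded.

    Hypothetical helper, not part of GeneWalk. Returns (rows_kept, rows_dropped).
    in_path and out_path must be different files.
    """
    # Compare case-insensitively so "Cellular Component" also matches.
    excluded = {d.strip().lower() for d in excluded_domains}
    kept = dropped = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        if reader.fieldnames is None or domain_column not in reader.fieldnames:
            raise ValueError(f"column {domain_column!r} not found in {in_path}")
        # Reuse the original header so all other columns pass through unchanged.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if (row[domain_column] or "").strip().lower() in excluded:
                dropped += 1
            else:
                writer.writerow(row)
                kept += 1
    return kept, dropped
```

For example, drop_go_domains("genewalk_results.csv", "genewalk_results_filtered.csv", ["cellular component"]) writes a filtered copy. Keep a backup of the original, then put the filtered file in its place as genewalk_results.csv before rerunning with --stage visual.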

Note that the strategy above does not redo the statistics / FDR corrections. If you want to filter out GO terms before statistical testing (and perhaps get stronger significance for some of the remaining GO terms), you could git clone the repo and adjust the source code in perform_statistics.py to exclude GO terms from the CC (cellular component) domain. Then run GeneWalk from that cloned/adjusted code by calling python cli.py followed by all the usual arguments.
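To illustrate why filtering before the correction can matter, here is a generic toy example. It is not GeneWalk's code and does not reflect the contents of perform_statistics.py. It assumes a Benjamini-Hochberg-style FDR adjustment, and the p-values and domain labels are made up purely for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up (p-value, GO domain) pairs for one hypothetical gene.
tests = [
    (0.01, "biological process"),
    (0.04, "biological process"),
    (0.03, "cellular component"),
    (0.50, "cellular component"),
]

# Correction over all terms, as when rows are only deleted afterwards.
padj_all = benjamini_hochberg([p for p, _ in tests])

# Correction after excluding cellular component terms up front.
kept = [p for p, domain in tests if domain != "cellular component"]
padj_filtered = benjamini_hochberg(kept)
```

In this toy case the adjusted value for the first term drops from 0.04 to 0.02 once the cellular component terms are excluded before the correction, simply because fewer hypotheses are tested. Whether real results change in a comparable way depends on the data.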

Good luck! Robert