churchmanlab / genewalk

GeneWalk identifies relevant gene functions for a biological context using network representation learning
https://churchman.med.harvard.edu/genewalk
BSD 2-Clause "Simplified" License

Mouse ids are not working with genewalk? #48

Closed Anara2018 closed 3 years ago

Anara2018 commented 3 years ago

Hello,

I tried to run GeneWalk on a mouse gene list (around 250 genes from my differential expression data) using mouse_entrez IDs. The Regulator and Moonlight gene plots were produced, with "Number of GO annotations per gene" on the x-axis, but "the fraction of relevant GO terms" was 0 and "Connections with other genes" was also 0. I was wondering whether the format of my mouse_entrez IDs is incorrect, or whether there are simply no GO terms associated with these genes (via the human orthologs that GeneWalk uses). Please let me know if the format of my IDs is wrong (I had the list of all my genes in gene symbol format before converting them to Entrez IDs for GeneWalk). The command I ran is:

genewalk --project genewalkspermlongRNAseq --genes mouse_entrez_ids_list.txt --id_type mouse_entrez --nproc 8

I have included several files with this issue:

  1. A folder with the plots that I received from GeneWalk (the Regulator genes and Moonlight genes plots)
  2. My raw list of mouse_entrez genes (as a zipped file, but it's basically a .csv file)
  3. The list of warnings that I receive (below)

Attachments: sperm_final_downregulated_logFCnegative2-5_entrez_mouse.zip, sperm_final_downregulated_logFC_negative2.5.csv.zip

INFO: [2021-03-26 14:58:49] genewalk.cli - Creating sperm_downregulated_entrez_mouse_26032021_anara folder at /home/anara/genewalk/sperm_downregulated_entrez_mouse_26032021_anara
INFO: [2021-03-26 14:58:49] genewalk.resources - Using /home/anara/genewalk/resources as resource folder.
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not get HGNC ID for MGI ID 3608415
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID AC102224.2
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not get HGNC ID for MGI ID 2142174
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID AC124977.2
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID AC133523.2
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID AC133902.3
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not get HGNC ID for MGI ID 2141341
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID AC138299.1
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID AC140364.2
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not get HGNC ID for MGI ID 3796981
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID AC158352.1
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID AC161409.5
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID AC164544.5
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID Astx2
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID Atcayos
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID CH25-501L8.4
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID CH36-169F23.5
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not get HGNC ID for MGI ID 107303
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID Dlx1as
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID Gm10217
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID Gm17571
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID Gm22690
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID Gm26381
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID Gm26545
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not get HGNC ID for MGI ID 3646599
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID Kat6b-ps1
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID Kif22-ps
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID Lincmd1
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID LINE/L1?
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not get HGNC ID for MGI ID 1888480

ri23 commented 3 years ago

Hi @Anara2018, after checking the files you provided, it seems GW is running fine. However, your mouse Entrez input list contains many IDs that are not actually Entrez IDs. Many are just mouse gene symbols, I suspect because your previous DE analysis found no mouse Entrez ID for those genes, judging by sperm_final_downregulated_logFC_negative2.5.csv.
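One way to catch this before running GW is to check the input list for entries that are not purely numeric, since Entrez gene IDs are integers. This is only a minimal sketch: `split_entrez_ids` is a hypothetical helper (not part of GeneWalk), and the numeric IDs in the example are placeholders rather than real genes from the dataset.

```python
def split_entrez_ids(lines):
    """Separate purely numeric tokens (plausible Entrez gene IDs) from everything else.

    Hypothetical helper for illustration; not a GeneWalk function.
    """
    valid, rejected = [], []
    for raw in lines:
        token = raw.strip()
        if not token:
            continue  # skip blank lines
        # Entrez gene IDs are plain ASCII integers; symbols such as "Gm17571" are not.
        if token.isascii() and token.isdigit():
            valid.append(token)
        else:
            rejected.append(token)
    return valid, rejected


# Placeholder numeric IDs mixed with symbols taken from the warnings in the log.
example = ["12345", "Gm17571", "AC102224.2", "", " 67890 ", "Kif22-ps"]
valid_ids, rejected_ids = split_entrez_ids(example)
print("kept:", valid_ids)
print("rejected:", rejected_ids)
```

Passing this check only means an entry looks like an Entrez ID. A numeric ID can still be dropped later if no human ortholog (HGNC ID) is found for it, as the "Could not get HGNC ID" warnings show.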

As a result many genes get filtered out by GW as shown in the log file. The genes that are included in the GW analysis are very sparsely connected in the GW network and unfortunately do not have any significant GO terms. That is why your moonlighting and regulator plots are empty.
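To see how much of the list was dropped and why, the warnings in the log can be tallied by type. This is a rough sketch that assumes only the two warning message formats visible in the log posted above; `tally_warnings` is a hypothetical helper, not a GeneWalk function.

```python
import re

# The two warning formats that appear in the posted log.
PATTERNS = {
    "no_mgi_mapping": re.compile(r"Could not find an MGI mapping for Entrez ID (\S+)"),
    "no_hgnc_id": re.compile(r"Could not get HGNC ID for MGI ID (\S+)"),
}


def tally_warnings(log_text):
    """Return {warning type: list of affected IDs} for the known warning formats."""
    return {name: pattern.findall(log_text) for name, pattern in PATTERNS.items()}


# A few entries copied from the log in this issue.
sample_log = """\
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not get HGNC ID for MGI ID 3608415
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID AC102224.2
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID Gm17571
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not get HGNC ID for MGI ID 107303
WARNING: [2021-03-26 14:58:49] genewalk.gene_lists - Could not find an MGI mapping for Entrez ID LINE/L1?
"""

dropped = tally_warnings(sample_log)
for kind, ids in dropped.items():
    print(kind, len(ids), ids)
```

Entries under `no_mgi_mapping` are inputs GW could not interpret as mouse Entrez IDs, while entries under `no_hgnc_id` were recognized as mouse genes but had no human gene mapping.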

If I were you, I would filter to include only protein-coding genes before running the DE analysis. This would give you extra statistical power for the DE analysis and generate DE gene lists containing only protein-coding genes, for which Entrez IDs should be found. Then rerun GW with that list. In any case, GW will not be able to make predictions for pseudogenes or predicted genes (e.g. Gm17571), which are currently filtered out by GW. If you still don't get enough DE genes, I would then consider an FDR of 0.1 instead of 0.05 to get more.

Good luck! Robert