egeulgen / pathfindR

pathfindR: Enrichment Analysis Utilizing Active Subnetworks
https://egeulgen.github.io/pathfindR/

error: 0 genes in experiment file does not exist in the network. #22

Closed yoonsquared closed 4 years ago

yoonsquared commented 4 years ago

Describe the bug
No HTML report or other output is produced by run_pathfindR().

To Reproduce
Steps to reproduce the behavior: read in the dataset as a data frame and call run_pathfindR(); whether or not options are added, no results are produced.

Expected behavior
An HTML report and pathway results.

Screenshots

WT3_pathfindR_phos_output <- run_pathfindR(WT3_pathfindR_sig_phos_table,human_genes=FALSE, adj_method="BH", output="./results/Pathway_WT3_phos")
Oct 07, 2019 12:48:07 PM ActiveSubnetworkSearchMisc.ScoreCalculations fillNodeToPValueMap
WARNING: 0 genes in experiment file does not exist in the network
Oct 07, 2019 12:48:07 PM ActiveSubnetworkSearchMisc.ScoreCalculations fillNodeToPValueMap
WARNING: 0 genes in experiment file does not exist in the network
Oct 07, 2019 12:48:07 PM ActiveSubnetworkSearchMisc.ScoreCalculations fillNodeToPValueMap
WARNING: 0 genes in experiment file does not exist in the network
Oct 07, 2019 12:48:27 PM ActiveSubnetworkSearchMisc.ScoreCalculations fillNodeToPValueMap
WARNING: 0 genes in experiment file does not exist in the network
Oct 07, 2019 12:48:27 PM ActiveSubnetworkSearchMisc.ScoreCalculations fillNodeToPValueMap
WARNING: 0 genes in experiment file does not exist in the network
Oct 07, 2019 12:48:28 PM ActiveSubnetworkSearchMisc.ScoreCalculations fillNodeToPValueMap
WARNING: 0 genes in experiment file does not exist in the network
Oct 07, 2019 12:48:47 PM ActiveSubnetworkSearchMisc.ScoreCalculations fillNodeToPValueMap
WARNING: 0 genes in experiment file does not exist in the network
Oct 07, 2019 12:48:47 PM ActiveSubnetworkSearchMisc.ScoreCalculations fillNodeToPValueMap
WARNING: 0 genes in experiment file does not exist in the network
Oct 07, 2019 12:48:47 PM ActiveSubnetworkSearchMisc.ScoreCalculations fillNodeToPValueMap
WARNING: 0 genes in experiment file does not exist in the network
Oct 07, 2019 12:49:02 PM ActiveSubnetworkSearchMisc.ScoreCalculations fillNodeToPValueMap
WARNING: 0 genes in experiment file does not exist in the network
WT3_pathfindR_phos_output
data frame with 0 columns and 0 rows

Desktop:

  • OS: macOS
  • Version: 10.14.6 (18G95)

R Session Information: [image: Screen Shot 2019-10-07 at 1 10 12 PM] https://user-images.githubusercontent.com/42597505/66332958-ec4aa780-e903-11e9-8754-0cefe9e39995.png

Additional context
While pathfindR is an R package, the active subnetwork search functionality is written in Java. If you suspect any Java-related issue, please provide your Java version (by running java -version).

ozanozisik commented 4 years ago

"0 genes in experiment file does not exist in the network" means that all of the genes in the experiment file were found in the network. We will correct this confusing warning in the next version.

yoonsquared commented 4 years ago

That is good to hear, except I do not get any output back from the function. Is this normal with that warning?

ozanozisik commented 4 years ago

Hi, no, not getting any output is unrelated to that warning. You may consider loosening sig_gene_thr and enrichment_threshold to see whether you get results that way. If not, we can dig further into the problem.

egeulgen commented 4 years ago

@yoonsquared if you set human_genes=FALSE (meaning that the input genes are not Homo sapiens genes), I would suggest also supplying custom gene sets and a custom protein-protein interaction network (PIN) for the selected organism. That is probably why you are not getting any enrichment results.

To reproduce the issue, though, would you mind attaching your input data frame as an RDS file (e.g., saveRDS(WT3_pathfindR_sig_phos_table, "input_df.RDS"))?

yoonsquared commented 4 years ago

I will try that and update you. Thank you.

yoonsquared commented 4 years ago

Thanks, egeulgen. I also tried it without the human_genes=FALSE option, and it still did not go through. The input test went fine; my organism is mouse. I will try building the PIN. The input is simply the input file read in as a data frame.

egeulgen commented 4 years ago

Since your organism is mouse, I'd again recommend using a mouse-specific PIN and a mouse-specific gene set source (see #5, #13). The input testing might be OK but it only checks for proper formatting and how many input genes are present in the PIN but not the gene sets. Closing the issue. Try it out and let us know if the issue persists by re-opening.
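
Because the input test checks formatting and PIN coverage but not gene-set coverage, a complementary check can be run by hand. The following is a minimal sketch with a hypothetical helper, gene_set_coverage() (not part of pathfindR); it assumes the custom gene sets are a named list of character vectors (like mmu_kegg_genes in this thread) and that the gene symbols are in the first column of the input data frame. Symbols are compared case-insensitively.

```r
# Hypothetical helper (not part of pathfindR): how many input genes appear
# in at least one custom gene set? Symbols are compared after upper-casing.
gene_set_coverage <- function(input_df, gene_sets) {
  # assumes the gene symbols are in the first column of the input
  input_genes <- unique(toupper(as.character(input_df[[1]])))
  set_genes <- unique(toupper(unlist(gene_sets, use.names = FALSE)))
  found <- input_genes %in% set_genes
  list(
    n_input = length(input_genes),
    n_in_gene_sets = sum(found),
    missing = input_genes[!found]
  )
}

# Toy example with made-up values
toy_input <- data.frame(Gene.symbol = c("Trp53", "Akt1", "Fake1"),
                        logFC = c(1.2, -0.8, 2.1),
                        adj.P.Val = c(0.01, 0.03, 0.04),
                        stringsAsFactors = FALSE)
toy_sets <- list(mmu04110 = c("Trp53", "Cdk1"), mmu04151 = c("Akt1", "Pik3ca"))
res <- gene_set_coverage(toy_input, toy_sets)
res$missing
```

A low n_in_gene_sets relative to n_input would point to a symbol mismatch between the input and the gene sets rather than a problem with the PIN.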

yoonsquared commented 4 years ago

Oct 10, 2019 10:54:20 AM Network.Network addInteraction
WARNING: Self interaction discarded
Oct 10, 2019 10:54:20 AM Network.Network addInteraction
WARNING: Self interaction discarded
Oct 10, 2019 10:54:20 AM Network.Network addInteraction
WARNING: Self interaction discarded
Oct 10, 2019 10:54:20 AM Network.Network addInteraction
WARNING: Self interaction discarded
Oct 10, 2019 10:54:20 AM Network.Network addInteraction
WARNING: Self interaction discarded
Oct 10, 2019 10:54:21 AM ActiveSubnetworkSearchMisc.ScoreCalculations fillNodeToPValueMap
WARNING: 551 genes in experiment file does not exist in the network

sif_path <- "../pathfindR/mmusculusPIN.sif"
mmu_kegg_genes <- readRDS("../pathfindR/mmu_kegg_genes.RDS")
mmu_kegg_pathways <- readRDS("../pathfindR/mmu_kegg_pathways.RDS")

tumor_pathfindR_phos_output <- run_pathfindR(tumor_pathfindR_sig_phos_table, human_genes=FALSE,
                            pin_name_path = sif_path,
                            gene_sets = "Custom",
                            custom_genes = mmu_kegg_genes,
                            custom_pathways = mmu_kegg_pathways,
                            p_val_threshold = 0.05,
                            enrichment_threshold = 0.1,
                            sig_gene_thr = 8,
                            adj_method = "none",
                            output_dir="../results/Pathway_tumor_phos")

custom_genes and custom_pathways were copied and pasted from your code. (Attachment: inputRDS.zip)

Do you have any idea what is going on? I have also detached and reattached pathview.

Thanks in advance. Best, Joon

yoonsquared commented 4 years ago

Would it be possible to re-open this issue? Thanks

egeulgen commented 4 years ago

You do not need to unload/reload pathview in this case, because it will not be used when gene_sets="Custom".

I cannot tell what you used as the M. musculus PIN, but I'd suggest using the latest BioGRID PIN. Process it and save it in tab-delimited SIF format so that it has three columns: Interactor Symbol A, a column of "pp", and Interactor Symbol B.

For compatibility between all symbols, I'd also suggest that you use upper case for everything (i.e., input genes, PIN and gene sets). (For the next release, we're actually implementing this internally.)
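
The processing described above (BioGRID table to a three-column SIF with upper-case symbols) could look like the following minimal sketch. It uses a hypothetical helper, biogrid_to_sif() (not a pathfindR function), and assumes the BioGRID table has already been read into a data frame with the official symbols of interactors A and B in columns 8 and 9, as in the tab2 file used later in this thread. Dropping self-interactions and duplicate A-B/B-A pairs is an extra step added here; the active subnetwork search also discards self-interactions on its own, with a warning.

```r
# Hypothetical helper (not part of pathfindR): convert a BioGRID-style table
# into a tab-delimited SIF file with columns: symbol A, "pp", symbol B.
biogrid_to_sif <- function(biogrid_df, out_path, col_a = 8, col_b = 9) {
  sym_a <- toupper(as.character(biogrid_df[[col_a]]))
  sym_b <- toupper(as.character(biogrid_df[[col_b]]))
  sif <- data.frame(A = sym_a, pp = "pp", B = sym_b, stringsAsFactors = FALSE)
  # drop missing/empty symbols and self-interactions
  keep <- !is.na(sif$A) & !is.na(sif$B) & sif$A != "" & sif$B != "" & sif$A != sif$B
  sif <- sif[keep, ]
  # drop duplicate pairs regardless of direction (A-B is the same as B-A)
  key <- paste(pmin(sif$A, sif$B), pmax(sif$A, sif$B))
  sif <- sif[!duplicated(key), ]
  # quote = FALSE so the symbols are not wrapped in double quotes
  write.table(sif, out_path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(sif)
}

# Toy example: columns 1-7 are placeholders, columns 8-9 hold the symbols
toy <- as.data.frame(matrix("x", nrow = 4, ncol = 7), stringsAsFactors = FALSE)
toy$symA <- c("Trp53", "Mdm2", "Akt1", "Akt1")
toy$symB <- c("Mdm2", "Trp53", "Akt1", "Pik3ca")
sif_file <- tempfile(fileext = ".sif")
biogrid_to_sif(toy, sif_file)
readLines(sif_file)
```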

Since this is a common request, I'll try to write a vignette (which again will be available in the next release) describing how to create the necessary data and run pathfindR with non-Homo sapiens organisms, and I'll let you know.

I'm reopening the issue. -E

yoonsquared commented 4 years ago

I can upload the PIN and other files as well, but I have followed the exact steps in your troubleshooting guide. I am posting the code first; if you need the files, I will upload them again.

Setup for pathfindR

Make an organism-specific PIN (the SIF file).

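# quote = "" guards against apostrophes in BioGRID's free-text columns; the "#" header line is skipped as a comment
mm10_pin_raw <- read.table("../pathfindR/BIOGRID-ORGANISM-3.5.177.tab2/BIOGRID-ORGANISM-Mus_musculus-3.5.177.tab2.txt",stringsAsFactors = FALSE,sep="\t",fill=TRUE,header=FALSE,quote="")
# columns 8 and 9 are the official symbols of interactors A and B; "pp" is recycled to every row (no hard-coded row count)
mm10_pin_sif <- data.frame(mm10_pin_raw[,8],"pp",mm10_pin_raw[,9])
# quote = FALSE keeps the symbols unquoted in the SIF file
write.table(mm10_pin_sif,"../pathfindR/mmusculusPIN.sif",col.names=FALSE,row.names=FALSE,sep="\t",quote=FALSE)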

Make organism-specific gene sets and pathways objects.

# Create M.musculus KEGG Gene Sets ----------------------------------------

# Obtain list of M.musculus pathways
pathways_list <- KEGGREST::keggList("pathway", "mmu")

# Turn the identifiers of KEGGREST into KEGG-style pathway identifiers
pathway_codes <- sub("path:", "", names(pathways_list))

# Obtain and parse genes per each pathway
genes_by_pathway <- sapply(pathway_codes, function(pwid){
  pw <- KEGGREST::keggGet(pwid)
  pw <- pw[[1]]$GENE[c(FALSE, TRUE)] ## get gene symbols, not descriptions
  pw <- sub(";.+", "", pw) ## discard any remaining description
  pw <- pw[grepl("^[a-zA-Z0-9_-]*$", pw)] ## remove mistaken lines that cannot be gene symbols
  pw
}, simplify = FALSE) ## always keep a named list of gene vectors

## Form the custom genes object
mmu_kegg_genes <- genes_by_pathway
mmu_kegg_genes <- mmu_kegg_genes[sapply(mmu_kegg_genes, length) != 0]
mmu_kegg_genes <- lapply(mmu_kegg_genes, toupper) ## lapply always returns a list (sapply could simplify to a matrix)
saveRDS(mmu_kegg_genes, "../pathfindR/mmu_kegg_genes.RDS")

## Form the custom pathways object
mmu_kegg_pathways <- KEGGREST::keggList("pathway", "mmu")
names(mmu_kegg_pathways) <- sub("path:", "", names(mmu_kegg_pathways))
mmu_kegg_pathways <- sub(" - Mus musculus \\(mouse\\)", "", mmu_kegg_pathways)
mmu_kegg_pathways <- mmu_kegg_pathways[names(mmu_kegg_pathways) %in% names(mmu_kegg_genes)]
saveRDS(mmu_kegg_pathways, "../pathfindR/mmu_kegg_pathways.RDS")
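
Before passing objects like these to run_pathfindR(), a quick consistency check can catch mismatches between them. The following is a minimal sketch with a hypothetical helper, check_custom_sets() (not part of pathfindR); it assumes, as in the code above, that the gene sets are a named list of character vectors and the pathway descriptions are a named character vector keyed by the same pathway IDs.

```r
# Hypothetical helper (not part of pathfindR): sanity-check custom gene sets
# and pathway descriptions; returns a character vector of problems found.
check_custom_sets <- function(custom_genes, custom_pathways) {
  problems <- character(0)
  if (!is.list(custom_genes) || is.null(names(custom_genes)))
    problems <- c(problems, "custom_genes should be a named list")
  if (!is.character(custom_pathways) || is.null(names(custom_pathways)))
    problems <- c(problems, "custom_pathways should be a named character vector")
  # every gene set should have a pathway description
  missing_desc <- setdiff(names(custom_genes), names(custom_pathways))
  if (length(missing_desc) > 0)
    problems <- c(problems, paste("no description for:", paste(missing_desc, collapse = ", ")))
  # empty gene sets cannot contribute to enrichment
  empty <- names(custom_genes)[lengths(custom_genes) == 0]
  if (length(empty) > 0)
    problems <- c(problems, paste("empty gene sets:", paste(empty, collapse = ", ")))
  # symbols should match the upper-case convention used for the input and PIN
  all_genes <- unlist(custom_genes, use.names = FALSE)
  if (any(all_genes != toupper(all_genes)))
    problems <- c(problems, "some gene symbols are not upper case")
  problems
}

# Toy example with made-up values
genes <- list(mmu00010 = c("HK1", "GCK"), mmu04110 = c("Trp53"))
pathways <- c(mmu00010 = "Glycolysis / Gluconeogenesis")
check_custom_sets(genes, pathways)
```

An empty result means none of these particular problems were detected; it does not guarantee that enrichment will succeed.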

yoonsquared commented 4 years ago

FYI, that is the code I used before re-opening the issue. Please let me know if there is anything I should change.

Thanks

egeulgen commented 4 years ago

If I understand correctly, you do not get any enrichment results with the M. musculus data.

As I've posted above, I'd initially suggest using upper case symbols (can be achieved via toupper() in R) in all data for compatibility between all symbols.

Another thing to consider is that the M. musculus PIN is relatively small, which makes it less likely that any active subnetworks will be found. In that case, I'd recommend changing the arguments from their defaults (see the wiki).

Hope this helps, -E

yoonsquared commented 4 years ago

Hi Ege,

I tried your suggestions; toupper() has been applied to the input.

tumor_pathfindR_output <- run_pathfindR(tumor_pathfindR_sig_table, human_genes=FALSE, 
                            pin_name_path = sif_path,
                            gene_sets = "Custom", 
                            custom_genes = mmu_kegg_genes, 
                            custom_pathways = mmu_kegg_pathways,
                            p_val_threshold = 0.1,
                            enrichment_threshold = 0.5,
                            sig_gene_thr = 2,
                            adj_method = "none",
                            output_dir="../results/Pathway_tumor_prot")

I also lowered sig_gene_thr to 2 and used a raw p-value threshold of 0.1, but I again ended up with no output and the same warning:

Oct 15, 2019 10:37:07 AM ActiveSubnetworkSearchMisc.ScoreCalculations fillNodeToPValueMap
WARNING: 2,022 genes in experiment file does not exist in the network

I ran the exact same gene list through DAVID and got KEGG pathway and GO results. 2,022 genes should be a large enough list for pathway analysis (I know some have 400 in this analysis).

Any suggestions? I would love to use the plots and results from your package.

Sorry for the troubles.

Thanks.

Best,

egeulgen commented 4 years ago

Sorry for the troubles.

No problem at all.

WARNING: 2,022 genes in experiment file does not exist in the network

From the warning it seems that a very high number of input genes could not be mapped onto the PIN. I'm thinking this is because you did not use toupper() while processing the PIN. Again, turn symbols in all data (i.e. the input, PIN and gene sets) to upper case.
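
One way to check whether letter case explains the unmapped genes is to count input symbols missing from the PIN both as-is and after upper-casing both sides. The following is a minimal sketch with a hypothetical helper, count_unmapped() (not part of pathfindR); it assumes the SIF has no header and holds the interactor symbols in columns 1 and 3.

```r
# Hypothetical helper (not part of pathfindR): count input genes absent
# from the PIN, both as-is and after upper-casing both sides.
count_unmapped <- function(input_genes, sif_df) {
  # interactor symbols are assumed to be in columns 1 and 3 of the SIF
  pin_genes <- unique(c(as.character(sif_df[[1]]), as.character(sif_df[[3]])))
  input_genes <- unique(as.character(input_genes))
  c(as_is = sum(!input_genes %in% pin_genes),
    upper_case = sum(!unique(toupper(input_genes)) %in% toupper(pin_genes)))
}

# Toy example: mouse-style input symbols against an upper-case PIN
toy_sif <- data.frame(V1 = c("TRP53", "AKT1"), V2 = "pp", V3 = c("MDM2", "PIK3CA"),
                      stringsAsFactors = FALSE)
res <- count_unmapped(c("Trp53", "Mdm2", "Akt1", "Fake1"), toy_sif)
res
```

With real data, the SIF could be loaded with read.delim(sif_path, header = FALSE, stringsAsFactors = FALSE). A large drop from as_is to upper_case would suggest that case mismatches, rather than missing genes, account for most of the unmapped count.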

I tried to perform the custom analysis with your input and got no enrichment either. See the gist here.

This is because the BioGRID M. musculus PIN is very small (only ~20k interactions): it contains only 7,333 genes, and since the PIN is also used as the set of background genes for enrichment analysis, pathfindR cannot find any enrichment.
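
To see why the size of the PIN matters when it doubles as the background, here is a minimal sketch of a one-sided hypergeometric enrichment test in which the PIN genes form the universe. It illustrates the general technique under that assumption and is not pathfindR's exact implementation; hyper_p() is a hypothetical name. Genes outside the background are dropped before testing, so a small PIN shrinks both the gene sets and the set of chosen genes that can contribute.

```r
# Hypothetical sketch: one-sided hypergeometric p-value for one gene set,
# restricting both the gene set and the chosen genes to the background.
hyper_p <- function(term_genes, chosen_genes, background_genes) {
  term_genes <- intersect(term_genes, background_genes)
  chosen_genes <- intersect(chosen_genes, background_genes)
  x <- length(intersect(term_genes, chosen_genes)) # overlap
  m <- length(term_genes)                          # term genes in the background
  n <- length(background_genes) - m                # non-term genes in the background
  k <- length(chosen_genes)                        # chosen genes in the background
  # P(X >= x): upper tail starting at x
  phyper(x - 1, m, n, k, lower.tail = FALSE)
}

# Toy example with made-up gene IDs
background <- paste0("G", 1:100)
term <- paste0("G", 1:10)
chosen <- c(paste0("G", 1:5), paste0("G", 51:55))
p <- hyper_p(term, chosen, background)
p
```

Any chosen gene that is absent from the PIN simply disappears from the test, which is one reason a sparse PIN can yield no significant terms even for a long input list.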

I suggest using an alternative source for the M. musculus PIN, such as GeneMANIA, IntAct or others.

"I ran the exact gene list on the DAVID, and have the kegg_pathway and GO results."

That's not surprising, as DAVID internally recognizes M. musculus symbols, while pathfindR does not and (for now) requires a bit more pre-processing. We'll work on a way to make non-Homo sapiens analysis easier.

Hope this helps, Best, -E

yoonsquared commented 4 years ago

Hi Ege,

I applied toupper() to everything and still have the same trouble. I have experimented with the thresholds and will try other PINs.

Thanks.

Best, J

egeulgen commented 4 years ago

Hey @yoonsquared, the latest development version contains the Mus musculus STRING PIN and a vignette walking through a non-human pathfindR analysis (the example is mouse).

return_pin_path("mmu_STRING")

I think these should help with your case. Let us know if we can be of any further help. Best, -E

yoonsquared commented 4 years ago

Thanks, I will try to use it on the next one. Have a good one Ege!

Best, -Jay