cytoscape / RCy3

New version of RCy3, redesigned and collaboratively maintained by Cytoscape developer community
MIT License

enrichmentmap mastermap list pattern returns non-matching files #117

Closed: rosscm closed this issue 3 years ago

rosscm commented 3 years ago

Hi,

I'm using the enrichmentmap mastermap command in my package to generate enrichment maps (EMs) for several datasets. I want to create two distinct networks from subsets of my data, but I can't get the pattern parameter to select the right files. Package being developed: https://github.com/rosscm/fedup

# run pathway enrichment (12 sets of genes)
data(geneMulti)
data(pathwaysGMT)
fedupRes <- runFedup(geneMulti, pathwaysGMT)

# write out results formatted for EM use
resultsFolder <- tempdir()
writeFemap(fedupRes, resultsFolder)

# write out common GMT file
gmtFile <- tempfile("pathwaysGMT", fileext = ".gmt")
writePathways(pathwaysGMT, gmtFile)

# folder content
list.files(resultsFolder)
[1] "callr-env-6e304fe20a92"     "negative"                  
[3] "pathwaysGMT6e301242037.gmt" "positive" 

list.files(file.path(resultsFolder, "negative"))
[1] "femap_ACACA_negative.txt"    "femap_C12orf49_negative.txt"
[3] "femap_FASN_negative.txt"     "femap_LDLR_negative.txt"    
[5] "femap_SREBF1_negative.txt"   "femap_SREBF2_negative.txt"

list.files(file.path(resultsFolder, "positive"))
[1] "femap_ACACA_positive.txt"    "femap_C12orf49_positive.txt"
[3] "femap_FASN_positive.txt"     "femap_LDLR_positive.txt"    
[5] "femap_SREBF1_positive.txt"   "femap_SREBF2_positive.txt"

# only grab "negative" files
cm <- paste0("enrichmentmap mastermap list rootFolder=", resultsFolder, " pattern=", "negative")
res <- commandsPOST(cm)
as.character(unlist(lapply(res, "[", "name")))
[1] "pathwaysGMT6e301242037"  "femap_FASN_negative"    
[3] "femap_LDLR_negative"     "femap_C12orf49_negative"
[5] "femap_SREBF1_negative"   "femap_SREBF2_negative"  
[7] "femap_ACACA_negative"   

The first match is the pathway GMT file I use as the common GMT file. It is stored in the same resultsFolder directory, but I don't want it included in my network. No matter which pattern glob I use, I can't prevent that file from being matched. Any ideas what to do here?

AlexanderPico commented 3 years ago

Hi there. From what I can tell, this has less to do with RCy3 and more to do with EnrichmentMap. I don't know much about EnrichmentMap. You'd probably be better served posting this issue here: https://github.com/BaderLab/EnrichmentMapApp/issues.

If there is an RCy3-specific issue here, please clarify and I'll try again :)

rosscm commented 3 years ago

Ah, very well. Thanks!

mikekucera commented 3 years ago

The pattern parameter uses typical command-line file glob syntax. If you want to match files that end with the string negative.txt, the pattern would be *negative.txt.
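Because the pattern is a glob, base R's utils::glob2rx() offers a quick way to check what a wildcard such as *negative.txt selects. This sketch only illustrates glob semantics in R on file names taken from the listings in this thread; it is not EnrichmentMap's own matching code, and, as a later comment here clarifies, EnrichmentMap applies the pattern to subfolder names rather than to individual files.

```r
# Illustration only: how a shell-style glob maps to a regular expression in R.
# utils::glob2rx() converts a wildcard pattern into an anchored regex.
files <- c("femap_ACACA_negative.txt",
           "femap_ACACA_positive.txt",
           "pathwaysGMT6e301242037.gmt")

# Regex equivalent of the glob "*negative.txt"
rx <- utils::glob2rx("*negative.txt")
print(rx)

# Names the glob would select if it were applied to these file names
matched <- files[grepl(rx, files)]
print(matched)
```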

rosscm commented 3 years ago

When I try the *negative.txt pattern, this is what I get:

cm <- paste0("enrichmentmap mastermap list rootFolder=", resultsFolder, " pattern=", "*negative.txt")
res <- commandsPOST(cm)
as.character(unlist(lapply(res, "[", "name")))
[1] "pathwaysGMT7ed048e076d4"

Now it doesn't match any of the "negative.txt" files, and the only file returned is the one in the parent resultsFolder directory. If I move all files out of their subdirectories, the matching gets even worse:

list.files(resultsFolder)
 [1] "callr-env-7ed0191d26d5"      "femap_ACACA_negative.txt"   
 [3] "femap_ACACA_positive.txt"    "femap_C12orf49_negative.txt"
 [5] "femap_C12orf49_positive.txt" "femap_FASN_negative.txt"    
 [7] "femap_FASN_positive.txt"     "femap_LDLR_negative.txt"    
 [9] "femap_LDLR_positive.txt"     "femap_SREBF1_negative.txt"  
[11] "femap_SREBF1_positive.txt"   "femap_SREBF2_negative.txt"  
[13] "femap_SREBF2_positive.txt"   "pathwaysGMT7ed048e076d4.gmt"

cm <- paste0("enrichmentmap mastermap list rootFolder=", resultsFolder, " pattern=", "*negative.txt")
res <- commandsPOST(cm)
as.character(unlist(lapply(res, "[", "name")))
 [1] "femap_FASN_negative"     "femap_SREBF1_positive"  
 [3] "femap_LDLR_negative"     "femap_C12orf49_positive"
 [5] "femap_C12orf49_negative" "femap_LDLR_positive"    
 [7] "femap_SREBF1_negative"   "femap_FASN_positive"    
 [9] "femap_SREBF2_negative"   "femap_ACACA_negative"   
[11] "femap_SREBF2_positive"   "femap_ACACA_positive"  

Now it grabs every file in resultsFolder, both negative and positive (this time without the pathway GMT file).

mikekucera commented 3 years ago

My apologies, I gave you the wrong information. The pattern argument is used to match subfolders under the resultsFolder; it's not applied to individual files. That's why it matches the 'negative' folder but the common GMT file is still included. Unfortunately, the only thing you can do is remove the GMT file from the resultsFolder.
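Following that advice, a minimal sketch in base R might move any top-level .gmt files out of the root folder before building the mastermap command, and preview which subfolders a glob would select. Both moveGmtOutOfRoot() and previewSubfolders() are hypothetical helpers, not part of RCy3 or EnrichmentMap, and the preview assumes (per the comment above) that the glob is matched against whole subfolder names, which is an approximation of EnrichmentMap's behaviour rather than its actual code.

```r
# Hypothetical helper (not part of RCy3 or EnrichmentMap): move .gmt files that
# sit directly in rootFolder into gmtDir, so they are not picked up as a dataset.
moveGmtOutOfRoot <- function(rootFolder, gmtDir) {
  created <- dir.create(gmtDir, showWarnings = FALSE, recursive = TRUE)
  root <- normalizePath(rootFolder, winslash = "/", mustWork = TRUE)
  gdir <- normalizePath(gmtDir, winslash = "/", mustWork = TRUE)
  # Moving the GMT into a subfolder of rootFolder would defeat the purpose
  if (gdir == root || startsWith(gdir, paste0(root, "/"))) {
    if (created) unlink(gdir, recursive = TRUE)  # undo only the folder we made
    stop("gmtDir must not be inside rootFolder")
  }
  # recursive = FALSE: only look at the top level of rootFolder
  gmts <- list.files(root, pattern = "\\.gmt$", full.names = TRUE,
                     recursive = FALSE)
  dest <- file.path(gdir, basename(gmts))
  # copy + remove rather than file.rename, which can fail across filesystems
  ok <- file.copy(gmts, dest, overwrite = TRUE)
  if (!all(ok)) stop("could not copy: ", paste(gmts[!ok], collapse = ", "))
  file.remove(gmts)
  dest
}

# Hypothetical preview: immediate subfolders whose names fully match the glob.
previewSubfolders <- function(rootFolder, pattern) {
  dirs <- list.dirs(rootFolder, full.names = FALSE, recursive = FALSE)
  dirs[grepl(utils::glob2rx(pattern), dirs)]
}

# Usage (needs a running Cytoscape with EnrichmentMap; not executed here):
# gmtFile <- moveGmtOutOfRoot(resultsFolder, "/path/outside/resultsFolder")
# previewSubfolders(resultsFolder, "negative")
# cm <- paste0("enrichmentmap mastermap list rootFolder=", resultsFolder,
#              " pattern=", "negative")
# res <- RCy3::commandsPOST(cm)
```

If resultsFolder is tempdir() itself, as in the original post, any gmtDir under tempdir() is inside it and the helper refuses; writing the femap results to a dedicated subfolder of tempdir() instead avoids that.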

rosscm commented 3 years ago

Got it, thanks for clarifying @mikekucera.