bzhanglab / WebGestaltR

R package for WebGestalt
https://bzhanglab.github.io/WebGestaltR/

weightedSetCover input from WebGestaltR function #10

Closed by kmshort 3 years ago

kmshort commented 3 years ago

Hi, I've read the documentation on weightedSetCover, but I'm unsure of the arguments.

Weighted Set Cover

  Description
    Size constrained weighted set cover problem to find top N sets while maximizing the coverage of all elements.

  Usage
    weightedSetCover(idsInSet, costs, topN, nThreads = 4)

  Arguments
    idsInSet
      A list of set names and their member IDs.

    costs
      A vector of the same length to add weights for penalty, i.e. 1/-logP.

    topN
      The number of sets (or less when it completes early) to return.

    nThreads
      The number of processes to use. In Windows, it fallbacks to 1.

I have an output from the WebGestaltR function, so I think I can use output_from_WebGestaltR_function$geneset for the "idsInSet" argument. Is that right?

And importantly, where should the values for the "costs" argument come from? I tried output_from_WebGestaltR_function$expect, but that didn't work. The result from weightedSetCover was:

Begin weighted set cover...
No more candidates, ending weighted set cover
$topSets
NULL

$coverage
[1] 0

If I use isOutput = TRUE, the weighted set cover seems to be computed on the fly in the output HTML when you click the "Weighted Set Cover" button, but I can't find the equivalent R code. I just want to do the same thing in R and get the result as an R object.

many thanks, Kieran

kmshort commented 3 years ago

Ok, I've worked it out for myself by trawling through the code for WebGestaltROra.r. I'm posting this here to help others that might have the same question.

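library(WebGestaltR)

setCoverNum <- 10  # number of sets to keep (the topN argument)
nThreads <- 4

# Member IDs of each enriched set: split the ";"-separated overlapId strings
idsInSet <- strsplit(as.character(output_from_WebGestaltR_function$overlapId), ";", fixed = TRUE)
names(idsInSet) <- output_from_WebGestaltR_function$geneSet
# Costs are 1 / -log(p); where p is 0, cap the infinite -log(p) at -log(machine epsilon)
minusLogP <- -log(output_from_WebGestaltR_function$pValue)
minusLogP[minusLogP == Inf] <- -log(.Machine$double.eps)
wscRes <- weightedSetCover(idsInSet, 1 / minusLogP, setCoverNum, nThreads)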
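
To check the shape of the two inputs without running an enrichment first, here is a minimal sketch on made-up data. The data frame `toy` and its values are invented for illustration; it only assumes the result table has the geneSet, overlapId and pValue columns used above. It uses base R only, so it does not call weightedSetCover itself.

```r
# Toy stand-in for the WebGestaltR result table; all values are invented.
toy <- data.frame(
  geneSet   = c("setA", "setB", "setC"),
  overlapId = c("101;102;103", "102;103", "104"),
  pValue    = c(1e-5, 0, 0.01),
  stringsAsFactors = FALSE
)

# idsInSet: a named list with one character vector of member IDs per set
idsInSet <- strsplit(toy$overlapId, ";", fixed = TRUE)
names(idsInSet) <- toy$geneSet

# costs: 1 / -log(p), with the infinite value produced by p == 0 capped
minusLogP <- -log(toy$pValue)
minusLogP[minusLogP == Inf] <- -log(.Machine$double.eps)
costs <- 1 / minusLogP

str(idsInSet)
print(costs)
```

The two objects have the same length and order, which is what lets weightedSetCover pair each set with its cost.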

regards, K

kmshort commented 3 years ago

One more thing: the weightedSetCover output doesn't include the full table like the web version does. This is what you can do to get the full table:

weightedGO_full <- output_from_WebGestaltR_function[match(wscRes$topSets, output_from_WebGestaltR_function$geneSet), ]
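
For anyone unsure what the match() subsetting does, here is a small sketch on invented data. The names `toy` and `topSets` are hypothetical stand-ins for the result table and for wscRes$topSets.

```r
# Toy stand-in for the result table; all values are invented.
toy <- data.frame(
  geneSet = c("setA", "setB", "setC"),
  pValue  = c(1e-5, 1e-3, 0.01),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for wscRes$topSets
topSets <- c("setC", "setA")

# match() gives the row index of each selected set name in the table,
# so the subset keeps only those rows, in the order of topSets
full <- toy[match(topSets, toy$geneSet), ]
print(full)
```

The rows come back in the order of topSets rather than in the original table order.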