Closed phisanti closed 1 year ago
The error seems to come from the function compute_testGenesets in compute2-geneset.R. Apparently, there is a hard limit on gene set size: gene sets with fewer than 15 or more than 400 genes are dropped, so if no gene set falls within that range, no column is selected for further analysis. Here is the code:
## filter gene sets on size
cat("Filtering gene sets on size...\n")
gmt.size <- Matrix::colSums(G != 0)
size.ok <- (gmt.size >= 15 & gmt.size <= 400)
G <- G[, which(size.ok)]
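To illustrate how a small dataset ends up with nothing selected, here is a minimal sketch on a made-up gene-by-geneset matrix. The toy matrix, its dimensions, and the name G.filtered are assumptions for illustration only and are not taken from playbase; only the filter lines mirror the snippet above.

```r
library(Matrix)

# Hypothetical toy data: 20 genes x 3 gene sets, every set smaller than 15 genes.
G <- Matrix::sparseMatrix(
  i = 1:20,
  j = rep(1:3, times = c(5, 10, 5)),
  x = 1,
  dims = c(20, 3),
  dimnames = list(paste0("gene", 1:20), paste0("set", 1:3))
)

# Same size filter as in the snippet above.
gmt.size <- Matrix::colSums(G != 0)
size.ok <- (gmt.size >= 15 & gmt.size <= 400)

# drop = FALSE is added here only so the result stays a matrix for inspection.
G.filtered <- G[, which(size.ok), drop = FALSE]

sum(size.ok)      # no gene set passes the filter
ncol(G.filtered)  # so the filtered matrix has zero columns
```

With set sizes of 5, 10 and 5, every entry of size.ok is FALSE, so any downstream step that expects at least one gene set column receives an empty matrix.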
Here we have two options:
Any suggestions, @ivokwee @ncullen93?
Yeah a try-catch method would be best. I'm not sure if this error occurs in the omicsplayground platform or if this issue is caught before this code gets run. This is a good example of why all data catching code should be in playbase and not in omicsplayground. Looping in @ivokwee on this.
I would debug the error. Sometimes people have just 500-1000 genes. So it is a good edge case.
Ok, taking that last comment into account, I came up with the following solution:
# If the dataset is so small that no gene set passes the size filter
# (sum(size.ok) == 0), fall back to the 100 largest gene sets
if (sum(size.ok) == 0) {
  top_100gs <- head(sort(gmt.size, decreasing = TRUE), 100)
  size.ok <- names(gmt.size) %in% names(top_100gs)
}
Injecting this code into compute_testGenesets right after size.ok is calculated should solve the issue. This way, whenever no gene set passes the size filter, up to 100 of the largest gene sets are kept instead, so the selection is never empty. Let me know what you think, @ncullen93 @ivokwee.
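For anyone who wants to try the fallback in isolation, here is a minimal sketch that wraps the same logic in a standalone function. The function name filter_genesets_by_size, its parameters, and the toy size vectors are hypothetical and not part of playbase; only the thresholds (15, 400, top 100) come from the snippets in this thread.

```r
# Hypothetical helper (not part of playbase): size filter with a top-n fallback.
filter_genesets_by_size <- function(gmt.size, min.size = 15, max.size = 400,
                                    n.fallback = 100) {
  size.ok <- (gmt.size >= min.size & gmt.size <= max.size)
  # If nothing passes, keep the n.fallback largest gene sets instead.
  if (sum(size.ok) == 0) {
    top_gs <- head(sort(gmt.size, decreasing = TRUE), n.fallback)
    size.ok <- names(gmt.size) %in% names(top_gs)
  }
  size.ok
}

# Toy case 1: 150 gene sets, all with fewer than 15 genes.
gmt.size <- setNames(rep(1:10, length.out = 150), paste0("set", 1:150))
size.ok <- filter_genesets_by_size(gmt.size)
sum(size.ok)  # 100 gene sets are kept by the fallback

# Toy case 2: only 3 gene sets exist, so the fallback keeps all 3.
small <- setNames(c(3, 7, 5), c("a", "b", "c"))
sum(filter_genesets_by_size(small))  # 3
```

Two things this sketch makes visible: the fallback relies on gmt.size having unique names, and it ignores the upper bound of 400, so if every gene set were too large rather than too small, the largest ones would be selected.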
Yes that's a good solution. But please test.
I can confirm the edit works in the Playbase tests as well as in omicsplayground.
Using a reduced version of the example dataset, the base pgx pipeline fails. See code below:
The error is:
This, therefore, means that x does not exist.