peeratant closed this issue 5 years ago
Not sure I fully understand your question. Are you saying the normalize_data()
function is too slow for large datasets?
Or are you saying that after you run normalize_data()
and you want to merge the resulting matrix back into the original matrix, you do this via a loop that is very slow?
What I mean is that when normalizing with this function, every gene that has a count of 0 in every cell is removed by the if (any(totalUMIPerCell == 0)) condition.
For example, I use a dataset that has 20,000 genes, but only 3,000 genes are expressed in at least one cell. After using the provided normalization function, 17,000 genes are removed.
I would like to know an efficient way to merge the imputation result for the 3,000 genes back into the original data (20,000 genes, including those with zero counts).
Got it.
Just run imputation on the subset of rows that are non-zero and then set the appropriate rows of the larger matrix to the result of imputation on the subset.
For example, something like this:
# Make a matrix that has a row of all zeros
A <- matrix(1,nrow=10,ncol = 5)
A[2,] <- 0
print(A)
totalUMIPerCell <- rowSums(A)
toKeep <- which(totalUMIPerCell > 0)
# drop = FALSE keeps A_subset a matrix even if only one row is kept
A_subset <- A[toKeep, , drop = FALSE]
# Run imputation on A_subset. Here I'll just add the number 1, just for demo
A_subset <- A_subset + 1
A[toKeep, ] <- A_subset
print(A)
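The same idea can be wrapped in a small helper so the subset, impute, and write-back steps stay together. This is only a sketch: impute_nonzero_rows and its impute_fn argument are hypothetical names and not part of the package, and the "imputation" used here is just the add-one stand-in from the demo above.

```r
# Hypothetical helper (not part of the package): runs `impute_fn` on the rows
# whose total is non-zero and writes the result back into a full-size matrix.
impute_nonzero_rows <- function(A, impute_fn) {
  toKeep <- which(rowSums(A) > 0)
  # drop = FALSE keeps a matrix even when only one row survives the filter
  A_subset <- A[toKeep, , drop = FALSE]
  A_imputed <- impute_fn(A_subset)
  # The write-back below assumes imputation preserves the subset's shape
  stopifnot(identical(dim(A_imputed), dim(A_subset)))
  result <- A
  # Single vectorized assignment; all-zero rows are left untouched
  result[toKeep, ] <- A_imputed
  result
}

A <- matrix(1, nrow = 10, ncol = 5)
A[2, ] <- 0
# Stand-in for imputation, as in the demo: add 1 to every entry
result <- impute_nonzero_rows(A, function(x) x + 1)
print(result)
```

The write-back is one indexed assignment, so no explicit loop over rows is needed.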
That's helpful, thanks.
According to the code below, the normalization function seems to remove every row whose total count is zero.
If I want to merge the computation result back into the original data, I have to follow these steps
Although this approach may yield a satisfactory result, performance suffers greatly, which really matters for big datasets.
Do you have any suggestions for improving the performance in this case?
` normalize_data <- function (A) {
  # Simple convenience function to library and log normalize a matrix
}
`
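On the performance question, one way to check whether the merge itself is the bottleneck is to time it on data of the size described above. The sketch below is an illustration under assumptions: the matrix is synthetic, the gene names are made up, log1p is only a stand-in for the real imputation step, and rows are matched by name rather than by position so the subset can come back in any order.

```r
set.seed(1)
n_genes <- 20000
n_cells <- 200
n_expressed <- 3000

# Synthetic data: 20,000 named rows, only 3,000 of them with non-zero counts
original <- matrix(0, nrow = n_genes, ncol = n_cells,
                   dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
expressed <- sort(sample(n_genes, n_expressed))
# The + 1 guarantees every expressed row has a non-zero total
original[expressed, ] <- rpois(n_expressed * n_cells, lambda = 2) + 1

keep <- rowSums(original) > 0
# Stand-in for imputation on the non-zero subset (row names are preserved)
imputed_subset <- log1p(original[keep, , drop = FALSE])

merged <- original
timing <- system.time({
  # One vectorized assignment, matching rows by name instead of looping
  merged[rownames(imputed_subset), ] <- imputed_subset
})
print(timing)
```

If the timing reported here is small relative to the imputation itself, the write-back is unlikely to be the slow part; a row-by-row loop doing the same assignment would be the thing to replace.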