ajitjohnson / imsig

Immune Cell Gene Signatures for Profiling the Microenvironment of Solid Tumours

issue with basicstats function #15

Closed rgalvin516 closed 4 years ago

rgalvin516 commented 4 years ago

Hello, and first of all, thanks a lot for your work on this pipeline. I generated very interesting results from the imsig function, but I am running into an issue with the basicstats and plot_network functions requiring too much RAM:

Basicstats <- gene_stat(exp, r = 0.6)
---> Checking zero-variance data...
---> Total number of variables: 56505
---> WARNING: 16011 variables found with zero variance
Error: cannot allocate vector of size 12.2 Gb

This was with a data frame with just 5 columns. Row names are gene symbols and column names are sample IDs.

Is this typical for this function, or could I be doing something wrong? (I suspect the latter, since all the other functions in your package work just fine on my machine.)

Thanks again

ajitjohnson commented 4 years ago

Hi, @rgalvin516 Thank you.

Yeah, that is weird with just 5 samples. Maybe the problem is that you have very few samples? What data are you using?

Can you try increasing the r value, or removing rows that are all zeros?

Let me know how it goes.

rgalvin516 commented 4 years ago

I am using gene expression data (log2(expected_counts + 1)) pulled from a database hosted on Xena and transformed back to natural scale in R

Using a different subset with hundreds of columns, I have the same problem, even after writing a line of code to remove rows with 0 variance

exp <- exp[-which(apply(exp[ , -1 ] , 1 , var) == 0 ) , ]
dim(exp)

51332 304

nonzero_row <- C1_GeneExp_transformed[rowSums(C1_GeneExp_transformed) > 0, ] # filtered row read count above 0
dim(nonzero_row)

51332 304

Basicstats <- gene_stat(exp, r = 0.99)
Error: cannot allocate vector of size 19.6 Gb

Bummer!
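A side note on the zero-variance filter used above: `x[-which(cond), ]` returns zero rows when no row matches the condition, so a logical mask is usually the safer idiom. The following is only a minimal sketch on a made-up matrix; `toy_exp`, `row_var`, `keep`, and `filtered` are hypothetical names, not part of imsig.

```r
# Toy matrix standing in for the expression data (hypothetical values).
toy_exp <- matrix(
  c(1, 2, 3,
    5, 5, 5,
    0, 0, 0,
    2, 9, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
                  c("S1", "S2", "S3"))
)

# Per-row variance across samples.
row_var <- apply(toy_exp, 1, var)

# TRUE for rows that actually vary; NA variances are treated as "drop".
keep <- !is.na(row_var) & row_var > 0

# Logical indexing still works when no row has zero variance,
# whereas x[-which(cond), ] would return zero rows in that case.
filtered <- toy_exp[keep, , drop = FALSE]
dim(filtered)
```

Note that a `rowSums(x) > 0` filter only removes all-zero rows. A constant but non-zero row such as `GENE_B` passes that filter and still has zero variance.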

ajitjohnson commented 4 years ago

Hmm, I have never had this issue before. Can you share a small dataset that I can use to reproduce the error?

rgalvin516 commented 4 years ago

exp.txt

Absolutely - I attached the file with just 5 columns and with zero variance genes removed. When I run this I get the error: "Error: cannot allocate vector of size 11.2 Gb." I have R version 4.0.2

traceback()
3: cor(xt)
2: fastCor(t(exp))
1: gene_stat(exp, r = 0.7)
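The traceback points at `cor()` on the transposed matrix, i.e. a gene-by-gene correlation matrix. A dense n x n matrix of doubles needs n^2 * 8 bytes regardless of how many samples there are, which would explain why 5 columns are already enough to trigger the error. A quick back-of-the-envelope check follows; `cor_matrix_gib` is a hypothetical helper, and the first call assumes the zero-variance variables are dropped before the correlation step.

```r
# Size in GiB of a dense n x n matrix of doubles (8 bytes per entry).
cor_matrix_gib <- function(n_genes) {
  n_genes^2 * 8 / 1024^3
}

# First report: 56505 variables minus the 16011 flagged as zero variance.
round(cor_matrix_gib(56505 - 16011), 1)

# Second report: 51332 rows after filtering.
round(cor_matrix_gib(51332), 1)
```

These come out at about 12.2 and 19.6, in line with the two allocation sizes reported above.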

ajitjohnson commented 4 years ago

Thank you, @rgalvin516. It was indeed a real memory error while computing the correlation matrix.

Can you download the latest version and try again? It should work now.

if( !require(devtools) ) install.packages("devtools")
devtools::install_github( "ajitjohnson/imsig", INSTALL_opts = "--no-multiarch")

Also, can you please confirm that your previous result from the imsig function is identical with this new version? I just want to make sure my fix did not break anything. Thank you very much.
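For readers who hit a similar allocation error elsewhere: one generic way to avoid holding the full gene-by-gene matrix in memory is to compute the correlations in blocks of rows and keep only the pairs above the cutoff. This is a sketch of that general technique only, not necessarily what the fix in the package does; `cor_pairs_blockwise`, `cutoff`, and `block_size` are hypothetical names.

```r
# Block-wise Pearson correlation that stores only gene pairs with r >= cutoff.
# `x` is a genes x samples numeric matrix with row names and no
# zero-variance rows (those would yield NA correlations).
cor_pairs_blockwise <- function(x, cutoff = 0.6, block_size = 1000) {
  xt <- t(x)                          # samples x genes, as cor() expects
  n <- ncol(xt)
  starts <- seq(1, n, by = block_size)
  out <- vector("list", length(starts))
  for (b in seq_along(starts)) {
    idx <- starts[b]:min(starts[b] + block_size - 1, n)
    # Correlate this block of genes against all genes: length(idx) x n.
    block <- cor(xt[, idx, drop = FALSE], xt)
    hits <- which(block >= cutoff, arr.ind = TRUE)
    # Keep each unordered pair once and drop self-correlations.
    hits <- hits[idx[hits[, "row"]] < hits[, "col"], , drop = FALSE]
    out[[b]] <- data.frame(
      gene1 = colnames(xt)[idx[hits[, "row"]]],
      gene2 = colnames(xt)[hits[, "col"]],
      r = block[hits],
      stringsAsFactors = FALSE
    )
  }
  do.call(rbind, out)
}
```

With this approach the largest temporary object is a `block_size` x n matrix rather than an n x n one. Raising `r` alone does not reduce memory when the full matrix is computed first, which is consistent with the run at r = 0.99 above still failing.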

rgalvin516 commented 4 years ago

Hello,

Everything seems to work fine now, and the results from the imsig function were identical. Thanks for the fix - now I can proceed with my analysis feeling less blind about what's happening under the hood

Cheers

ajitjohnson commented 4 years ago

thanks a lot @rgalvin516