felixhorns / FlyPN

Workflows and analysis of scRNA-seq of Olfactory Projection Neurons for Li, Horns et al. (2017)
MIT License

NaN or Inf error #1

Open amanda-hi opened 6 years ago

amanda-hi commented 6 years ago

Hello,

My colleagues and I are trying to adapt your ICIM script for our own differential expression project with olfactory epithelium. However, we've been running into quite a few errors and would like some clarification. I was able to run your "ICIM_example" script up until the "Display cells using tSNE" step, where I received the following error raised by the sklearn module:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

I am feeding a counts matrix and a metadata file into the "Load Data" step, neither of which contains infinite values. I otherwise have not changed any parameters from the ICIM_example and am not sure why we're getting this error. I've cloned the repo and adjusted the file paths accordingly. Do you have any suggestions for how to get around this? The actual marker gene identification step seems to have been successful, identifying 319 genes. We are incredibly excited to use this script but have been running into problems getting it going!

Thanks so much,

felixhorns commented 6 years ago

Hi Amanda,

Thank you for your interest in our approach!

I have experienced this type of error. I suspect that the distance matrix which is calculated in sct.TSNE.calc_TSNE() has NaN, infinity, or large values. By default, the distance matrix is calculated using pairwise Pearson correlation. You might have pair(s) of genes that yield badly behaved correlation values. For instance, if the standard deviation of one of the genes is zero, then the correlation will be NaN.
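The zero-variance case can be reproduced on a toy matrix. The following is a minimal sketch, not the code from sct.py: the matrix `X`, its labels, and the assumption that correlations are taken between columns via pandas `DataFrame.corr()` are all illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy matrix; correlations are computed between its columns.
X = pd.DataFrame(
    {
        "col_a": [1.0, 2.0, 3.0, 4.0],
        "col_b": [2.0, 1.0, 4.0, 3.0],
        "col_c": [5.0, 5.0, 5.0, 5.0],  # constant: standard deviation is zero
    },
    index=["row1", "row2", "row3", "row4"],
)

corr = X.corr()   # pairwise Pearson correlation between columns
dist = 1 - corr   # correlation distance

# Columns whose distances are NaN everywhere are the zero-variance ones.
all_nan_columns = dist.columns[dist.isna().all()].tolist()
print(all_nan_columns)
print(X.std())
```

Any NaN in the distance matrix is enough for a downstream finiteness check to raise an error like the one reported above.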

I would suggest running the distance matrix code in calc_TSNE() directly on your count matrix of ICIM-selected genes. Then check whether there are any badly behaved values. If there are, I'd consider either removing those genes or masking the values in an appropriate way (e.g. filling in zeros or ones) and passing the distance matrix directly to the TSNE method (you might have to modify sct.py with an updated TSNE method to allow this).
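One way to carry out that check is sketched below with pandas and NumPy. The helper names (`correlation_distance`, `find_bad_entries`) and the toy data are hypothetical and do not come from sct.py; the sketch assumes the distance is one minus the Pearson correlation between columns. It shows both remedies: dropping zero-variance columns before computing the distance, or masking NaN entries with a fixed distance afterwards.

```python
import numpy as np
import pandas as pd


def correlation_distance(X):
    """Return 1 - Pearson correlation between the columns of X, floored at zero."""
    return (1 - X.corr()).clip(lower=0.0)


def find_bad_entries(dist):
    """Return a boolean DataFrame marking NaN or infinite entries."""
    return ~np.isfinite(dist)


# Hypothetical toy matrix with one constant (zero-variance) column.
X = pd.DataFrame(
    {
        "col_a": [1.0, 2.0, 3.0, 4.0],
        "col_b": [2.0, 1.0, 4.0, 3.0],
        "col_c": [5.0, 5.0, 5.0, 5.0],
    }
)

dist = correlation_distance(X)
bad = find_bad_entries(dist)
print("bad entries per column:")
print(bad.sum())

# Remedy A: remove zero-variance columns, then recompute the distance.
X_clean = X.loc[:, X.std(axis=0) > 0]
dist_clean = correlation_distance(X_clean)

# Remedy B: keep every column, but mask NaN with a distance of 1
# (treated as "uncorrelated") and reset the diagonal to zero.
values = dist.fillna(1.0).to_numpy(copy=True)
np.fill_diagonal(values, 0.0)
dist_filled = pd.DataFrame(values, index=dist.index, columns=dist.columns)
```

scikit-learn's `sklearn.manifold.TSNE` accepts a precomputed distance matrix through `metric="precomputed"`. Whether the wrapper in sct.py exposes that option is not something this sketch assumes.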

Let me know whether that helps. Good luck!

Felix

amanda-hi commented 6 years ago

Hey Felix! Thanks for the quick response!

I went in and used this code to generate my own distance matrix:

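import numpy as np
dist = 1 - X.corr()  # correlation distance between the columns of X
dist = np.clip(dist, 0.0, np.nanmax(dist.to_numpy()))  # clamp negative values to zero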

I fed this dist matrix straight into the calc_TSNE() function in the sct script, which worked (woo!), but then got an error at the plot() step: a KeyError with the message "None of [%s] are in the [%s]", where the first placeholder was filled with a list of cell barcodes. Is there another parameter within the plot() function that I'm missing and need to change? It seems to me that the script is trying to color each of the cells in the list but is indexing the matrix incorrectly. I could be totally wrong about that, though.

Thanks again for your help!

felixhorns commented 6 years ago

You may need to initialize the TSNE object with a df_libs that is indexed by the same names as df or X.
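A KeyError of this kind typically appears when labels are looked up in a DataFrame whose index does not contain them. The sketch below reproduces the mismatch and one possible fix using pandas alone. The barcodes, the `barcode` and `label` column names, and the layout of `df_libs` are hypothetical, and the call that constructs the TSNE object in sct.py is deliberately not shown.

```python
import pandas as pd

# Hypothetical cell barcodes used as the column labels of the expression matrix.
cells = ["AAAC-1", "AAAG-1", "AACT-1"]
X = pd.DataFrame(
    [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]],
    index=["gene1", "gene2"],
    columns=cells,
)

# Metadata loaded with a default integer index instead of barcodes.
df_libs = pd.DataFrame({"barcode": cells, "label": ["x", "y", "x"]})

# Looking up barcodes in an integer index fails with a KeyError.
try:
    df_libs.loc[X.columns]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Possible fix: index the metadata by barcode and put it in the same order as X.
df_libs_aligned = df_libs.set_index("barcode").reindex(X.columns)

# Any barcode absent from the metadata would show up as a missing label here.
n_missing = int(df_libs_aligned["label"].isna().sum())
print(lookup_failed, n_missing)
```

If the distance matrix was built from a filtered or transposed matrix, the metadata needs to be aligned to that matrix's labels rather than to the original ones.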