Closed: cohnr closed this issue 6 years ago
Hi @cohnr ,
Thanks for trying out PHATE, and thanks for the bug report!
Before running PHATE on single-cell RNAseq, we normally library size normalize (you can do this with phateR::library_size_normalize) and then either square root or log transform the data.
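For reference, here is a minimal base-R sketch of what that preprocessing amounts to, assuming cells are in rows and genes in columns. The toy matrix and the choice to rescale to the median library size are assumptions for illustration, not necessarily what the phateR helper does internally.

```r
# Toy counts matrix: 3 cells (rows) x 4 genes (columns); values are made up.
counts <- matrix(c(1, 0, 3, 4,
                   2, 2, 0, 4,
                   0, 5, 5, 10),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("cell", 1:3), paste0("gene", 1:4)))

# Drop cells with zero total counts; dividing by a zero library size
# would give 0/0 = NaN.
counts <- counts[rowSums(counts) > 0, , drop = FALSE]

# Library size = total counts per cell.
lib_size <- rowSums(counts)

# Rescale every cell to the median library size (assumed target).
# The vector is recycled down the columns, so row i is divided by lib_size[i].
normalized <- counts / lib_size * median(lib_size)

# Square-root transform to damp the influence of highly expressed genes.
transformed <- sqrt(normalized)
```

After this, every row of normalized sums to the same value, and transformed is what would be passed on to the embedding step.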
The issue you're having, however, should be unrelated to this. Can I ask you to check if your data has duplicates? You can do this with the following R code, where data is the matrix/data frame you input to PHATE.
sum(duplicated(data))
Thanks, Scott
Hi Scott,
Thanks so much for your help with this. When I run sum(duplicated(data)) the output is "[1] 19"
Is there a way I could find the duplicated data in my data table? Thanks!
Rachel
Hi Rachel,
I will include a patch in the next version of PHATE to check for duplicated cells. In the meantime, you can check which lines are duplicated with
which(duplicated(data))
and filter your data with
data <- data[!duplicated(data),]
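As a small self-contained illustration of those calls, here is a sketch on a toy matrix (the matrix itself is made up). Note that duplicated() on a matrix or data frame compares rows, so this assumes cells are in rows; if your cells are in columns, transpose with t() first.

```r
# Toy matrix with cells in rows; cell3 is an exact copy of cell1.
data <- rbind(cell1 = c(1, 0, 3),
              cell2 = c(2, 2, 0),
              cell3 = c(1, 0, 3))

# Number of rows that repeat an earlier row.
n_dup <- sum(duplicated(data))

# Row indices of those repeats.
dup_idx <- which(duplicated(data))

# Keep only the first occurrence of each distinct row;
# drop = FALSE keeps the result a matrix even if one row remains.
deduped <- data[!duplicated(data), , drop = FALSE]
```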
Let me know if that fixes your issue.
Thanks, Scott
I have a table of raw counts of single cells that I am inputting to the phateR pipeline. The column headings are the cell names and the row headings are the gene names. When running phate(counts_table) I'm seeing an error after "Calculating SVD..."
Error in py_callimpl(callable, dots$args, dots$keywords) : ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
There are no NaN values in the data table but I'm wondering if the counts_table should be formatted differently before running phate?
Thanks!