asmagen opened this issue 7 years ago
Also I get this error: Error in cor(fpkm_temp, method = "pearson") : Missing values present in input variable 'x'. Consider using use = 'pairwise.complete.obs'. I didn't have any NA values in my dataset. Any idea what might cause that? Thank you.
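A Pearson correlation can contain missing values even when the raw input has none. In base R, `cor()` returns NA (with a warning that the standard deviation is zero) for any column that is constant, for example a cell whose values are all 0 after filtering. Whether this is what happens inside RCA is an assumption that the thread does not confirm. The base-R sketch below uses a hypothetical helper, `find_cor_problems`, to flag non-finite entries and zero-variance columns in a genes-by-cells matrix before it is correlated.

```r
# Hypothetical helper (not part of RCA): find cells (columns) that would make
# a cell-by-cell Pearson correlation return NA
find_cor_problems <- function(m) {
  m <- as.matrix(m)
  # Columns containing NA, NaN or +/-Inf
  non_finite <- which(colSums(!is.finite(m)) > 0)
  # Columns with zero standard deviation, e.g. a cell whose values are all 0
  col_sd <- apply(m, 2, sd, na.rm = TRUE)
  zero_var <- which(is.na(col_sd) | col_sd == 0)
  list(non_finite = non_finite, zero_var = zero_var)
}

# Toy genes-by-cells matrix with no NA values; cell "c3" is all zeros
m <- cbind(c1 = c(1, 2, 3), c2 = c(4, 0, 6), c3 = c(0, 0, 0))
rownames(m) <- paste0("g", 1:3)

problems <- find_cor_problems(m)
print(problems$zero_var)  # reports c3

# cor() warns that the standard deviation is zero and fills the
# off-diagonal entries involving c3 with NA, even though m has no NA
r <- suppressWarnings(cor(m, method = "pearson"))
print(anyNA(m))  # FALSE
print(anyNA(r))  # TRUE
```

Dropping or investigating the reported columns before clustering is one option. Passing `use = "pairwise.complete.obs"` would not help with the zero-variance case, because those correlations are undefined rather than missing.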
Dear Asmagen,
Have you followed the gene name requirement as stated in the manual?
#########################################################################
Input data: A data frame of expression values (FPKM, TPM, UMI counts, ...), with rows representing genes and columns representing cells. Note that the current version of RCA only accepts gene names in the following format: "GenomeLocation_HGNCGeneName_EnsembleID", from which "HGNCGeneName" is extracted for RCA analysis. For input data with only HGNC names, users need to attach two strings to the HGNC names to put them into the "XXXX_HGNCGeneNames_YYYY" format.
#########################################################################
So for gene symbol ‘BRCA1’ I need to use ‘XXXX_BRCA1_YYYY’?
(In reply to GIS-SP-Group's message of Apr 18, 2017, 11:37 PM.)
Correct. Sorry for the inconvenience; we will improve this in the next version.
Huipeng
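The padding described in the manual excerpt above can be done in one vectorized call. The sketch below uses base R only and a made-up symbol vector. The `strsplit` step only illustrates one way the middle field *could* be recovered. It is an assumption, not RCA's own extraction code.

```r
# Plain HGNC symbols, e.g. taken from rownames(counts); values are illustrative
symbols <- c("BRCA1", "CD3E", "MS4A1")

# Pad each symbol into the "XXXX_<HGNC>_YYYY" form described in the manual
padded <- paste("XXXX", symbols, "YYYY", sep = "_")
print(padded)

# Illustrative round-trip check (not RCA's code): the second "_"-separated
# field gives back the symbol, provided the symbol has no underscore itself
recovered <- vapply(strsplit(padded, "_", fixed = TRUE), `[`, character(1), 2)
print(identical(recovered, symbols))
```

For a real matrix, the same call can be applied directly to `rownames(...)`.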
The same issue still occurs. It has nothing to do with the gene names. What can be done about it?
Asmagen,
I wonder whether you followed the procedure in the vignettes.
Please paste your script here.
Huipeng
library(RCA)
# Pad plain gene symbols into the "XXXX_<symbol>_YYYY" format RCA expects
rownames(dataset$counts) = paste('XXXX', rownames(dataset$counts), 'YYYY', sep = '_')
data_obj = dataConstruct(dataset$counts)
data_obj = geneFilt(obj_in = data_obj);
data_obj = cellNormalize(data_obj,method='scQ');
normalized = dataTransform(data_obj,"log10");
data_obj = featureConstruct(normalized,method = "SelfProjection")
data_obj = cellClust(data_obj,method="hclust",deepSplit_wgcna=environment$cluster.param2,min_group_Size_wgcna=2)
cluster.association = data_obj$group_labels_color$groupLabel
Hi, Asmagen,
Could you provide the table "normalized$fpkm_transformed" via email? It seems that featureConstruct failed to select any features.
Huipeng
It's unpublished data, so I can't. Also, it doesn't make much sense that the issue would be specific to my dataset.
(In reply to GIS-SP-Group's message of Apr 23, 2017, 7:25 PM.)
Ok, since your script works well on our data set, this issue is likely specific to your data set.
Let me know if you are ok with sharing the following information, which might help us to figure out what's going on.
dim(normalized$fpkm_raw)
dim(normalized$fpkm)
sum(normalized$geneFilter)
dim(normalized$fpkm_transformed)
max(normalized$fpkm_transformed)
min(normalized$fpkm_transformed)
Sure.
dim(normalized$fpkm_raw)
[1] 14919 1441
dim(normalized$fpkm)
[1] 13389 1441
sum(normalized$geneFilter)
[1] 13389
dim(normalized$fpkm_transformed)
[1] 7724 1441
max(normalized$fpkm_transformed)
[1] 2.045323
min(normalized$fpkm_transformed)
[1] 0
(In reply to GIS-SP-Group's message of Apr 23, 2017, 8:00 PM.)
Any news?
Dear Asmagen,
My guess is that the size of your matrix is not compatible with some hard-coded parameters in the package. We need to investigate further before we can give a solid answer, though.
You could try to run the package with a randomly chosen subset (~500 cells) and see if the problem still exists.
H
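Here is a minimal base-R sketch of the subsetting suggestion. `counts` is a simulated stand-in for the user's `dataset$counts`, with 1441 cells to match the dimensions reported above. The gene count and the seed value are arbitrary. The subset would then go through the same RCA calls as the full matrix.

```r
# Simulated genes-by-cells count matrix standing in for the real data
set.seed(42)  # makes the random subset reproducible
counts <- matrix(rpois(200 * 1441, lambda = 1), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:1441)))

# Draw ~500 cells at random, without replacement
n_sub <- min(500, ncol(counts))
keep_cells <- sample(ncol(counts), n_sub)
counts_sub <- counts[, keep_cells, drop = FALSE]
print(dim(counts_sub))
```

Fixing the seed means a failing subset can be regenerated exactly and shared or re-run later.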
Hello, featureConstruct works when I select 500 random cells, which is a very small number compared with what recent scRNA-seq technologies produce. But the actual clustering fails: Error in cor(fpkm_temp, method = "pearson") : Missing values present in input variable 'x'. Consider using use = 'pairwise.complete.obs'.
Does the code have hard-coded parameters that relate to the matrix size? How can this be resolved as soon as possible? Thanks, A
Hi, Asmagen,
We have tested our package on many data sets available on our side and it seems to work fine. We are indeed optimizing the package and will release the next version in the next couple of months.
But to find a quick solution for you, we really need something that reproduces the problem you encountered. We don't need to see your full raw data set, but if you could generate a synthetic data set that is representative of the original one, that would be great.
Let me know what you think.
H
Attached is a subset of the 3k PBMC dataset published as an example for the Seurat package. RCA didn't work on this public dataset either. Please let me know the status when you have news. example.data.RData.zip
Hello, What's the status? Thanks, A
Hi both, has the problem been solved? I also get the same error. My data were produced by the 10x Genomics single-cell cellranger pipeline. The expression values are UMI counts, with rows representing genes and columns representing cells, and the gene names have been changed to the "GenomeLocation_HGNCGeneName_EnsembleID" format. The error is:
data_obj = featureConstruct(normalized,method = "SelfProjection") Error in cor(fpkm_for_clust0, method = "pearson") : 'x' has a zero dimension
Thank you very much! Frank
Dear all,
We have been testing the performance of RCA on multiple datasets on our side. Data sets from the Drop-seq protocol are usually shallowly sequenced, so some cells may have very few expressed genes (FPKM or UMI count > 0). This causes problems for RCA.
So when running RCA on large data sets, please do a preliminary QC step to filter out low-quality cells (e.g. cells with sum(FPKM>0) <= 1000 or sum(FPKM>0) <= 500; the same applies to UMI count data).
Please let me know whether more stringent QC solves the problem.
Best, Huipeng
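The QC rule above counts, for each cell, the genes with a value greater than 0 and drops cells at or below a threshold. A minimal base-R sketch follows. The toy matrix is simulated, and 500 is one of the two cut-offs Huipeng mentions. Choosing the threshold for a real data set is a judgment call.

```r
# Toy UMI count matrix: genes in rows, cells in columns
set.seed(1)
counts <- matrix(0L, nrow = 2000, ncol = 6,
                 dimnames = list(paste0("gene", 1:2000), paste0("cell", 1:6)))
# Give each cell a different number of detected (non-zero) genes
n_detected <- c(1500, 1200, 900, 600, 300, 50)
for (j in seq_len(ncol(counts))) {
  counts[sample(nrow(counts), n_detected[j]), j] <- 1L
}

# Number of genes with a non-zero value in each cell, i.e. sum(counts > 0)
genes_per_cell <- colSums(counts > 0)

# Drop cells with <= 500 detected genes (keep cells strictly above it)
min_genes <- 500
counts_qc <- counts[, genes_per_cell > min_genes, drop = FALSE]
print(colnames(counts_qc))
```

The same filter works on FPKM or TPM matrices, since it only looks at which entries are non-zero.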
Hello, I get the following error after following the manual for a single-cell dataset I'm working with.
data_obj = featureConstruct(normalized,method = "SelfProjection") Error in cor(fpkm_for_clust0, method = "pearson") : 'x' has a zero dimension
Why does it happen and how can I solve this? Thanks, A
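Earlier in the thread, Huipeng suggested that this error means featureConstruct selected no features, which would leave the matrix passed to `cor()` empty. Assuming that is the cause, one option is to check the matrix dimensions before the correlation step. The helper below, `check_cor_input`, is hypothetical plain base R and does not inspect RCA internals. Which RCA object it should be applied to is also an assumption; `normalized$fpkm_transformed` is the one discussed above.

```r
# Hypothetical guard: stop early, with a readable message, if a genes-by-cells
# matrix is too small for a cell-by-cell Pearson correlation
check_cor_input <- function(m, min_genes = 2, min_cells = 2) {
  m <- as.matrix(m)
  if (nrow(m) < min_genes) {
    stop(sprintf("only %d gene(s) left; check feature selection and cell QC",
                 nrow(m)))
  }
  if (ncol(m) < min_cells) {
    stop(sprintf("only %d cell(s) left", ncol(m)))
  }
  invisible(TRUE)
}

# A matrix whose rows were all filtered away, as if no features were selected
empty <- matrix(numeric(0), nrow = 0, ncol = 5)
res <- tryCatch(check_cor_input(empty), error = function(e) conditionMessage(e))
print(res)

# A usable matrix passes silently
ok <- matrix(runif(20), nrow = 4)
check_cor_input(ok)
```

Such a check does not fix the underlying problem. It only turns the opaque "'x' has a zero dimension" error into a message that points at the empty feature set, which the QC step suggested above may help avoid.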