davismcc / archive-scater

An archived version of the scater repository, see https://github.com/davismcc/scater for the active version.

Error: featureNames differ between assayData and featureData #89

Closed zhiyhu closed 7 years ago

zhiyhu commented 7 years ago

When I try to plot PCA figures using filter, I get an error message. The simplified code follows; some of it was borrowed from the workshop materials.

First I read in the files and create the SCESet.

> counts <- read.table(file = "../counts_sc_161211.txt",header = T,as.is = T)
> row.names(counts) <-counts$Geneid
> counts <- counts[,-1:-6]
> colnames(counts)<- paste(rep("sc161206_C",14),1:14,sep="")
> counts <- as.matrix(counts)
> 
> phenoData <- new("AnnotatedDataFrame", data = data.frame(Cell=colnames(counts)))
> rownames(phenoData) <- phenoData$Cell
> fdata <- new("AnnotatedDataFrame", data = data.frame(hgnc_symbol=rownames(counts)))
> rownames(fdata) <- fdata$hgnc_symbol
> sceset <- newSCESet(countData = counts, phenoData = phenoData,
+                     featureData = fdata)
> sceset <- getBMFeatureAnnos(
+     sceset, filters = "hgnc_symbol",
+     attributes = c("ensembl_gene_id", "hgnc_symbol", "chromosome_name", 
+                    "start_position", "end_position", "strand", "gene_biotype"),
+     feature_symbol = "hgnc_symbol",
+     feature_id = "ensembl_gene_id", biomart = "ENSEMBL_MART_ENSEMBL",
+     dataset = "hsapiens_gene_ensembl", host = "www.ensembl.org")
> sceset <- calculateQCMetrics(sceset)
> sceset$description <- rep("cell",14)
> sceset$description
 [1] "cell" "cell" "cell" "cell" "cell" "cell" "cell" "cell" "cell" "cell" "cell" "cell"
[13] "cell" "cell"

If I simply plot everything, it works well.

> plotPCA(sceset, colour_by = "description")

But if I try to use filter, I get the following message.

> plotPCA(filter(sceset, description == "cell"))
Error in validObject(x) : 
  invalid class “SCESet” object: featureNames differ between assayData and featureData

The same error also appears when I try to make some other plots. I am not sure where the assayData comes from.

davismcc commented 7 years ago

Hi @ZYBunnyHu

By way of explanation, the SCESet object contains all (transformations of) expression data (e.g. count matrix, exprs matrix) in an "assayData" slot inherited from the ExpressionSet class. This is why you see a reference to "assayData" in the error message.
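As a rough illustration of that slot, the sketch below builds a toy ExpressionSet (standing in for an SCESet, which inherits from it) and lists what its assayData holds. The matrix, gene names and cell names are made up for the example, and only Biobase accessors are used.

```r
library(Biobase)

# Toy expression matrix: 3 features (rows) by 2 cells (columns).
# All names and values here are illustrative, not from the reported dataset.
mat <- matrix(
    1:6, nrow = 3,
    dimnames = list(c("g1", "g2", "g3"), c("c1", "c2"))
)

# An ExpressionSet stores this matrix inside its assayData slot.
eset <- ExpressionSet(assayData = mat)

# Names of the matrices held in assayData.
assayDataElementNames(eset)

# featureNames() reports the row names of those matrices; these are the
# names that validity checking compares against featureData.
featureNames(eset)
```

On a real SCESet the same two accessors should show the count and exprs matrices and the feature names they carry.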

This is obviously not intended behaviour. Could you please provide the output of sessionInfo() after getting this error, and (if you can) share the dataset so that I can run this code myself and try to replicate the error?

There is a mismatch introduced somewhere between featureNames(sceset) and rownames(featureData(sceset)), but this should be discovered internally, so there is a bug. Could you try running identical(featureNames(sceset), rownames(featureData(sceset))) after the calls to newSCESet, getBMFeatureAnnos and calculateQCMetrics to see where these become non-identical?
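One way to run that check after each step is to wrap it in a small helper. This is only a sketch: `check_feature_names` is a hypothetical name, not a scater function, and the toy ExpressionSet at the end exists just to show the helper running without the original data.

```r
library(Biobase)

# Hypothetical helper (not part of scater): compare the feature names held
# by assayData with the row names of featureData and report the result.
check_feature_names <- function(object, step) {
    ok <- identical(featureNames(object), rownames(featureData(object)))
    message(step, ": feature names identical = ", ok)
    invisible(ok)
}

# Toy object with illustrative names, used only to demonstrate the helper.
mat <- matrix(
    1:6, nrow = 3,
    dimnames = list(c("g1", "g2", "g3"), c("c1", "c2"))
)
fd <- AnnotatedDataFrame(
    data.frame(symbol = c("g1", "g2", "g3"), row.names = c("g1", "g2", "g3"))
)
eset <- ExpressionSet(assayData = mat, featureData = fd)
check_feature_names(eset, "toy ExpressionSet")

# With the objects from the report (requires the archived scater), the calls
# would be placed after each step, for example:
# check_feature_names(sceset, "newSCESet")
# check_feature_names(sceset, "getBMFeatureAnnos")
# check_feature_names(sceset, "calculateQCMetrics")
```

The first step that reports a non-identical result is where the mismatch is introduced.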

Best, Davis

davismcc commented 7 years ago

Hi @ZYBunnyHu - has this issue persisted? If not, I will close the issue presently.

zhiyhu commented 7 years ago

No. Please go ahead.

Cheers!


davismcc commented 7 years ago

Thanks. We appreciate the feedback - happy scater'ing.