aet21 / EpiSCORE

Epigenetic cell-type deconvolution from Single-Cell Omic Reference profiles

Error in ImputeDNAmRef #3

Open Duuudude opened 3 years ago

Duuudude commented 3 years ago

Dear developers, I encountered an error while running:

refMscm2.m <- ImputeDNAmRef(expref.o$ref$med, db="SCM2", geneID="SYMBOL")

Error in eidNA.v[n] <- xx[[map.idx[n]]][1] : replacement has length zero

I have a different tissue type. Could this error mean that there is no hit for my marker genes in the database?

I constructed expref.o using:

expref.o <- ConstExpRef(data, celltype.idx, celltype.v)

and successfully obtained the reference matrix:

[1] "Now construct reference"
[1] 1727    5
         Type1 Type2 Type3 Type4    Type5
CYP19A1      0     0     0     0 3.169925
SERPINE1     0     0     0     0 2.807355
EPS8L1       0     0     0     0 2.321928
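The message itself comes from base R: assigning a zero-length value (such as NULL, or NULL[1]) into a single vector element fails with "replacement has length zero". A minimal toy reproduction, using made-up objects that only borrow their names from the error message (this is not EpiSCORE code):

```r
# Toy stand-ins: only the names echo the error message; the data are made up.
eidNA.v <- rep(NA_character_, 2)
xx <- list(GENE_A = c("1234", "5678"))

# Looking up a name that is absent from a list returns NULL, and NULL[1] is NULL
res <- xx[["GENE_B"]][1]
print(is.null(res))   # TRUE

# Assigning a zero-length value into one vector element raises the same error
msg <- tryCatch({
  eidNA.v[1] <- res
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)            # "replacement has length zero"
```

So the error points to a lookup that returned nothing for at least one gene, rather than to the expression values themselves.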

aet21 commented 3 years ago

Hi, first of all, tissue type is really important, as EpiSCORE is only designed for solid tissue types. For blood, PBMC and cord blood, EpiSCORE is not appropriate, as for these tissues we have ample FACS-sorted data to build DNAm reference matrices. However, the reason why EpiSCORE fails in your case must be something else, because some of your marker genes must have been found in the database. One reason could be that your data are from a different species than the one in which you built the expression reference, as we did not include support for homology mapping. Or maybe you are missing the org.Hs.eg.db package (unlikely, though).

Duuudude commented 3 years ago

Thank you for the reply! I am trying to build the DNAm reference, so I don't think this error is related to different tissue types yet (I actually have single-cell and DNAm data from the same tissue and the same species).

I looked into the ImputeDNAmRef function; the error is raised by the following for loop:

for (n in 1:length(na.idx)) {
  eidNA.v[n] <-  xx[[map.idx[n]]][1]
}

where some xx[[map.idx[n]]][1] return NULL, which causes the error. Can I simply remove the NAs after matching gene symbols to Entrez IDs?
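One way to make that lookup tolerant of both possible failure modes (an empty or NULL entry in xx, or an NA in map.idx) is to return NA for such genes and drop them afterwards. This is a hypothetical sketch on toy data, not the package's code or an official patch: xx and map.idx are stand-ins, and first_or_na is an illustrative helper name.

```r
# Hypothetical stand-ins: 'xx' mimics a symbol-to-Entrez mapping list and
# 'map.idx' holds positions into it, some of which are NA (unmapped symbols).
xx <- list(c("1588"), character(0), c("5054", "100"))
map.idx <- c(1L, NA, 2L, 3L)

# Return the first Entrez ID for each index, or NA when the index is NA
# or the mapped entry is empty/NULL, instead of failing on a zero-length value.
first_or_na <- function(i, lookup) {
  if (is.na(i)) return(NA_character_)
  hit <- lookup[[i]]
  if (length(hit) == 0) return(NA_character_)
  as.character(hit[1])
}

eidNA.v <- vapply(map.idx, first_or_na, character(1), lookup = xx)
print(eidNA.v)    # "1588" NA NA "5054"

# Genes that still lack an Entrez ID can then be dropped before imputation
keep <- !is.na(eidNA.v)
print(which(keep))
```

Using vapply with a fixed character(1) return type guarantees one value per gene, so a missing mapping becomes an NA that can be filtered rather than a zero-length replacement.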

Duuudude commented 3 years ago

Dear developer, sorry to bother you again. You mentioned that the Roadmap Epigenomics database contains 111 tissue and cell types. I am wondering whether EpiSCORE includes the whole database or only a subset of these tissue/cell types.

aet21 commented 3 years ago

We only used samples for which there was both DNA methylation and RNA-Seq expression. Any sample related to cancer was dropped. At the time we processed the data, there were 45 samples with both types of data in RMAP and 34 samples in SCM2. You can check what the samples are since the database matrix columns are annotated to their tissue/cell-type.

Duuudude commented 3 years ago

Thank you for the reply. I checked, and my tissue of interest is included in neither RMAP nor SCM2. Can I still use EpiSCORE to impute the DNAm reference for our samples? Do we assume that the model trained on the samples in RMAP and SCM2 can be applied universally to other tissues (or cell types)?

aet21 commented 3 years ago

What is your tissue?

Duuudude commented 3 years ago

Placenta

aet21 commented 3 years ago

EpiSCORE was designed for adult tissues (e.g. adult lung, breast, skin, brain, ...). I doubt that it can be successfully applied to placenta, as many of the cell types present in placenta are not represented in the RMAP and SCM2 databases. As always, you can try to build the DNAm reference, but if I were a reviewer of your paper I would demand validation of the DNAm reference.

bindej99 commented 3 years ago


Hey, I have the same problem! Unlike Duuudude, I am using mammary gland tissue, as described in the EpiSCORE publication. In my case, the map.idx object contains NAs.

Thank you for your help!

aeteschendorff commented 3 years ago

There is a bug in the part of the code that converts the gene annotation, which will be corrected in due course. A quick fix is to re-annotate your expression reference matrix with NCBI/Entrez gene IDs; the function then does not need to do the conversion and should run smoothly.
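A minimal sketch of that quick fix on a toy matrix, assuming the reference is a genes x cell types matrix with gene symbols as row names (as in the ConstExpRef output earlier in the thread) and that a named symbol-to-Entrez vector is available. The two IDs are real (CYP19A1 = 1588, SERPINE1 = 5054), but sym2eid is a hand-made stand-in; in practice one would typically build it with AnnotationDbi::mapIds(org.Hs.eg.db, keys = rownames(ref.m), column = "ENTREZID", keytype = "SYMBOL"). The re-annotated matrix would then be passed with geneID="ENTREZID".

```r
# Toy expression reference (genes x cell types); row names are gene symbols.
ref.m <- matrix(c(0,   0,    3.17,
                  0,   2.81, 0,
                  1.5, 0,    0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("CYP19A1", "SERPINE1", "NOTAGENE"),
                                c("Type1", "Type2", "Type3")))

# Hypothetical symbol -> Entrez lookup, typed by hand for illustration only.
sym2eid <- c(CYP19A1 = "1588", SERPINE1 = "5054")

eid <- unname(sym2eid[rownames(ref.m)])   # NA for symbols with no mapping
keep <- !is.na(eid) & !duplicated(eid)     # drop unmapped and duplicated IDs
refE.m <- ref.m[keep, , drop = FALSE]
rownames(refE.m) <- trimws(eid[keep])      # plain IDs, no stray whitespace
print(refE.m)
```

Dropping unmapped and duplicated genes up front means the function never has to look up a symbol that has no Entrez entry.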

hnlmarcus commented 2 years ago

I was having the same issue and solved it as you describe. But now I am getting another error:

refMrmap.m <- ImputeDNAmRef(expref.o$ref$med, db="RMAP", geneID="ENTREZID")

Error in p.m[g, ] : subscript out of bounds

p.m was my normalized count matrix. This is the third type of error I have encountered with the ImputeDNAmRef function. I have tried various inputs and checked for value ranges, negative values, etc. Going by the object description of expref.o$ref$med, I tried to make my object as similar as possible to the one used in the vignette, but I cannot get the function to work. I am also using only 4 categories.

traceback()
2: which(p.m[g, ] < 0.2)
1: ImputeDNAmRef(expref.o$ref$med, db = "SCM2", geneID = "ENTREZID")

Do you know what 0.2 is referring to?

aet21 commented 2 years ago

Well, the "p.m" within the ImputeDNAmRef function refers to the expression matrix from the database (RMAP/SCM2), so it has nothing to do with your count matrix. I suspect the problem may be related to your expref.o$ref$med. It could be that for one or more cell types there are no marker genes that are not expressed in that cell type, which would make "notE.idx" NULL and "p.m" undefined, and you would get the error you are reporting. Alternatively, you may still have a problem with your gene annotation. For instance, your Entrez IDs could be stored as e.g. " 2406", which may not match the numeric 2406 identifier in the database reference matrices we provide. If the problem is the former, there is likely an error in the construction of your expression reference matrix. Hope this helps.
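Both suggested causes can be checked before calling ImputeDNAmRef. The sketch below is hypothetical: check_ref is not part of EpiSCORE, and treating "not expressed" as a value at or below 0 in the reference (zero.thr = 0) is an assumption about how notE.idx is defined internally.

```r
# Toy reference with one problem of each kind described above.
ref.m <- matrix(c(2.1, 0,   0.3,
                  0,   1.8, 0.4,
                  0.5, 0.7, 0.9),
                nrow = 3, byrow = TRUE,
                dimnames = list(c(" 2406", "1588", "5054"),
                                c("Type1", "Type2", "Type3")))

# Hypothetical helper: flag malformed gene IDs and cell types that have no
# gene at or below the assumed "not expressed" threshold.
check_ref <- function(ref.m, zero.thr = 0) {
  ids <- rownames(ref.m)
  # IDs with leading/trailing spaces or non-digit characters will not match
  bad.ids <- ids[ids != trimws(ids) | !grepl("^[0-9]+$", trimws(ids))]
  # Cell types in which no gene is at or below the threshold
  no.notE <- colnames(ref.m)[colSums(ref.m <= zero.thr) == 0]
  list(bad.ids = bad.ids, celltypes.without.unexpressed = no.notE)
}

res <- check_ref(ref.m)
print(res)
```

On this toy input it flags " 2406" (leading space) and Type3 (no zero entry); either finding would point back to the annotation step or to the construction of the expression reference.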