akdess / CaSpER


single-cell dataset (not 10x) running issue with genome version hg38 #75

Closed lifan18 closed 2 years ago

lifan18 commented 2 years ago

Dear Dr. Akdess

CaSpER is a wonderful tool for discovering CNVs in single-cell datasets. I am following your example (https://rpubs.com/akdes/673120) to run it.

My dataset is single-cell sequencing of human brain (not 10x), and I use hg38 as the reference.

However, there are some errors I cannot get past.

  1. Annotation part

    annotation <- generateAnnotation(id_type="hgnc_symbol", genes=genes, centromere=centromere, ishg19 = T)

    I used ishg19 = F, but I don't know whether that is correct, given that I use hg38. I generated the centromere information as in your example:

    curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz" | gunzip -c | grep acen > centromere.txt

    However, it seems the file cannot be passed as centromere="centromere.txt".

  2. readBAFExtractOutput

    > loh <- readBAFExtractOutput ( path="w1.baf", sequencing.type="single-cell")
    Error in file(file, "rt") : cannot open the connection
    In addition: Warning message:
    In file(file, "rt") : cannot open file 'w1.baf/NA': Not a directory

    It seems the path should be a directory. However, even after I made a directory for the BAFExtract output (a single file, not several), the file still cannot be read.
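On the centromere part of item 1: a plausible cause is that `generateAnnotation` expects a data frame and not a file name. A minimal base R sketch, assuming the `grep acen` output keeps the five tab-separated cytoBand columns. The two rows written here are illustrative stand-ins so the snippet runs on its own, and the `generateAnnotation` call stays commented out because it needs CaSpER and network access. For hg38 the download path would presumably use `hg38` in place of `hg19`. Whether `ishg19 = F` is the right setting for hg38 is suggested by the flag name but not confirmed anywhere in this thread.

```r
# Stand-in for centromere.txt: tab-separated cytoBand rows kept by `grep acen`.
# Coordinates are illustrative only; use the file produced by the curl command.
centromere_file <- tempfile(fileext = ".txt")
writeLines(c("chr1\t121700000\t123400000\tp11.1\tacen",
             "chr1\t123400000\t125100000\tq11\tacen"),
           centromere_file)

# Read the file into a data.frame instead of passing the path as a string.
centromere <- read.delim(centromere_file, header = FALSE, stringsAsFactors = FALSE)
str(centromere)

# Assumed usage, following the call quoted in item 1:
# annotation <- generateAnnotation(id_type = "hgnc_symbol", genes = genes,
#                                  centromere = centromere, ishg19 = FALSE)
```

With `header = FALSE`, the columns come back as `V1` to `V5`, which is worth checking with `str()` before handing the object on.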

Another question: how do I generate the loh.name.mapping file?

Sorry for the long questions. I hope you can help me.

Thank you very much!!!

lifan18 commented 2 years ago

Hi,

Here is an update. My question 2 is solved by #62: set the path to the folder that contains the file, not to the file itself. The file also needs to be a ".snp" file, not a "_baf" one; change both.
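The fix from #62 can be sketched in base R as follows. The file name `w1.baf` comes from the original post; the folder names and the placeholder file contents are hypothetical, and the `readBAFExtractOutput` call is left as a comment because it needs CaSpER and real BAFExtract output.

```r
# Hypothetical layout: BAFExtract wrote a single file named w1.baf.
work_dir <- tempfile("baf_demo_")
dir.create(work_dir)
baf_file <- file.path(work_dir, "w1.baf")
writeLines("placeholder", baf_file)  # stands in for real BAFExtract output

# Put the output in its own folder and give it the .snp extension.
snp_dir <- file.path(work_dir, "snp")
dir.create(snp_dir)
snp_file <- file.path(snp_dir, sub("\\.baf$", ".snp", basename(baf_file)))
file.copy(baf_file, snp_file)

# Confirm which files are now in that folder.
list.files(snp_dir, pattern = "\\.snp$")

# Assumed usage: pass the folder, not the file.
# loh <- readBAFExtractOutput(path = snp_dir, sequencing.type = "single-cell")
```

Copying instead of renaming keeps the original BAFExtract output untouched in case the step has to be repeated.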

I still don't understand which file is the loh.name.mapping file, or how to pass in the centromere file.
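For later readers: `loh.name.mapping` appears to be a two-column data frame built in R, not a file on disk. It pairs each expression sample (here, each cell) with the name of the LOH signal it should use. The sketch below rests on that assumption; the column names `loh.name` and `sample.name` are recalled from the CaSpER tutorial and should be verified there, and the cell names and the signal name `w1` are made up for illustration.

```r
# Hypothetical cell names; in practice these would be colnames() of the expression matrix.
cell_names <- c("cell_001", "cell_002", "cell_003")

# One BAFExtract signal ("w1") shared by every cell from that sample.
loh.name.mapping <- data.frame(loh.name = "w1",
                               sample.name = cell_names,
                               stringsAsFactors = FALSE)
print(loh.name.mapping)
```

With several sequenced samples, each cell would presumably carry the signal name of the sample it came from instead of a single shared value.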

Thank you very much!

lifan18 commented 2 years ago

Dear Dr. Akdess,

One more question: how do I generate the control.sample.ids file? The CaSpER documentation does not describe it.

object <- CreateCasperObject(raw.data=data,loh.name.mapping=loh.name.mapping, sequencing.type="bulk", 
  cnv.scale=3, loh.scale=3, matrix.type="normalized", expr.cutoff=4.5,
  annotation=annotation, method="iterative", loh=loh, filter="median",  
  control.sample.ids=control.sample.ids, cytoband=cytoband)

Error in new(Class = "casper", raw.data = raw.data, loh = loh, annotation = annotation,  :
  argument "control.sample.ids" is missing, with no default

Thank you!

lifan18 commented 2 years ago

All issues are solved now. The annotation file can be generated locally. control.sample.ids can be assigned from a data.frame or list and can be customized.
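One way to read that resolution, as a sketch: `control.sample.ids` is treated here as a character vector naming the cells that serve as the normal reference, pulled from a per-cell metadata table. The `meta` data frame, its `cell` and `cluster` columns, and the cluster label `normal` are all hypothetical.

```r
# Hypothetical per-cell metadata with a cluster assignment.
meta <- data.frame(
  cell = c("cell_001", "cell_002", "cell_003", "cell_004"),
  cluster = c("normal", "tumor", "normal", "tumor"),
  stringsAsFactors = FALSE
)

# Cells from the reference cluster become the control samples.
control.sample.ids <- meta$cell[meta$cluster == "normal"]
control.sample.ids
```

The resulting vector would then be passed as `control.sample.ids = control.sample.ids` in the `CreateCasperObject` call quoted earlier, assuming the ids match column names of the expression matrix.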

lifan18 commented 2 years ago

BTW, I would like to share a reminder I learned from a bug of my own.

When I ran the samples with the control and the other clusters I had assigned, I got an error like this:

    Performing HMM segmentation...
    Processing cnv.scale:1 loh.scale:1...
    Error in value[[jvseq[[jjj]]]] : subscript out of bounds
    Calls: runCaSpER ... calculateLOHShiftsForEachSegment -> [<- -> [<-.data.frame

I checked all the input files and found that the error was caused by one missing step: setting names(loh).

Even with a correct loh.name.mapping file, the extra line names(loh) <- gsub(".snp", "", names(loh)) is still needed before the next steps. Otherwise the names of loh in the casper object keep the .snp suffix, which CaSpER does not accept.

Hope this helps the next person ;)
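A slightly stricter variant of that line, as a sketch: in a regular expression an unescaped `.` matches any character, so the pattern below escapes the dot and anchors it to the end of the name. The `loh` list and the mapping are stand-ins built for illustration; the final check simply confirms that every `loh.name` has a matching entry in `names(loh)`.

```r
# Stand-in for the list returned by readBAFExtractOutput(): names carry the file suffix.
loh <- list(w1.snp = data.frame(), w2.snp = data.frame())

# Strip only a literal trailing ".snp" (escaped dot, anchored at the end).
names(loh) <- sub("\\.snp$", "", names(loh))

# Hypothetical mapping; every loh.name should now match a name in loh.
loh.name.mapping <- data.frame(loh.name = c("w1", "w1", "w2"),
                               sample.name = c("cell_001", "cell_002", "cell_003"),
                               stringsAsFactors = FALSE)
unmatched <- setdiff(loh.name.mapping$loh.name, names(loh))
if (length(unmatched) > 0) {
  stop("No LOH signal for: ", paste(unmatched, collapse = ", "))
}
```

Running a check like this before `runCaSpER` may surface a naming mismatch earlier than the "subscript out of bounds" error does.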

Best,

Fan

44REAM commented 1 year ago

@lifan18 Hi, I have the same question about the annotation part. Should I use F for ishg19 in generateAnnotation or not? Thank you.