colomemaria / epiAneufinder

R package to detect breakpoints and assign somies to scATAC-seq data
GNU General Public License v3.0

Only Identifies Global CNVs for scATAC data #25

Open markphillippebworth opened 1 week ago

markphillippebworth commented 1 week ago

Hello! I've been trying to use your software, but I only ever see global CNV changes - no hint of single cell changes.

I've tried changing the window size between 1e5, 5e5, and 1e6, minFrags between 1000 and 1500, and minsizeCNV between 0 and 4, and yet I still get essentially identical results regardless. And the results don't really make sense to me. The data comes from 10x, and I've been working with it for a while, so there are no significant issues with it.

Karyogram (10)

Code:

epiAneufinder(input=pathToArrow, #Enter path to your fragments.tsv file or the folder containing bam files
          outdir="epiAneufinder_results", #Path to the directory where results should be written
          blacklist="hg38-blacklist.v2.bed.gz", #Path to bed file that contains the blacklisted regions of your genome
          windowSize=1e6, 
          genome="BSgenome.Hsapiens.UCSC.hg38", #Substitute with relevant BSgenome
          exclude=c('chrX','chrY','chrM'), 
          reuse.existing=FALSE,
          title_karyo="Karyogram of sample data", 
          ncores=4,
          minFrags=1500,
          minsizeCNV=0,
          k=4,
          plotKaryo=TRUE)

Output:

[1] "Removing old file from the output folder"
Subtracting Blacklist...
Adding Nucleotide Information...
1 of 22 2 of 22 3 of 22 4 of 22 5 of 22 6 of 22 7 of 22 8 of 22 9 of 22 10 of 22 11 of 22 12 of 22 13 of 22 14 of 22 15 of 22 16 of 22 17 of 22 18 of 22 19 of 22 20 of 22 21 of 22 22 of 22
[1] "Finished making windows successfully"
[1] "Obtaining the fragments tsv file"
|--------------------------------------------------|
|==================================================|
GRanges object with 6 ranges and 2 metadata columns:
      seqnames      ranges strand | barcode pcr
  [1]     chr1 10066-10126      * |    B038   8
  [2]     chr1 10067-10137      * |    B038   2
  [3]     chr1 10067-10138      * |    B038  13
  [4]     chr1 10068-10132      * |    B038   1
  [5]     chr1 10071-10124      * |    B038   1
  [6]     chr1 10071-10125      * |    B038   5
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
NULL
Getting Counts...
Counting reads from fragments/bed file ..
[1] "Count matrix with 1 cells and 4900 windows has been generated and will be saved as count_summary.rds"
Correcting for GC bias...
[1] "Filtering empty windows, 4900 windows remain."
[1] "Successfully identified breakpoints"
A .tsv file with the results has been written to disk. It contains the copy number states for each cell per bin. 0 denotes 'Loss', 1 denotes 'Normal', 2 denotes 'Gain'.
[1] "Successfully plotted karyogram"
Warning message:
In dir.create(outdir, recursive = TRUE) :
  'epiAneufinder_results/epiAneufinder_results' already exists

thek71 commented 1 week ago

Hi,

the karyogram you have shows only one cell. The software does not produce global CNVs; it produces CNVs per cell. The messages that the software prints also show that the calculated count matrix includes only one cell, hence the plot: [1] "Count matrix with 1 cells and 4900 windows has been generated and will be saved as count_summary.rds"

Are you using a fragments file or BAM files as input? If you are using BAM files, there should be one per cell. If you are using a fragments file, please check it, as there might be a formatting issue. Another possibility is that the minimum fragments threshold (minFrags) is still too high for your dataset, although if you have set it to 1000 I find that unlikely.
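One quick way to check the formatting is to count how many distinct values appear in the fourth column of the fragments file, since that is where a standard 10x fragments file keeps the cell barcode. The sketch below is only an illustration in base R: `check_barcode_column` is a hypothetical helper, not part of epiAneufinder, and it assumes a tab-separated file without a header row (lines starting with `#` are skipped).

```r
# Hypothetical helper (not an epiAneufinder function): summarise column 4
# of a fragments file, where the standard 10x layout keeps the cell barcode.
check_barcode_column <- function(path, n_rows = 100000) {
  # read.delim reads gzip-compressed files directly; '#' lines are skipped.
  frags <- read.delim(path, header = FALSE, comment.char = "#",
                      nrows = n_rows, stringsAsFactors = FALSE)
  stopifnot(ncol(frags) >= 4)
  barcodes <- frags[[4]]
  list(n_columns = ncol(frags),
       n_distinct_barcodes = length(unique(barcodes)),
       example_barcodes = head(unique(barcodes)))
}
```

If `n_distinct_barcodes` comes back as 1 (or a handful) for a dataset that should contain thousands of cells, the fourth column is probably not the cell barcode. Only the first `n_rows` lines are inspected, so the count is a lower bound.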

Best, Katia

markphillippebworth commented 1 week ago

Thank you for the quick response! You're right - that is one cell.

I was using a fragments.tsv.gz file, and I don't know why, but epiAneufinder wasn't recognizing the cell ID column in it. When I manually subset the fragment file to match the exact column names from the tutorial, it worked. I think it was looking for the cell barcode in the 4th column, and it wasn't there, so it was treating all the cells as one big cell. Is there a way to tell it which column names are expected for cell barcodes or something like that? Otherwise, I'll just need to manually edit each file.

Thank you again! It's neat to see it working.

image

thek71 commented 6 days ago

Hi,

I am glad that it worked out. The way that epiAneufinder is designed, as you mentioned, is to parse the fragment file based on its column layout. In standard 10x fragments files the fourth column holds the barcodes, and we designed the algorithm to be compatible with the different data format specifications. I am not sure what your dataset looks like, but if it is in a non-standard fragment format then unfortunately you will have to convert it first. Is your dataset coming from a different vendor that has a different format specification?
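As a rough illustration of such a conversion, the base R sketch below moves the barcode column into fourth position and writes a gzip-compressed copy. `to_standard_fragments` and its `barcode_col` argument are hypothetical names, not part of epiAneufinder, and the sketch assumes a tab-separated file without a header row whose first three columns are already chromosome, start, and end.

```r
# Hypothetical converter (not an epiAneufinder function): put the cell
# barcode in column 4, as in the standard 10x fragments layout.
to_standard_fragments <- function(in_path, out_path, barcode_col) {
  frags <- read.delim(in_path, header = FALSE, comment.char = "#",
                      stringsAsFactors = FALSE)
  stopifnot(barcode_col >= 4, barcode_col <= ncol(frags))
  # Keep chrom/start/end first, then the barcode, then all other columns.
  other <- setdiff(seq_len(ncol(frags)), c(1:3, barcode_col))
  frags <- frags[, c(1:3, barcode_col, other), drop = FALSE]
  con <- gzfile(out_path, "w")
  on.exit(close(con))
  write.table(frags, con, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(out_path)
}

# Example call, assuming the barcode currently sits in column 5:
# to_standard_fragments("fragments.tsv.gz", "fragments_fixed.tsv.gz", 5)
```

This reads the whole file into memory, so for very large fragments files a streaming approach may be preferable. The same function can be applied to each file in a loop instead of editing them by hand.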

Best, Katia