Open bkinnersley opened 8 months ago
Hi Ben,
For your first question, "how we interpret the output files of ATAC-amp, and how this can help prioritise identification of genuine ecDNA amplicons?":
For bulk ATAC-seq data in ‘bulk’ mode, there is only one main result from ATACAmp, the ‘.result’ file, which lists the possible ecDNA/HSR-forming regions, ordered by score from highest to lowest. For single-cell ATAC data in ‘sc’ mode, this file is slightly different: the last line of each possible ecDNA/HSR region gives the barcodes of the cells that support that region, for subsequent analyses at the cell-population level.
However, the results of the current ATACAmp analysis are very sensitive to data quality, so for your data I would suggest running QC first and analysing only high-quality reads. As far as I know, there are not many cases of chrY fragments forming ecDNA, and you can prioritise regions that carry oncogenes and regions larger than 100 kb.
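The prioritisation suggested above (regions carrying oncogenes, regions larger than 100 kb, chrY treated with caution) can be sketched as a simple ranking. This is only an illustration and not part of ATACAmp: the `Region` class, the `prioritise` function and the `EXAMPLE_ONCOGENES` set are hypothetical names, and the gene set is a placeholder that should be replaced with a curated oncogene list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One candidate region, assumed to be already parsed from the result file."""
    chrom: str
    start: int
    end: int
    genes: tuple = ()

    @property
    def length(self):
        return self.end - self.start


# Placeholder gene set for illustration only; replace with a curated oncogene list.
EXAMPLE_ONCOGENES = frozenset({"MYC", "EGFR", "ZNF217"})


def prioritise(regions, oncogenes=EXAMPLE_ONCOGENES, min_length=100_000,
               low_priority_chroms=("chrY",)):
    """Order regions so the most plausible candidates come first.

    Ranking (an assumption, not ATACAmp's scoring): regions off the
    low-priority chromosomes first, then oncogene carriers, then regions
    longer than min_length, then by length.
    """
    def rank(region):
        return (
            region.chrom not in low_priority_chroms,
            bool(oncogenes.intersection(region.genes)),
            region.length > min_length,
            region.length,
        )

    return sorted(regions, key=rank, reverse=True)
```

Nothing is discarded here; chrY regions are only moved to the end of the list so they can still be inspected by hand.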
About the parameters you mentioned:
1) --mode 0, 1, 2 selects which kind of input file ATACAmp runs on: mode ‘0’ accepts the BAM file, mode ‘1’ accepts the split-read and discordant-read files, and mode ‘2’ accepts the interval file. Modes 1 and 2 let you use breakpoint information from other software, and they save run time when you re-run after adjusting the parameters.
2) --isize_value is the insert size used for the discordant reads. It depends on the sequencing library construction method, but 1000 is a suitable value for most second-generation sequencing methods on the market.
3) --interval_size controls the step size from the breakpoint when calculating the amplified region. 1000 is an empirical value; you can try a larger value to speed up the calculation, or a smaller value to make the boundaries finer.
Finally, at the moment ATACAmp still has limited resolution for single cells and cannot analyse abundance for the time being, but we will continue to build on this software, with updates to detect variants in conjunction with new single-cell genome-level sequencing technologies.
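To make the role of an insert-size threshold such as --isize_value concrete, here is a minimal sketch of one common way to flag a discordant read pair. It is an assumption about the general technique, not ATACAmp's actual code: `ReadPair` and `is_discordant` are hypothetical names, and read orientation is ignored for brevity.

```python
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Mapped positions of the two mates of a read pair (hypothetical layout)."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def is_discordant(pair, isize_value=1000):
    """Return True if the mates map too far apart to be a normal pair.

    A pair is flagged when the mates lie on different chromosomes, or on the
    same chromosome but more than isize_value bases apart.
    """
    if pair.chrom1 != pair.chrom2:
        return True
    return abs(pair.pos2 - pair.pos1) > isize_value
```

Under this definition a larger threshold flags fewer pairs, which is why the value should reflect the fragment sizes the library preparation is expected to produce.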
For bulk ATAC-seq data in ‘bulk’ mode, there is only one main result from ATACAmp, the ‘.result’ file, which contains the possible ecDNA/HSR-forming regions, ordered by score from highest to lowest.
I also have a few questions about the tool’s output for bulk data analysis and would appreciate some clarification.
1) I noticed that some interval sets include a main (or “max”) cycle along with several smaller cycles. Could you explain the relationship between these different cycles? Also, for subsequent analyses, would you recommend focusing on the max cycle, or are the smaller cycles equally important to consider?
2) I also observed that some interval sets contain many intervals (see below), though only a subset of these are included in the identified cycles. Could you clarify the relationship between intervals that are part of cycles and those that are not? Understanding this distinction will help me interpret the results more accurately.
Thank you very much for your assistance.
GSM7634668.result_amplicon interval sets3: 107,1340,1,1336,143,110,128,1338,144,1342,2837,1333,1334,3074,129 245 180000
107 chr1 121182712 121187712 5000 SRGAP2C,FAM72B
1340 chr11 67627086 67635086 8000 TBX10,NUDT8
1 chr1 174020727 174024727 4000 RC3H1
1336 chr11 63936906 63939906 3000 NAA40
143 chr1 149837418 149844418 7000 H3C14,H2AC19,H2AC18,H3C15
110 chr1 143971238 143975238 4000 SRGAP2D,FAM72C
128 chr1 145092630 145096630 4000 FAM72D,FAM72C,SRGAP2B
1338 chr11 64776822 64779822 3000 SF1
144 chr1 149847968 149856968 9000 H3C14,H2AC19,H3C15,H2BC20P,H2AC18
1342 chr11 65414590 65452590 38000 MIR612,NEAT1
2837 chr20 53567970 53649970 82000 ZNF217,LOC105372672,LOC101927770
1333 chr11 61391251 61394251 3000 TMEM216
1334 chr11 64303709 64307709 4000 ESRRA,CATSPERZ,KCNK4-TEX40
3074 chr3 72445741 72448741 3000 RYBP
129 chr1 144959583 144962583 3000 SRGAP2B
max cycle:107,1,128,110,107
cycle 1: 1,110,107
107,1
107,110
110,1
cycle 2: 1,128,110
110,1
110,128
128,1
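For readers who want to separate the intervals that appear in a cycle from those that do not, the block above can be parsed with a short script. This is a sketch that assumes the layout is exactly as pasted here (interval rows of ID, chromosome, start, end, length, genes, followed by `max cycle:` and `cycle N:` lines); `parse_interval_set` and `split_by_cycle_membership` are hypothetical helper names, not ATACAmp functions, and the meaning of the header line and of the ID-pair lines is not assumed, so both are skipped.

```python
def parse_interval_set(lines):
    """Parse one interval-set block into (intervals, cycles).

    intervals maps interval ID -> (chrom, start, end, genes);
    cycles maps the cycle label (e.g. "max cycle") -> list of interval IDs.
    """
    intervals, cycles = {}, {}
    for raw in lines:
        line = raw.strip()
        if not line or "interval sets" in line.lower():
            continue  # blank line or the block header
        if line.lower().startswith(("max cycle", "cycle")):
            label, _, ids = line.partition(":")
            cycles[label.strip()] = [int(x) for x in ids.split(",")]
            continue
        fields = line.split()
        if len(fields) >= 5 and fields[1].startswith("chr"):
            genes = fields[5].split(",") if len(fields) > 5 else []
            intervals[int(fields[0])] = (fields[1], int(fields[2]), int(fields[3]), genes)
        # Anything else (e.g. "107,1" ID-pair lines) is ignored in this sketch.
    return intervals, cycles


def split_by_cycle_membership(intervals, cycles):
    """Return (IDs found in at least one cycle, IDs found in no cycle)."""
    in_cycle = {i for ids in cycles.values() for i in ids}
    inside = sorted(i for i in intervals if i in in_cycle)
    outside = sorted(i for i in intervals if i not in in_cycle)
    return inside, outside


# Excerpt of the block above (only some of the interval rows are reproduced).
SAMPLE = """\
107 chr1 121182712 121187712 5000 SRGAP2C,FAM72B
1 chr1 174020727 174024727 4000 RC3H1
110 chr1 143971238 143975238 4000 SRGAP2D,FAM72C
128 chr1 145092630 145096630 4000 FAM72D,FAM72C,SRGAP2B
2837 chr20 53567970 53649970 82000 ZNF217,LOC105372672,LOC101927770
max cycle:107,1,128,110,107
cycle 1: 1,110,107
107,1
cycle 2: 1,128,110
"""

intervals, cycles = parse_interval_set(SAMPLE.splitlines())
inside, outside = split_by_cycle_membership(intervals, cycles)
print("in a cycle:", inside)
print("not in any cycle:", outside)
```

On the full block, the same split shows that only intervals 107, 1, 110 and 128 take part in the reported cycles, which is the distinction asked about in question 2.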
Hello,
Thanks for this very useful package!
I just have a few questions after running it on single-cell ATAC-Seq libraries generated from the 10X multiome kit. I ran it using the following command:
python /path/to/software/ATAC-amp/AtacAmp.py \
--bam \
--name \
--isize_value 1000 \
--interval_size 1000 \
--mapq 30 \
--mode 0 \
--type sc \
--gtf /path/to/hg38.ncbiRefSeq.gtf \
--threads 16
I've attached output files from this run in "TEST_output.zip"
I just have a few questions:
Thanks very much
Best wishes
Ben
TEST_output.zip