PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License
452 stars 222 forks source link

Copy number analysis using ASCAT #1057

Closed MaryGoAround closed 1 month ago

MaryGoAround commented 1 month ago

Hi

I have a tumour only sample (WGS)

For CNV I done

> library(maftools)
> library(ASCAT)
> counts = maftools::gtMarkers(t_bam = "my_sample.bam",
+                              build = "hg38")
trying URL 'https://github.com/CompEpigen/ezASCAT/blob/main/inst/extdata/GRCh38_SNP6.tsv.gz?raw=true'
Content type 'application/octet-stream' length 3532118 bytes (3.4 MB)
==================================================
downloaded 3.4 MB

Fetching readcounts from BAM files..
Processing RUMA_LM1.recal.bam :
 current chromosome: 1 
 current chromosome: 10 
 current chromosome: 11 
 current chromosome: 12 
 current chromosome: 14 
 current chromosome: 13 
 current chromosome: 15 
 current chromosome: 16 
 current chromosome: 18 
 current chromosome: 17 
 current chromosome: 19 
 current chromosome: 2 
 current chromosome: 21 
 current chromosome: 22 
 current chromosome: 20 
 current chromosome: 5 
 current chromosome: 3 
 current chromosome: 6 
 current chromosome: 4 
 current chromosome: 9 
 current chromosome: X 
 current chromosome: 7 
 current chromosome: 8 
 current chromosome: Y 
> 

Then I notided GRCh38_SNP6.tsv.gz created in my working directory; after adding chr and column names, I used output_file.tsv

> ascat.bc = maftools::prepAscat_t(t_counts = "output_file.tsv", sample_name = "RUMA_LM1")
Library sizes:
Tumor: 
Counts file: output_file.tsv
Markers: 932148
Removed 932124 duplicated loci
Error in eval(jsub, SDenv, parent.frame()) : object 'A' not found
LV19Y7325V:Downloads$ head output_file.tsv
loci    counts
chr1    629097
chr1    629241
chr1    630020
chr1    631430
chr1    785910
chr1    805477
chr1    817186
chr1    826352
chr1    829889
LV19Y7325V:Downloads$ tail output_file.tsv
chrY    56885250
chrY    56885313
chrY    56885714
chrY    56886184
chrY    56886310
chrY    56886408
chrY    56886518
chrY    56886538
chrY    56886638
chrY    56886662

Please could you help me with this?

PoisonAlien commented 1 month ago

Hi,

You should use the output file containing read counts for downstream analysis. It will be a file with the suffix _nucleotide_counts.tsv (e.g, my_sample_nucleotide_counts.tsv). This will be the input for prepAscat_t

Please read ? gtMarkers for details.

MaryGoAround commented 1 month ago

Thanks for quickly solving user's error

This worked

gtMarkers(
  t_bam = "sample.recal.bam",
  n_bam = NULL,
  build = "hg38",
  prefix = "chr",
  add = TRUE,
  mapq = 10,
  sam_flag = 1024,
  loci = NULL,
  fa = NULL,
  op = NULL,
  zerobased = FALSE,
  nthreads = 8,
  verbose = TRUE
)
MaryGoAround commented 1 month ago

Sorry for interrupt

I also used Mosdepth in you tutorial and this is the plot

tumor_cbs

Rplot11

And this is the plot from ascat

logR ASPCF

Please, could you have a look at these, and tell me if you personally seeing any duplication in chromosome 20?