VanLoo-lab / ascat

ASCAT R package
https://www.mdanderson.org/research/departments-labs-institutes/labs/van-loo-laboratory/resources.html#ASCAT

Problem with ascat.prepareTargetedSeq #154

Closed h170607 closed 1 year ago

h170607 commented 1 year ago

Hi developers,

To analyze a set of targeted panel sequencing data with ASCAT, we used the following command: ascat.prepareTargetedSeq(Worksheet = "myWorksheet.tsv", Workdir = "./result", alleles.prefix = "G1000_alleles_hg19_chr", BED_file = "my_targeted_design.bed", allelecounter_exe = "/PATH/TO/allelecounter", genomeVersion = "hg19", nthreads = 8)

The allele files were downloaded from 'Reference files'. The worksheet looks like this:

Patient_ID	Normal_ID	Normal_file	Gender
P226069	P226069N	/data/01_Mapping/P226069N/P226069N.sort.markdup.BQSR.bam	XX
P226104	P226104N	/data/01_Mapping/P226104N/P226104N.sort.markdup.BQSR.bam	XX
P226067	P226067N	/data/01_Mapping/P226067N/P226067N.sort.markdup.BQSR.bam	XX
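The worksheet's structure can be checked before calling ascat.prepareTargetedSeq. The Python sketch below is a hypothetical helper (check_worksheet is not part of ASCAT). It assumes the four columns in the header above are the required ones, that the file is tab-separated, and that Gender takes XX or XY. A space-separated file is parsed as a single column and reported as missing columns.

```python
import csv
import io

# Header names taken from the worksheet shown above; treating them as the
# required set is an assumption, not ASCAT's documented contract.
REQUIRED_COLUMNS = ("Patient_ID", "Normal_ID", "Normal_file", "Gender")
ALLOWED_GENDERS = {"XX", "XY"}  # assumed accepted values


def check_worksheet(text):
    """Return a list of problems found in tab-separated worksheet text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        # A space-separated file ends up here: its whole header is one "column".
        return ["missing column(s): " + ", ".join(missing)]
    problems = []
    seen_normals = set()
    # Line numbers assume no blank lines (DictReader silently skips them).
    for line_no, row in enumerate(reader, start=2):
        if row["Gender"] not in ALLOWED_GENDERS:
            problems.append(f"line {line_no}: unexpected Gender {row['Gender']!r}")
        if row["Normal_ID"] in seen_normals:
            problems.append(f"line {line_no}: duplicate Normal_ID {row['Normal_ID']}")
        seen_normals.add(row["Normal_ID"])
    return problems
```

For example, check_worksheet(open("myWorksheet.tsv").read()) returns an empty list when none of these structural problems is found.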

The BED file looks like this:

1	2488056	2488217
1	2489137	2489298
1	2489703	2489984
1	2490258	2491479
1	2492027	2492188
1	2492892	2493293
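The BED intervals can be sanity-checked in the same way. This sketch (parse_bed and contig_style are hypothetical helpers) assumes standard BED semantics: at least three columns, a 0-based start, and an end greater than the start. It splits on any whitespace for leniency, although strict BED is tab-delimited. It also reports whether contig names carry a chr prefix; the file above uses bare names such as 1, which is worth comparing against the contig names in the BAM header.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end) tuples; raise ValueError on malformed lines."""
    intervals = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines and the comment/header lines many BED tools tolerate.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected at least 3 columns, got {len(fields)}")
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            raise ValueError(f"line {line_no}: start/end are not integers") from None
        if not 0 <= start < end:
            raise ValueError(f"line {line_no}: need 0 <= start < end, got {start}-{end}")
        intervals.append((fields[0], start, end))
    return intervals


def contig_style(intervals):
    """Report whether contig names use a 'chr' prefix (e.g. 'chr1') or not (e.g. '1')."""
    styles = {chrom.startswith("chr") for chrom, _, _ in intervals}
    if not styles:
        return "empty"
    if styles == {True}:
        return "chr-prefixed"
    if styles == {False}:
        return "no prefix"
    return "mixed"
```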

The error message is as follows:

[1] "Subsetting SNPs based on BED"
[1] " Processing normal sample: P226069N (P226069; 1/2)"
Reading locis
Done reading locis
Multi pos start:
[E::sam_itr_next] Null iterator
[ERROR] (src/bam_access.c: bam_access_get_multi_position_base_counts:379 errno: No such file or directory) Error detected (-2) when trying to iterate through region.
[ERROR] (./src/alleleCounter.c: main:432 errno: None) Error scanning through bam file for loci list with dense snps.
Error in { : task 1 failed - "EXIT_CODE == 0 is not TRUE"
Calls: ascat.prepareTargetedSeq -> %dopar% ->
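The "[E::sam_itr_next] Null iterator" message comes from htslib and indicates that the iterator passed to sam_itr_next was NULL, i.e. no iterator could be created for the requested regions. One possible cause, offered as an assumption rather than a diagnosis of this case, is that the queried contig names are absent from the BAM header (for example 1 versus chr1). The sketch takes SAM header text, such as the output of samtools view -H, and lists query contigs missing from it; header_contigs and missing_contigs are hypothetical helpers.

```python
def header_contigs(sam_header_text):
    """Collect reference sequence names (SN: tags on @SQ lines) from SAM header text."""
    names = []
    for line in sam_header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def missing_contigs(query_contigs, sam_header_text):
    """Return the query contigs absent from the BAM header, sorted for stable output."""
    known = set(header_contigs(sam_header_text))
    return sorted(set(query_contigs) - known)
```

A non-empty result, such as ['1'] for a BAM whose header lists chr1, suggests the naming conventions of the BED/loci files and the BAM differ.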

Could you please help us resolve this problem?

Sincerely,

tlesluyes commented 1 year ago

Hi @h170607,

Can you please double-check that the BAM files sit at the expected location and indexes have been generated?

Cheers,

Tom.
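Tom's check can be scripted. A successful GATK run does not by itself guarantee that each BAM sits at the path listed in the worksheet with an up-to-date index next to it. The sketch below checks, for one path, that the BAM exists, that an index exists under one of the common names (file.bam.bai, file.bai, file.bam.csi), and that the index is not older than the BAM. Which index names alleleCounter actually picks up depends on the htslib it was built with; check_bam and index_candidates are hypothetical helpers.

```python
import os


def index_candidates(bam_path):
    """Index file names commonly used for a BAM; which ones a given tool finds may vary."""
    stem, _ = os.path.splitext(bam_path)
    return [bam_path + ".bai", stem + ".bai", bam_path + ".csi"]


def check_bam(bam_path):
    """Return a list of problems for one BAM path (hypothetical helper)."""
    if not os.path.isfile(bam_path):
        return [f"{bam_path}: BAM not found"]
    indexes = [p for p in index_candidates(bam_path) if os.path.isfile(p)]
    if not indexes:
        return [f"{bam_path}: no .bai/.csi index found"]
    problems = []
    bam_mtime = os.path.getmtime(bam_path)
    for idx in indexes:
        # An index older than its BAM may be stale (BAM rewritten after indexing).
        if os.path.getmtime(idx) < bam_mtime:
            problems.append(f"{idx}: index is older than the BAM")
    return problems
```

Applied to every Normal_file entry in the worksheet, an empty result for each row means the files at least look in place; it does not validate the BAM contents.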

h170607 commented 1 year ago

Hi tlesluyes,

We analyzed this set of samples using the GATK best practices. The BAM files are intermediate outputs of GATK, and the GATK pipeline completed successfully. Does this guarantee that the BAM files are fine? I would also like to know which files ascat.prepareTargetedSeq generates for the next step, and whether there is any other way to generate them. Thank you.

Best, Matt

h170607 commented 1 year ago

Instead of running ascat.prepareTargetedSeq(), I processed the reference files following https://nf-co.re/sarek/3.2.3/docs/usage#how-to-run-ascat-with-whole-exome-sequencing-data and then proceeded to the ascat.prepareHTS step.

tlesluyes commented 1 year ago

Hi @h170607,

Cheers,

Tom.

h170607 commented 1 year ago

Hi, since running ascat.prepareTargetedSeq was unsuccessful, we used shell commands to process the reference files following the sarek documentation. ASCAT itself is still the version downloaded from here. Purity and ploidy can now be computed, but the results are not very consistent with those from other software; in fact, purity and ploidy estimates vary greatly across different tools. Is this normal?

tlesluyes commented 1 year ago

Hi @h170607,

Some inconsistencies across CNA callers are expected, especially for low-resolution techniques such as WES and TS. I don't know how the other callers perform on TS data, but we found that WES and TS data cannot be processed the same way, so we implemented a bespoke function for TS data to get better results. I'm not sure what you mean by "vary greatly": a 5% difference is okay, whereas a sample where one caller goes for a very low purity and another goes for 100% purity is indeed suspicious. In our experience, a 100% purity estimate is a warning sign: the sample is likely of very low purity (<20%), so low that the CNA caller doesn't see variations in the logR/BAF tracks. The purity threshold is caller-dependent, and the resolution/quality of the logR/BAF tracks depends heavily on the methods used to pick SNPs and derive the tracks. Of note, PCAWG used different CNA callers, so you may want to benchmark their differences in purity/ploidy estimates against your findings and see whether yours are larger (PCAWG is WGS, though).

Closing this issue now: it was about a potential problem with the ascat.prepareTargetedSeq function, and my questions to help resolve it remain unanswered.

Cheers,

Tom.
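Tom's rules of thumb for comparing callers can be expressed as a small check. This is a sketch, not an ASCAT feature: flag_purity_disagreement is hypothetical, purity is taken as a fraction in [0, 1], the 0.05 tolerance mirrors the 5% figure in the thread, and treating estimates of 0.99 or more as "100% purity" is an assumed cutoff.

```python
def flag_purity_disagreement(estimates, tolerance=0.05, high=0.99):
    """Flag one sample whose purity estimates (caller name -> fraction in [0, 1]) disagree.

    tolerance: spread between callers treated as acceptable (~5% per the thread).
    high: estimate at or above which a caller is treated as calling ~100% purity,
          a warning sign for a possibly very low-purity sample (assumed cutoff).
    """
    values = list(estimates.values())
    if not values:
        return []
    flags = []
    spread = max(values) - min(values)
    if spread > tolerance:
        flags.append(f"spread {spread:.2f} exceeds {tolerance:.2f}")
    for caller, purity in sorted(estimates.items()):
        if purity >= high:
            flags.append(f"{caller}: ~100% purity, check for a very low-purity sample")
    return flags
```

A spread alone is not proof that ASCAT is wrong; per the thread, the pattern worth investigating is one caller near 100% while another reports a low purity.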