cancerit / ascatNgs

Somatic copy number analysis using paired-end whole-genome sequencing (WGS)
http://cancerit.github.io/ascatNgs/
GNU Affero General Public License v3.0

Is ascat.pl deterministic on multi-threaded systems? #117

Closed: sclan closed this issue 2 years ago

sclan commented 2 years ago

The command I used: docker run quay.io/wtsicgp/ascatngs:4.5.0 ascat.pl -r ./GRCh38.d1.vd1.fa -t ./tumor.bam -n ./normal.bam -sg ./SnpGcCorrections.tsv -pr WGS -g XX -gc Y -rs Human -pl ILLUMINA -ra GRCh38

With the same inputs, I ran the analysis on two different EC2 instances with the same hardware spec (16 CPU threads) and the same Linux system.

There are slight differences in the output. For example:

Sample summary statistics from 1st try:

NormalContamination 0
Ploidy 3.0998005300981
rho 1
psi 2.85
goodnessOfFit 95.1865484065418
GenderChr Y
GenderChrFound N

2nd try:

NormalContamination 0
Ploidy 3.09828619832227
rho 1
psi 2.85
goodnessOfFit 95.055397549694
GenderChr Y
GenderChrFound N

I was wondering whether anyone else has seen this. Judging from htop during the run, the process appears to be single-threaded.
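To make the size of the discrepancy concrete, here is a minimal Python sketch that parses the two summary blocks quoted above and reports only the fields that differ. The function names (`parse_summary`, `diff_summaries`) are hypothetical helpers written for this illustration, not part of ascatNgs; the values are copied from the two runs in this comment.

```python
def parse_summary(text):
    """Parse 'key value' lines into a dict; values are kept as strings."""
    stats = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        if key:
            stats[key] = value.strip()
    return stats


def diff_summaries(a, b):
    """Return {key: (a_value, b_value, abs_diff_or_None)} for differing fields."""
    diffs = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key), b.get(key)
        if va == vb:
            continue
        try:
            delta = abs(float(va) - float(vb))
        except (TypeError, ValueError):
            delta = None  # non-numeric or missing field
        diffs[key] = (va, vb, delta)
    return diffs


# Values copied from the two runs reported in this issue.
RUN_1 = """
NormalContamination 0
Ploidy 3.0998005300981
rho 1
psi 2.85
goodnessOfFit 95.1865484065418
GenderChr Y
GenderChrFound N
"""

RUN_2 = """
NormalContamination 0
Ploidy 3.09828619832227
rho 1
psi 2.85
goodnessOfFit 95.055397549694
GenderChr Y
GenderChrFound N
"""

DIFFS = diff_summaries(parse_summary(RUN_1), parse_summary(RUN_2))

for key, (va, vb, delta) in DIFFS.items():
    shown = "n/a" if delta is None else f"{delta:.6g}"
    print(f"{key}: run1={va} run2={vb} abs_diff={shown}")
```

Only Ploidy and goodnessOfFit differ between the two runs; rho, psi, and the remaining fields are identical.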

AndyMenzies commented 2 years ago

Hi

This looks like a question about the core ascat algorithm. ascatNgs is a wrapper that makes it easier to run ascat over WGS BAM files. The core ascat algorithm is written and supported by the Van Loo laboratory, not us.

I think this is a question better suited to the ascat repo - https://github.com/VanLoo-lab/ascat

You should get a much more definitive answer if you raise the question over there.

Andy

sclan commented 2 years ago

Thanks!

sclan commented 2 years ago

From https://github.com/VanLoo-lab/ascat/issues/107:

"As you noticed in a comment that has been removed, you are using ascatNGS which is different from ASCAT. Any question/issue related to ascatNGS should be asked to their dev team through the appropriate GitHub repo. They could use some seeds to mirror BAF when processing HTS data, that's also something we do in ASCAT (ascat.getBAFsAndLogRs) but I can't answer for sure so it needs to be checked with them as we are not managing such a repo."

I went through as much of the code as I could follow (not much), so I would like confirmation that ascatNgs is just a wrapper for ascat and that no random seed is introduced in ascat.pl or its subroutines. Thanks.

AndyMenzies commented 2 years ago

Hi

The following link points to the block in our Perl wrapper that executes ascat.

https://github.com/cancerit/ascatNgs/blob/0d83b28d3837d1d001c1dceefcd20b71aa351225/perl/lib/Sanger/CGP/Ascat/Implement.pm#L177-L200

No random seeds are handed through; the only things passed are reference data, sample data, or intermediate data generated by previous steps in the workflow. This block executes an R wrapper called runASCAT.R.

runASCAT.R prepares the passed data, loads it into memory, and so on, then sends it to the ascat core algorithm. The ascat executing block can be seen here: https://github.com/cancerit/ascatNgs/blob/0d83b28d3837d1d001c1dceefcd20b71aa351225/perl/share/ascat/runASCAT.R#L162-L193

Again, no random seeds are generated or passed through.

The version of ascat we use is sourced from - https://raw.githubusercontent.com/Crick-CancerGenomics/ascat/v2.5.1/ASCAT/R/ascat.R

Andy
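For anyone who wants to repeat this check on a local checkout, the following is a rough Python sketch that scans Perl and R sources for calls that commonly involve a random number generator. The pattern list (`set.seed`, `sample`, `runif`, `rnorm`, `srand`, `rand`) and the file extensions are assumptions chosen for illustration and are not exhaustive. A hit only marks a line worth reading by hand, and an empty result does not prove the pipeline is deterministic.

```python
import os
import re

# Assumed, non-exhaustive list of RNG-related calls in R and Perl.
RNG_PATTERN = re.compile(r"\b(set\.seed|sample|runif|rnorm|srand|rand)\s*\(")

# Assumed set of source file extensions to inspect.
SOURCE_EXTENSIONS = (".R", ".r", ".pl", ".pm")


def scan_text(text):
    """Return [(line_number, stripped_line)] for lines with an RNG-like call."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Crude comment strip: R and Perl both use '#'. This can also cut
        # a '#' that sits inside a string literal, so treat results as hints.
        code = line.split("#", 1)[0]
        if RNG_PATTERN.search(code):
            hits.append((lineno, line.strip()))
    return hits


def scan_tree(root):
    """Walk a directory and return {path: hits} for matching source files."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(SOURCE_EXTENSIONS):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as handle:
                    hits = scan_text(handle.read())
                if hits:
                    results[path] = hits
    return results
```

Calling `scan_tree` on the path of a cloned repository returns a dictionary mapping each flagged file to its matching lines, which can then be reviewed manually.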

sclan commented 2 years ago

Thanks for the info!