VanLoo-lab / ascat

ASCAT R package
https://www.mdanderson.org/research/departments-labs-institutes/labs/van-loo-laboratory/resources.html#ASCAT

Fix Seed? #111

Closed FriederikeHanssen closed 1 year ago

FriederikeHanssen commented 2 years ago

Hi!

We have added ASCAT to one of our pipelines and recently found that its output is not reproducible (admittedly on tiny test data), even when run in a container: https://github.com/nf-core/sarek/issues/702. Since the seed for aspcf is the system time, we suspect that this is the culprit.

Would there be any adverse effects if we fixed the seed?

Cheers :)

tlesluyes commented 2 years ago

Hi @FriederikeHanssen,

Apologies for the delay getting back to you, I was off for two weeks.

There are two places where a seed can be set:

  1. During the aspcf (see the function here). The seed is only used for males, when processing the nonPAR region on chromosome X. In this function, we subset the whole region to pick the most extreme BAF values (0/1). In practice, many SNPs will have BAF=0 or BAF=1 (because there is only 1 copy), so there is a bit of randomness in which SNPs get picked.
  2. When processing sequencing data (see the function here). Most SNPs will match the reference, so there will be more data points in the 0-0.5 space of the BAF than in the 0.5-1 space. We solve this by mirroring the BAF, so half of the SNPs will be set as BAF=1-BAF. This explains why, in your example, 1_801943 has a BAF of 1 in a given run and a BAF of 0 in another run. The seed can be controlled when running ascat.getBAFsAndLogRs, but users should only be running ascat.prepareHTS, where the seed cannot be set, so it cannot be propagated to ascat.getBAFsAndLogRs. I will add a fix in the next commit (a few days/weeks) and will let you know, so you'll have full control of the seed there.
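To make the second point concrete, here is a minimal base-R sketch of random BAF mirroring under a fixed seed. It is an illustration only, not ASCAT's implementation: `mirror_baf` and the toy BAF values are hypothetical, and the real function may select SNPs differently.

```r
# Hypothetical helper for illustration only; this is not ASCAT's code.
mirror_baf <- function(baf, seed) {
  set.seed(seed)
  # Pick half of the SNP indices at random and mirror them around 0.5.
  idx <- sample(length(baf), size = length(baf) %/% 2)
  baf[idx] <- 1 - baf[idx]
  baf
}

# Toy BAF values, skewed towards the 0-0.5 space.
baf <- c(0, 0.02, 0.48, 0.45, 0.03, 0)

# Same seed, same mirrored SNPs on every run.
run1 <- mirror_baf(baf, seed = 42)
run2 <- mirror_baf(baf, seed = 42)
identical(run1, run2)

# A time-based seed, e.g. mirror_baf(baf, seed = as.integer(Sys.time())),
# would instead mirror a different subset from one run to the next.
```

With a time-based seed, a SNP with BAF=0 can end up at 0 in one run and at 1 in the next, which is the kind of difference reported for 1_801943.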

Now, to answer your question: there should not be any adverse effect if you were to fix the seed.

Cheers,

Tom.

tlesluyes commented 2 years ago

Hi @FriederikeHanssen,

This should be fixed now (as part of this commit): a seed can be given to ascat.prepareHTS and it will be propagated to ascat.getBAFsAndLogRs.

You can give it a try by re-installing ASCAT (devtools::install_github('VanLoo-lab/ascat/ASCAT')) and running your test again with a fixed seed for both ascat.prepareHTS and ascat.aspcf.
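A hedged sketch of what such a test run might look like. The argument name `seed` is an assumption based on the description above, and the remaining arguments follow the usual ASCAT example workflow as far as I recall it, so check `?ascat.prepareHTS` and `?ascat.aspcf` in your installed version. All file names and paths are hypothetical placeholders, and the calls are skipped unless ASCAT and the input files are available.

```r
# Any integer works, as long as it is the same on every run.
fixed_seed <- 42L

# Hypothetical input files; replace with your own test data.
inputs <- c(tumour = "Tumour.bam", normal = "Normal.bam")

if (requireNamespace("ASCAT", quietly = TRUE) && all(file.exists(inputs))) {
  # Assumption: `seed` is accepted here and passed on to ascat.getBAFsAndLogRs.
  ASCAT::ascat.prepareHTS(
    tumourseqfile = inputs[["tumour"]],
    normalseqfile = inputs[["normal"]],
    tumourname = "Tumour_name",
    normalname = "Normal_name",
    allelecounter_exe = "/PATH/TO/allelecounter",
    alleles.prefix = "G1000_alleles_hg19_chr",
    loci.prefix = "G1000_loci_hg19_chr",
    gender = "XY",
    genomeVersion = "hg19",
    tumourLogR_file = "Tumor_LogR.txt",
    tumourBAF_file = "Tumor_BAF.txt",
    normalLogR_file = "Germline_LogR.txt",
    normalBAF_file = "Germline_BAF.txt",
    seed = fixed_seed
  )

  ascat.bc <- ASCAT::ascat.loadData(
    Tumor_LogR_file = "Tumor_LogR.txt",
    Tumor_BAF_file = "Tumor_BAF.txt",
    Germline_LogR_file = "Germline_LogR.txt",
    Germline_BAF_file = "Germline_BAF.txt",
    gender = "XY",
    genomeVersion = "hg19"
  )

  # Assumption: `seed` is accepted here too (used for the nonPAR region of X in males).
  ascat.bc <- ASCAT::ascat.aspcf(ascat.bc, seed = fixed_seed)
}
```

Running this twice with the same `fixed_seed` and comparing the resulting BAF and segmentation files would be one way to confirm that the output is now reproducible.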

Cheers,

Tom.

tlesluyes commented 1 year ago

Closing the issue, feel free to share additional comments if needed.

Cheers,

Tom.

FriederikeHanssen commented 1 year ago

Hi @tlesluyes! Apologies for not getting back to you, it has been busy. We will test this :) Thank you for adding this functionality.