catavallejos / BASiCS

BASiCS: Bayesian Analysis of Single-Cell Sequencing Data. This is an unstable experimental version. Please see http://bioconductor.org/packages/BASiCS/ for the official release version
83 stars 16 forks source link

Component missing in log_aux in Metropolis-Hastings? #222

Closed lilythepooh closed 3 years ago

lilythepooh commented 3 years ago

Dear BASiCS team,

Thanks so much for reading this question.

I am reading the Metropolis-Hastings code in /src/updates.h, and I have a few questions.

  1. Line 37-47 looks like it is from equation (56) in Eling et al. (2018). Maybe I got it wrong here, but I thought there should be another +Counts(i,j)log(mu0(i))-Counts(i,j)log(mu1(i)) component? (and it should be in a i++ j++ loop as well.)
  2. Is there any reason to use phinu rather than nu in the update of mu and delta? (The corresponding parameter in equation (56)-(57) is nu)

I would appreciate it very much if you could answer these questions. I learned a lot from this amazing research. Thank you again for your time!

Best regards, Sijia Li

alanocallaghan commented 3 years ago
  1. I'm not sure why you think this factor is missing, could you elaborate? Perhaps point to the component of the likelihood or the prior that this would correspond to.
  2. The reason we use phinu rather than nu is because this component of the NB likelihood involves (phi_j nu_j mu_i); that is to say the scale normalising factor phi for cell j, times the capture efficiency normalising factor nu for cell j, times the mean expression factor mu for gene i. Since we need to use phi_j nu_j for each gene in turn, we compute the elementwise product of phi nu ahead of time just for a little bit of computational efficiency.

Hope that helps! All the best Alan

lilythepooh commented 3 years ago
  1. I'm not sure why you think this factor is missing, could you elaborate? Perhaps point to the component of the likelihood or the prior that this would correspond to.
  2. The reason we use phinu rather than nu is because this component of the NB likelihood involves (phi_j nu_j mu_i); that is to say the scale normalising factor phi for cell j, times the capture efficiency normalising factor nu for cell j, times the mean expression factor mu for gene i. Since we need to use phi_j nu_j for each gene in turn, we compute the elementwise product of phi nu ahead of time just for a little bit of computational efficiency.

Hope that helps! All the best Alan

Dear Alan,

Thanks so much for your early reply.

  1. In Eling et al. 2018, equation 56, on the right-hand side, the top of the first fraction is μ_i^{Σ_kΣ_j x_{ijk}}, which after log would be:
(Σ_kΣ_j x_{ijk})*log(μ_i)

Other components in equation 56 have corresponding lines in the code:

1) For 1/[Π_kΠ_j(ν_{jk}+1/δ_i)^{x_{ijk}+1/δ_i}], take log we have

-Σ_kΣ_j(x_{ijk}+1/δ_i)*log(ν_{jk}+1/δ_i), 

which corresponds to line 41-47 in updates.h:

for (int i=0; i < q0; i++) {
    for (int j=0; j < n; j++) {
      log_aux(i) -= ( Counts(i,j) + invdelta(i) ) *  
        log( ( phinu(j)*mu1(i) + invdelta(i) ) / 
        ( phinu(j)*mu0(i) + invdelta(i) ));
    }
  }

2) For exp{-(log(μ_i))^2/2a^2}, take log we have:

-(log(μ_i))^2/2a^2

which corresponds to line 37-39 in updates.h:

 log_aux -= (0.5 / s2_mu) * 
    (pow(log(mu1) - mu_mu, 2) - pow(log(mu0) - mu_mu, 2))
    * exponent;

3) For 1/μ_i, take log we have

-log(μ_i)

which corresponds to line 36 in updates.h:

log_aux = (log(mu1) - log(mu0)

Therefore, I am wondering if I missed reading component (Σ_kΣ_j x_{ijk})*log(μ_i) or Counts(i,j)*log(mu0(i)/mu1(i))somewhere in "updating mu" of updates.h?

  1. I see. I suppose the parameter Φ was missing from equation 54-62 of Eling et al. 2018 due to some typo? It does make sense to calculate phinu, thanks a lot!

Thanks again for your time. Have a good day!

Best regards, Sijia Li

P. S. I am so sorry for the weird math equations. I hope GitHub would support LaTeX in these comments soon.

alanocallaghan commented 3 years ago

I'll review the first half, it's a bit tricky going between the paper, this thread, and the code.

Per the omission of phi, the code in src/updates.h corresponds to the original implementation of BASiCS which includes both of these per-cell normalising factors. In this case equation s3 here.

Equations 55 and 56 represent the horizontal integration approach, and are implemented in src/updatesNoSpikes.h. Similarly eq 61 is implemented in src/updatesRegNoSpikes.h.

alanocallaghan commented 3 years ago

Ah I see. The component you're referencing is (log(mu1) - log(mu0)) % sum_bycell_bio;, where sum_bycell_bio is \sum_{k=1}^k\sum_{j=1}^{n_k} X_{ijk}. Because the count matrix does not change, we compute these ahead of time as well.

alanocallaghan commented 3 years ago

This prior is slightly different in the different implementations but this component is consistent across each if I remember well. Does that cover everything?

lilythepooh commented 3 years ago

This prior is slightly different in the different implementations but this component is consistent across each if I remember well. Does that cover everything?

Dear Alan,

Thanks so much for your detailed and patient reply. It is all clear now.

May I ask a last question please? Does line 94-106 in https://github.com/catavallejos/BASiCS/blob/effdd004f5467306873793cedf753b14e25a206a/src/updatesReg.h follow the original implementation in equation s3 you mentioned as well? In other words, all the dataset with spike-in genes is implemented with original implementations? If "Regression=T" then extra regression steps are added?

line 94-106 in updatesReg.h:

 arma::vec log_aux = (log(mu1) - log(mu0)) % sum_bycell_bio; 
  log_aux -= (0.5 / s2_mu) *
    (pow(log(mu1) - mu_mu, 2) - pow(log(mu0) - mu_mu, 2)) * exponent;
  #pragma omp parallel for
  for (int i = 0; i < q0; i++) {
    for (int j = 0; j < n; j++) {
      log_aux(i) -= ( Counts(i,j) + 1/delta(i) ) *  
        log(
          (phinu(j) * mu1(i) + 1 / delta(i)) / 
          (phinu(j) * mu0(i) + 1 / delta(i))
        );
    }
  }

Thanks again for your patience and time. Have a good day!

Best regards, Sijia Li

alanocallaghan commented 3 years ago

src/updatesReg.h contains the implementation of conditionals from from Eling et al, namely eqs 18 (designMatrix) and 21-25 (muUpdateReg, deltaUpdateReg, and betaUpdateReg, sigma2UpdateReg, and lambdaUpdateReg). Except for muUpdateReg and deltaUpdateReg, these are also used when running BASiCS with Regression=TRUE, WithSpikes=FALSE

lilythepooh commented 3 years ago

src/updatesReg.h contains the implementation of conditionals from from Eling et al, namely eqs 18 (designMatrix) and 21-25 (muUpdateReg, deltaUpdateReg, and betaUpdateReg, sigma2UpdateReg, and lambdaUpdateReg). Except for muUpdateReg and deltaUpdateReg, these are also used when running BASiCS with Regression=TRUE, WithSpikes=FALSE

Dear Alan,

Thanks a lot for your early reply. It is all clear now.

I am very grateful for your time. Have a nice evening!

Best regards, Sijia

alanocallaghan commented 3 years ago

Glad I could help! You too, all the best :)