Oshlack / splatter

Simple simulation of single-cell RNA sequencing data
GNU General Public License v3.0
213 stars 56 forks source link

Issue # 149 bug fixes #151

Closed azodichr closed 1 year ago

azodichr commented 2 years ago
lazappi commented 2 years ago

@azodichr Just checking where this is at?

lazappi commented 1 year ago

@azodichr I have made the changes I suggested. Can you please double check I haven't broken anything before I merge this?

azodichr commented 1 year ago

Hi Luke, The changes you made look good on my end. To address your question... the Sys.sleep was an attempt at fixing the problem where when nCells.sample=TRUE it was sampling the nCells for each sample from the gamma distribution, but because the seed is fixed it was sampling the same number each time.

The sys.sleep was allowing me to set a new seed for each donor using the time stamp. But I figured out a better way would be to adjust the seed for each donor by using the mean expression value for the first gene for that donor - that way it is different for each donor but will be the same each time it is run.

lazappi commented 1 year ago

To address your question... the Sys.sleep was an attempt at fixing the problem where when nCells.sample=TRUE it was sampling the nCells for each sample from the gamma distribution, but because the seed is fixed it was sampling the same number each time.

Hmmm...I'm not sure I quite understand. Do you mean that each iteration of that for loop was giving the sample number of cells (because something is weird if that is the case)? Or is there something else I'm not following (I've probably forgotten how most of this works)?

In general, I think you are probably setting the seed too many times, for normal Splat I just set it once in the main user function. We actually shouldn't be setting it at all (Bioconductor doesn't approve packages that do that anymore) but that was something I did naively when I first wrote the package and have never got around to implementing a better solution.

lazappi commented 1 year ago

@azodichr I made some changes to remove the extra set.seed() calls. I think this solves the issue you had but I might have not quite understood. I also added tests to confirm that:

  1. The same number of cells is not being selected for each sample when nCells.sample = TRUE
  2. The splatPopSimulate() seeds are reproducible (i.e. you get identical objects back)

Is there anything else you want to test or does that look ok? The deadline for the next release is next week so I will probably merge this in the next day or two if I don't hear from you.