JaneliaSciComp / msg

Multiplexed Shotgun Genotyping
http://genomics.princeton.edu/AndolfattoLab/MSG.html
Investigate (and possibly fix) source of stochasticity #11

Closed gregpinero closed 12 years ago

gregpinero commented 13 years ago

I noticed some randomness in the output when running on identical input (see the details below).

We want to track this down and make sure it's not being caused by a bug.

The first place to look is at the theta feature:

theta = 1 => take all reads

theta = 0 => take one read randomly

With theta = 1 it should produce the same numbers every time.

Verify that when theta = 1, it takes all reads.
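
One way to frame that check is a small sketch like the one below. Note that `select_reads` is a hypothetical stand-in written for illustration, not MSG's actual code, and it only handles the two theta values described in this issue. The idea is that if the theta = 1 path really takes all reads, its result should not depend on the random seed at all.

```python
import random

def select_reads(reads, theta, rng):
    """Hypothetical stand-in for the read-selection step (not MSG's code)."""
    if theta == 1:
        return list(reads)          # keep every read: no randomness involved
    if theta == 0:
        return [rng.choice(reads)]  # keep a single randomly chosen read
    raise ValueError("only theta = 0 and theta = 1 are described in this issue")

# Made-up reads, for illustration only.
reads = ["ACGT", "ACGA", "ACGC", "ACGG"]

# theta = 1 under many different seeds: the result should never change.
results_theta1 = [select_reads(reads, 1, random.Random(seed)) for seed in range(20)]
theta1_is_deterministic = all(result == reads for result in results_theta1)

# theta = 0 under the same seeds: each result is exactly one read from the input.
results_theta0 = [select_reads(reads, 0, random.Random(seed)) for seed in range(20)]

print("theta = 1 deterministic:", theta1_is_deterministic)
```

If the real theta = 1 code path passes an equivalent check, the randomness has to come from somewhere else in the pipeline.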

(If that is ok, continue searching for the source of randomness.)

hmm_fit/[barcode]/[barcode]-matchMismatch.csv: The values of the last two columns, "like_par1" and "like_par2", differ slightly between runs in each of these files (example below).

hmm_fit/[barcode]/[barcode]-hmmprob.RData: These also differ between runs, though I couldn't check what was different.

Example for file hmm_fit/indivA12_AATAAG/indivA12_AATAAG-matchMismatch.csv:

run  ancestry         contig  start   end    par1_par2_diff  like_par1  like_par2
1    homozygous_par2  2R      311931  2E+07  221             11         205
2    homozygous_par2  2R      311931  2E+07  221             11         205
3    homozygous_par2  2R      311931  2E+07  221             11         206
4    homozygous_par2  2R      311931  2E+07  221             12         204
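
To narrow down which columns are affected across many barcodes, a comparison along these lines might help. This is only a sketch: it assumes the matchMismatch.csv files are comma-delimited with a header row, and it reads the example rows above as seven fields with "homozygous_par2" as the ancestry value. The helper name `differing_columns` is made up for illustration.

```python
import csv
import io

def differing_columns(csv_texts):
    """Return the names of columns whose values differ between runs.

    Assumes every run produced the same rows in the same order.
    """
    tables = [list(csv.DictReader(io.StringIO(text))) for text in csv_texts]
    differing = set()
    for rows in zip(*tables):          # the same row taken from each run
        for column in rows[0]:
            if len({row[column] for row in rows}) > 1:
                differing.add(column)
    return sorted(differing)

header = "ancestry,contig,start,end,par1_par2_diff,like_par1,like_par2\n"
runs = [
    header + "homozygous_par2,2R,311931,2E+07,221,11,205\n",  # run 1
    header + "homozygous_par2,2R,311931,2E+07,221,11,205\n",  # run 2
    header + "homozygous_par2,2R,311931,2E+07,221,11,206\n",  # run 3
    header + "homozygous_par2,2R,311931,2E+07,221,12,204\n",  # run 4
]

print(differing_columns(runs))  # ['like_par1', 'like_par2']
```

With real output, the inline strings would be replaced by the contents of the same file read from each run's hmm_fit directory.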