I noticed* some randomness in the output when running on identical input.
We want to track this down and make sure it's not being caused by a bug.
The first place to look is at the theta feature:
theta = 1 => take all reads, theta = 0 => take one read randomly
With theta = 1 it should produce the same numbers every time.
Verify that when theta = 1, it takes all reads.
(If that is ok, continue searching for the source of randomness.)
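A minimal sketch of how the theta = 1 check could be automated. The function subsample_reads below is a hypothetical stand-in, not the pipeline's actual code; only the two endpoints (theta = 1 keeps all reads, theta = 0 keeps one random read) come from the description above, and the behaviour for intermediate theta is an assumption made purely for illustration.

```python
import random


def subsample_reads(reads, theta, rng):
    """Hypothetical stand-in for the pipeline's theta step (not the real code)."""
    if theta >= 1:
        return list(reads)            # theta = 1: keep every read
    if theta <= 0:
        return [rng.choice(reads)]    # theta = 0: keep one randomly chosen read
    # Assumption: for intermediate theta, keep each read with probability theta,
    # falling back to a single random read if nothing was kept.
    kept = [r for r in reads if rng.random() < theta]
    return kept or [rng.choice(reads)]


def is_deterministic(sampler, reads, theta, seeds=range(20)):
    """Return True if the sampler yields the same reads for every seed."""
    results = {tuple(sampler(reads, theta, random.Random(s))) for s in seeds}
    return len(results) == 1


reads = ["A", "A", "C", "A", "G"]
print(is_deterministic(subsample_reads, reads, theta=1))
```

If the real theta step passes the same kind of check, the randomness has to come from somewhere else in the pipeline.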
Randomness noticed when running on toy data:
hmm_fit/[barcode]/[barcode]-matchMismatch.csv:
The values of the last two columns, "like_par1" and "like_par2", differ slightly between runs in each of these files. (example below)
hmm_fit/[barcode]/[barcode]-hmmprob.RData:
These also differ between runs, though I couldn't check what was different.
Example for file hmm_fit/indivA12_AATAAG/indivA12_AATAAG-matchMismatch.csv:
Run 1:
ancestry contig start end par1_par2_diff like_par1 like_par2
homozygous_par2 2R 311931 2E+07 221 11 205
Run 2:
ancestry contig start end par1_par2_diff like_par1 like_par2
homozygous_par2 2R 311931 2E+07 221 11 205
Run 3:
ancestry contig start end par1_par2_diff like_par1 like_par2
homozygous_par2 2R 311931 2E+07 221 11 206
Run 4:
ancestry contig start end par1_par2_diff like_par1 like_par2
homozygous_par2 2R 311931 2E+07 221 12 204
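To narrow this down, the four runs above can be compared column by column. The sketch below hard-codes the example rows exactly as pasted in this report (whitespace-separated); for the real files one would read each run's matchMismatch.csv instead, and the helper names are illustrative only.

```python
HEADER = "ancestry contig start end par1_par2_diff like_par1 like_par2".split()

# Rows copied from the example above, one per run.
RUNS = {
    "Run 1": "homozygous_par2 2R 311931 2E+07 221 11 205",
    "Run 2": "homozygous_par2 2R 311931 2E+07 221 11 205",
    "Run 3": "homozygous_par2 2R 311931 2E+07 221 11 206",
    "Run 4": "homozygous_par2 2R 311931 2E+07 221 12 204",
}


def parse_runs(runs):
    """Turn each run's row into a dict keyed by column name."""
    return {name: dict(zip(HEADER, line.split())) for name, line in runs.items()}


def differing_columns(runs):
    """List the columns whose value is not identical across all runs."""
    rows = list(parse_runs(runs).values())
    return [col for col in HEADER if len({row[col] for row in rows}) > 1]


def likelihood_totals(runs):
    """Sum like_par1 + like_par2 for each run."""
    return {
        name: int(row["like_par1"]) + int(row["like_par2"])
        for name, row in parse_runs(runs).items()
    }


print(differing_columns(RUNS))   # ['like_par1', 'like_par2']
print(likelihood_totals(RUNS))
```

On this example only like_par1 and like_par2 differ. Their sum is 216 in Runs 1, 2 and 4 but 217 in Run 3, so the total itself is not constant between runs.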