Currently, when BackgroundRegionManager::computePandFlush() is called, it checks whether the estimated null region of the distribution has changed and, if so, calls computeStats(-1). If the null region cutoff needs to be computed (!m_sliding) or recomputed (m_needToUpdate_kcutoff == true), it calls findCutoff() and then computeStats(-1).
The following run case was observed:
A string of sites with count=7 exited the window on the left while a string of sites with count=7 entered on the right, so the distribution didn't change for a few bp. Then there was a huge gap (a 100k+ unmappable region on chr1). The gap caused computePandFlush() to be called, as it should have been. However, several high counts in the window still needed P-values, and because the distribution was "up to date" upon entry, computePandFlush() assumed no further P-value calculations were needed for the window. As a result, P=-1 was output for stretches of sites.
While there will be situations in which there's no need to call findCutoff() from computePandFlush(), I think we should always call computeStats(-1) from computePandFlush(), no matter what. In some cases it might be a bit of overkill, but only a bit, with a negligible cost to run time.
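To make the proposal concrete, here is a minimal sketch of the control flow, not the actual class. The names computePandFlush(), findCutoff(), computeStats(), m_sliding and m_needToUpdate_kcutoff come from the description above. Everything else (ManagerSketch, Site, m_window, m_flushed, m_nullRegionChanged, the alwaysComputeStats switch, and the placeholder P-value formula) is a hypothetical stand-in chosen only to reproduce the observed case.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for BackgroundRegionManager. Only the control flow
// described in this report is modeled; Site, m_window, m_flushed and
// m_nullRegionChanged are invented for illustration.
struct Site {
    int count;
    double pval;  // -1 marks "P-value not yet computed"
};

struct ManagerSketch {
    bool m_sliding = true;                // cutoff has already been computed
    bool m_needToUpdate_kcutoff = false;  // no recompute requested
    bool m_nullRegionChanged = false;     // assumed "distribution changed" flag
    std::vector<Site> m_window;           // sites still inside the window
    std::vector<Site> m_flushed;          // sites already written out

    // Placeholder: the real method estimates the null-region cutoff.
    void findCutoff() { m_needToUpdate_kcutoff = false; }

    // Placeholder statistic, NOT the real model: it only guarantees that
    // every pending site leaves with a value other than -1. The meaning of
    // the argument is not modeled here.
    void computeStats(int /*unused*/) {
        for (Site& s : m_window) {
            if (s.pval < 0) {
                assert(s.count >= 0);
                s.pval = 1.0 / (1.0 + s.count);
            }
        }
    }

    void computePandFlush(bool alwaysComputeStats) {
        const bool needCutoff = !m_sliding || m_needToUpdate_kcutoff;
        if (needCutoff) findCutoff();
        // Current behavior: stats only when something changed.
        // Proposed behavior: stats unconditionally before flushing.
        if (alwaysComputeStats || needCutoff || m_nullRegionChanged) {
            computeStats(-1);
        }
        m_nullRegionChanged = false;
        m_flushed.insert(m_flushed.end(), m_window.begin(), m_window.end());
        m_window.clear();
    }
};

// Reproduces the observed case: pending sites, distribution "up to date",
// then a gap forces a flush. Returns how many sites leave with P=-1.
int unsetAfterFlush(bool alwaysComputeStats) {
    ManagerSketch mgr;
    mgr.m_window = {{7, -1.0}, {7, -1.0}, {7, -1.0}};
    mgr.computePandFlush(alwaysComputeStats);
    int unset = 0;
    for (const Site& s : mgr.m_flushed) {
        if (s.pval < 0) ++unset;
    }
    return unset;
}
```

Under these assumptions, unsetAfterFlush(false) leaves all three sites at P=-1, which matches the observed output, while unsetAfterFlush(true) leaves none. The only extra work in the unconditional version is one pass over the window per flush.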