a-ludi / djunctor

Close assembly gaps using long-reads with focus on correctness.
MIT License

Select "good" reads #26

Open a-ludi opened 6 years ago

a-ludi commented 6 years ago

Selecting a "good" set of reads for the gap-filling process is crucial to the success of the project because correctness is its most important goal.

Tasks

Confidence intervals of local error rate

Given these values

L       ... #{reference bps covered by alignment chain}
t       ... trace point distance
n       ... #{base pairs considered}: L for the global error rate, t for the local one
z       ... parameter for the confidence interval; multiplier for σ
ε_reads ... error rate [#{errors}/base pair] of reads
ε_ref   ... error rate [#{errors}/base pair] of reference

ε = 1 - (1 - ε_reads)(1 - ε_ref)   ... combined error rate (read or reference is wrong)

X_ε     ... #{errors in n bps with ε #{errors}/base pair}

we can approximate the distribution of X_ε

μ    = nε
σ²   = nε(1 - ε)
X_ε  ~ B(n, ε)
    ≈ N(μ, σ²)  | valid if n >= 9/(ε(1 - ε))
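
The normal approximation above can be turned into a confidence interval for the number of errors, [μ - zσ, μ + zσ]. The following is a minimal Python sketch, not djunctor's implementation: the function names are hypothetical, ε is passed in directly as the per-base-pair error rate, and the validity rule of thumb is assumed to apply to n (so to t in the local case).

```python
import math


def normal_approx_is_valid(n, eps):
    """Rule of thumb from the text: n >= 9 / (eps * (1 - eps))."""
    return n * eps * (1 - eps) >= 9


def error_count_interval(n, eps, z):
    """Interval [mu - z*sigma, mu + z*sigma] for the number of errors in n bps.

    Uses mu = n*eps and sigma^2 = n*eps*(1 - eps), i.e. the normal
    approximation of B(n, eps). The bounds are clamped to [0, n].
    """
    mu = n * eps
    sigma = math.sqrt(n * eps * (1 - eps))
    lower = max(0.0, mu - z * sigma)
    upper = min(float(n), mu + z * sigma)
    return lower, upper


def error_rate_interval(n, eps, z):
    """Same interval, expressed as errors per base pair."""
    lower, upper = error_count_interval(n, eps, z)
    return lower / n, upper / n


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the project).
    n, eps, z = 1000, 0.1, 3.0
    print(normal_approx_is_valid(n, eps))
    print(error_count_interval(n, eps, z))
    print(error_rate_interval(n, eps, z))
```

With the illustrative values n = 1000, ε = 0.1 and z = 3, the approximation is valid (nε(1 - ε) = 90) and the interval is roughly 71.5 to 128.5 errors. An alignment whose error count in a window falls outside this interval would be a candidate for rejection, assuming that is how the interval is meant to be used for read selection.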