This is an idea for a near-natural testing scenario with known "ground truth". Its main benefits are: (1) near-natural challenges and (2) knowledge about the correctness of the result.
It takes these steps to generate the test data and evaluate a result:
1. Take two versions vX and vY, X < Y, of some genome.
2. Align vY to vX.
3. Assess filled gaps, extended gaps and new scaffolds.
4. Generate reads from vY and take vX to be the ground truth.
5. Assess the algorithm's result:
   1. Align the result to vY.
   2. Assess filled gaps, extended gaps and new scaffolds (as in step 3 above).
   3. Compare with the information from step 3:
      - Which gaps are correctly filled?
      - Which extensions have the correct sequence? How large are they?
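
The final comparison could be sketched roughly as follows. This is only a minimal illustration, assuming both assessments (the one from aligning vY to vX and the one from aligning the algorithm's result) have already been reduced to dictionaries that map a gap identifier to the sequence inserted there. The gap identifiers, the dictionary layout and the function names `compare_gap_fills` and `correct_extension_length` are hypothetical and not taken from any existing tool.

```python
from __future__ import annotations


def compare_gap_fills(expected: dict[str, str], observed: dict[str, str]) -> dict[str, list[str]]:
    """Classify gaps by comparing the algorithm's fills with the known fills.

    expected: gap id -> sequence known from the vX/vY comparison (assumed layout)
    observed: gap id -> sequence produced by the algorithm (assumed layout)
    """
    correct = sorted(g for g in observed if g in expected and expected[g] == observed[g])
    wrong = sorted(g for g in observed if g in expected and expected[g] != observed[g])
    missed = sorted(g for g in expected if g not in observed)
    unexpected = sorted(g for g in observed if g not in expected)
    return {"correct": correct, "wrong": wrong, "missed": missed, "unexpected": unexpected}


def correct_extension_length(expected_seq: str, extension: str) -> int:
    """Return how many leading bases of an extension agree with the known sequence."""
    length = 0
    for known_base, new_base in zip(expected_seq, extension):
        if known_base != new_base:
            break
        length += 1
    return length


# Toy data, invented purely for illustration.
expected = {"gap1": "ACGT", "gap2": "TTGA", "gap3": "GG"}
observed = {"gap1": "ACGT", "gap2": "TTCA", "gap4": "A"}

print(compare_gap_fills(expected, observed))
print(correct_extension_length("ACGTAC", "ACGAAC"))
```

Exact string equality is a deliberately strict criterion here; a real evaluation would probably want to tolerate small differences, for example by scoring an alignment of the two sequences instead.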