a-ludi / djunctor

Close assembly gaps using long-reads with focus on correctness.
MIT License

Testing: use two versions of a real genome #20

Open a-ludi opened 6 years ago

a-ludi commented 6 years ago

This is an idea for a near-natural testing scenario with known "ground truth". Its main benefits are: (1) near-natural challenges and (2) knowledge about the correctness of the result.

It takes these steps to generate the test data:

  1. Take two versions vX and vY, X < Y, of some genome.
  2. Align vY to vX.
  3. Assess filled gaps, extended gaps and new scaffolds.
  4. Generate reads from vY and take vX to be the ground truth.
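
The data-generation steps above could be prototyped roughly as follows. This is only a minimal sketch, not part of djunctor: the helper names `find_gaps` and `sample_reads` are hypothetical, gaps are assumed to be encoded as runs of `N`, and reads are sampled error-free at uniform positions, whereas a realistic long-read simulator would also model sequencing errors and read-length distributions.

```python
import random
import re


def find_gaps(sequence):
    """Return half-open (start, end) intervals of all runs of N/n.

    Assumption: assembly gaps are encoded as runs of N, as is common
    in FASTA files of scaffolded assemblies.
    """
    return [match.span() for match in re.finditer(r"[Nn]+", sequence)]


def sample_reads(sequence, read_length, num_reads, seed=0):
    """Sample error-free reads of fixed length at uniform random positions.

    Returns a list of (start, read) tuples so the true origin of every
    read is known. Real long reads have errors and varying lengths;
    this simplification is an assumption of the sketch.
    """
    if read_length > len(sequence):
        raise ValueError("read_length exceeds sequence length")
    rng = random.Random(seed)  # fixed seed keeps the test data reproducible
    reads = []
    for _ in range(num_reads):
        start = rng.randrange(len(sequence) - read_length + 1)
        reads.append((start, sequence[start:start + read_length]))
    return reads


# Toy stand-ins for the two genome versions (hypothetical data).
v_x = "ACGTACGTNNNNNNTTGACCA"   # older version with a gap
v_y = "ACGTACGTGGATCCTTGACCA"   # newer version with the gap filled

gaps_in_vx = find_gaps(v_x)
reads_from_vy = sample_reads(v_y, read_length=8, num_reads=5)
```

Keeping the start position of each sampled read makes it possible to check later whether a read was placed where it actually came from.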

Assess the algorithm's result:

  1. Align result to vY.
  2. Assess filled gaps, extended gaps and new scaffolds. (step 3 above)
  3. Compare with former information:
    • Which gaps are correctly filled?
    • Which extensions have the correct sequence? How large are they?
    • Which scaffolding operations are correct?
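
The comparison for filled gaps might look like the sketch below. It assumes that both the reference comparison and the algorithm's result have already been reduced to a mapping from gap (identified by its interval in vX) to the inserted sequence; the function name `compare_gap_fills` and this data layout are hypothetical. Exact string equality is used for simplicity, while a real evaluation would probably need to tolerate small alignment differences.

```python
def compare_gap_fills(expected, observed):
    """Classify gap fills by comparing observed fills to expected ones.

    Both arguments map a gap identifier, here a (start, end) interval
    in vX, to the sequence inserted into that gap.
    """
    correct = sorted(
        gap for gap in observed
        if gap in expected and observed[gap] == expected[gap]
    )
    wrong = sorted(
        gap for gap in observed
        if gap in expected and observed[gap] != expected[gap]
    )
    # Filled by the algorithm although the reference leaves the gap open.
    unexpected = sorted(gap for gap in observed if gap not in expected)
    # Filled in the reference but left open by the algorithm.
    missed = sorted(gap for gap in expected if gap not in observed)
    return {
        "correct": correct,
        "wrong": wrong,
        "unexpected": unexpected,
        "missed": missed,
    }


# Hypothetical example data.
expected_fills = {(8, 14): "GGATCC", (40, 45): "TTTAA", (90, 99): "ACGTACGTA"}
observed_fills = {(8, 14): "GGATCC", (40, 45): "TTAAA", (120, 130): "CCCC"}

report = compare_gap_fills(expected_fills, observed_fills)
```

The same four-way classification could be applied to extensions and scaffolding operations once they are expressed as comparable records.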