bcgsc / NanoSim

Nanopore sequence read simulator

Feature request: reporting SAM file for simulated reads #54

Closed seryrzu closed 4 years ago

seryrzu commented 5 years ago

It would be really helpful to report the true alignment of the reads to the reference in SAM format. For example, Simlord does this when simulating PacBio reads.

SaberHQ commented 5 years ago

Hello @seryrzu

If I understand correctly, you are looking for the alignment of the simulated reads to the reference genome in SAM format? If so, you can simply align them with minimap2 using the -a option. I couldn't understand what you mean by "true alignment", though.

seryrzu commented 5 years ago

I can technically align the reads, but when a read originates from a repetitive part of the genome, the alignment produced by minimap2 or another tool can be wrong. Art Illumina (for short reads) and Simlord (for PacBio reads) provide a SAM file with the alignment that corresponds to the true origin of each read. Having these files facilitates downstream analysis: when I'm benchmarking something else and using the alignment as ground truth, I don't have to worry about potential flaws in the alignment.

cheny19 commented 5 years ago

This feature was requested long ago by several users. We haven't implemented it yet because we feel it is somewhat redundant with what we already have. The headers of the simulated reads already indicate where the true alignment starts, and the error_profile contains the location and type of each error introduced in each read. Together, these two should be sufficient to trace each read back to the original reference sequence.
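
A minimal sketch of the first half of that suggestion, recovering a read's origin from its header. It assumes a header layout of `>refname_start_aligned|unaligned_index_F|R_head_middle_tail`, which is not spelled out in this thread and should be checked against the NanoSim version in use. `ReadOrigin` and `parse_read_header` are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class ReadOrigin(NamedTuple):
    """Origin of a simulated read, as encoded in its header (assumed layout)."""
    ref_name: str
    start: int        # start position on the reference, as written in the header
    aligned: bool     # True for "aligned", False for "unaligned"
    index: int        # read index
    strand: str       # "F" (forward) or "R" (reverse)
    head_len: int     # length of the unaligned head region
    middle_len: int   # length of the middle region
    tail_len: int     # length of the unaligned tail region


def parse_read_header(header: str) -> ReadOrigin:
    """Parse a header such as '>chr1_12345_aligned_7_R_10_500_20'.

    The field order is an assumption, not something confirmed in this thread.
    """
    # Drop the FASTA/FASTQ marker and any description after whitespace.
    name = header.lstrip(">@").split()[0]
    # Split from the right: the reference name may itself contain "_".
    parts = name.rsplit("_", 7)
    if len(parts) != 8:
        raise ValueError(f"unexpected header layout: {header!r}")
    ref, start, status, index, strand, head, middle, tail = parts
    if status not in ("aligned", "unaligned"):
        raise ValueError(f"unexpected alignment status: {status!r}")
    if strand not in ("F", "R"):
        raise ValueError(f"unexpected strand: {strand!r}")
    return ReadOrigin(
        ref_name=ref,
        start=int(start),
        aligned=(status == "aligned"),
        index=int(index),
        strand=strand,
        head_len=int(head),
        middle_len=int(middle),
        tail_len=int(tail),
    )
```

This only recovers the start position, strand, and region lengths. Reconstructing a full SAM record with a CIGAR string would additionally require the per-read error locations from the error_profile, whose format is not described here.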