alyssafrazee / polyester

Bioconductor package "polyester", devel version. RNA-seq read simulator.
http://biorxiv.org/content/early/2014/12/12/006015

read starting position and CIGAR #6

Open esterpantaleo opened 10 years ago

esterpantaleo commented 10 years ago

It would be very useful to also get the (ground truth) starting position and CIGAR of each simulated read (when reads are simulated from a GTF and a genome fasta file), and not just the transcript ID associated with the read. Is there a way I can print out that information?

alyssafrazee commented 10 years ago

There isn't a way to get that information at the moment, but I'll put it on our TODO list.

It's definitely possible for us to add in the position on the transcript each read came from. However, I think CIGAR is meant to be used for read alignments (e.g., it doesn't make sense to talk about "mismatches to the reference" when generating a read from the reference). But perhaps we can find another way to indicate whether a read crossed a splice junction. Stay tuned!
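For illustration only, here is a minimal sketch of how a position on the transcript could be turned into a genomic start position plus a splice-aware CIGAR. It is not part of polyester: the function name `read_to_genome` and its arguments are hypothetical, and it assumes a plus-strand transcript, exons given as sorted, non-adjacent 1-based inclusive `(start, end)` pairs (as in a GTF), and an error-free read with no indels, so only the `M` (aligned block) and `N` (skipped intron) operations appear.

```python
def read_to_genome(exons, tx_offset, read_len):
    """Map a read on a spliced transcript to (genomic_start, CIGAR).

    exons     -- sorted list of (start, end), 1-based inclusive, plus strand
    tx_offset -- 0-based offset of the read's first base in the transcript
    read_len  -- read length in bases
    All names here are hypothetical; this is not polyester's API.
    """
    cigar = []
    pos = None          # genomic start of the read, once found
    remaining = read_len
    consumed = 0        # transcript bases in exons scanned before the read starts
    prev_end = None     # genomic end of the previous exon the read touched

    for start, end in exons:
        exon_len = end - start + 1
        if pos is None:
            # Still looking for the exon that contains the read's first base.
            if tx_offset < consumed + exon_len:
                pos = start + (tx_offset - consumed)
                take = min(remaining, end - pos + 1)
                cigar.append(f"{take}M")
                remaining -= take
                prev_end = end
            consumed += exon_len
        else:
            # The read continues into the next exon: record the intron as N.
            cigar.append(f"{start - prev_end - 1}N")
            take = min(remaining, exon_len)
            cigar.append(f"{take}M")
            remaining -= take
            prev_end = end
        if pos is not None and remaining == 0:
            break

    if pos is None or remaining > 0:
        raise ValueError("read extends beyond the transcript")
    return pos, "".join(cigar)


# Two exons separated by a 100 bp intron; the read spans the junction.
print(read_to_genome([(100, 199), (300, 399)], tx_offset=90, read_len=20))
# (190, '10M100N10M')
```

A read that crosses a splice junction is then recognizable by an `N` in its CIGAR, which is one way of flagging junction-spanning reads without describing mismatches.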

esterpantaleo commented 10 years ago

sure! thank you!

alevar commented 4 years ago

I was having a similar issue and created a small tool (https://github.com/alevar/SIM2SAM) to parse the output of RNA-seq simulators and convert it to "ground-truth" SAM files. The tool currently supports only the single-end output of Polyester and RSEM, as we experienced several issues with the paired-end mode. However, paired-end support should be added shortly.