Victorian-Bioinformatics-Consortium / nesoni

High throughput sequencing analysis tools
GNU General Public License v2.0

Nesoni clip: is discarding sequence descriptions #3

tseemann closed this issue 11 years ago

tseemann commented 11 years ago

Illumina now produces FASTQ without the /1 and /2 suffixes, and instead uses a FASTA-style description (ID, space, DESC):

@M00855:4:000000000-A16FH:1:1101:14529:1450 2:N:0:1 TGGGCAGCAGCGACTTCTGCCACAGTGTCGGTGACATGCCAAACGGTGGGT

When passing through the clip: tool, the DESC is discarded:

@M00855:4:000000000-A16FH:1:1101:14529:1450 GGGCAGCCTCAGCGCCCCGATGGGCGGAATGGGCCTGTCGGGCGT

i.e. the "2:N:0:1" is missing.
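To illustrate the header layout described above, here is a minimal Python sketch of splitting a FASTQ header into ID and DESC and joining them back without losing the description. The function names split_fastq_header and join_fastq_header are hypothetical and are not part of nesoni; this is not how clip: is implemented.

```python
def split_fastq_header(header):
    """Split a FASTQ header line into (ID, DESC).

    The ID is everything after '@' up to the first whitespace;
    DESC is the remainder, or '' if there is none.
    """
    if not header.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    parts = header[1:].rstrip("\n").split(None, 1)
    name = parts[0] if parts else ""
    desc = parts[1] if len(parts) > 1 else ""
    return name, desc


def join_fastq_header(name, desc):
    """Rebuild a header line, keeping DESC when it is present."""
    return "@" + name + (" " + desc if desc else "")


header = "@M00855:4:000000000-A16FH:1:1101:14529:1450 2:N:0:1"
name, desc = split_fastq_header(header)
rebuilt = join_fastq_header(name, desc)
```

A tool that only writes out the ID part would produce exactly the truncated header reported here; carrying DESC through and re-joining it preserves the "2:N:0:1" field.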

pfh commented 11 years ago

Description will now be retained by "clip:".

However, many tools in nesoni and outside of nesoni expect reads to be uniquely named (unless they are explicitly treated as pairs).

nesoni.io.check_name_uniqueness can be used to check input. This is currently used by "shrimp:" and now also by "clip:".
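As a rough illustration of why uniqueness matters with the new header style, the following Python sketch reports read names that occur more than once. It is an assumption-laden stand-in, not the code behind nesoni.io.check_name_uniqueness, whose signature and behaviour are not shown in this thread; find_duplicate_names is a hypothetical helper.

```python
def find_duplicate_names(headers):
    """Return read names (ID part only) that appear more than once.

    Each header is a FASTQ header line such as '@ID DESC'.
    Names are reported once each, in order of first repeat.
    """
    seen = set()
    reported = set()
    duplicates = []
    for header in headers:
        fields = header.lstrip("@").split(None, 1)
        name = fields[0] if fields else ""
        if name in seen and name not in reported:
            duplicates.append(name)
            reported.add(name)
        seen.add(name)
    return duplicates


# New-style headers: both mates of a pair share the same ID.
new_style = ["@r1 1:N:0:1", "@r1 2:N:0:1", "@r2 1:N:0:1"]
# Old-style headers: the /1 and /2 suffixes make the names distinct.
old_style = ["@r1/1", "@r1/2", "@r2/1"]

new_dups = find_duplicate_names(new_style)
old_dups = find_duplicate_names(old_style)
```

With new-style headers the two mates of a pair collide on the ID, which is why reads need to be either uniquely named or explicitly treated as pairs.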