DecodeGenetics / Ratatosk

Hybrid error correction of long reads using colored de Bruijn graphs
BSD 2-Clause "Simplified" License

Fastq input format for short reads #28

Closed: jianshu93 closed this issue 3 years ago

jianshu93 commented 3 years ago

Hello Ratatosk Team,

I have short reads in two files, for example 01_R1.fastq and 01_R2.fastq. How should I provide them to -s? Both files use the same FASTQ sequence name (FASTQ header) for each read, except for the -1 or -2 at the end of the header. I could not find a clear explanation of this.

Thanks,

Jianshu

GuillaumeHolley commented 3 years ago

Hi @jianshu93,

Reads from the same pair must have the same FASTA/FASTQ name to work with Ratatosk (see the first line of the Usage section in the README). Hence, you must remove the -1 or -2 suffix from your FASTQ record headers. Off the top of my head, the following commands should do the trick:

awk '{LID=(NR-1)%4; if (LID==0) {print substr($0, 1, length($0)-2)} else {print $0}}' 01_R1.fastq > 01_R1_Ratatosk.fastq
awk '{LID=(NR-1)%4; if (LID==0) {print substr($0, 1, length($0)-2)} else {print $0}}' 01_R2.fastq > 01_R2_Ratatosk.fastq
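The commands above drop the last two characters of every header line unconditionally. A slightly more defensive variant, sketched below as a hypothetical shell function (strip_pair_suffix is not part of Ratatosk), only removes the suffix when the header actually ends in "-1" or "-2". It makes the same assumption as the original commands: plain 4-line FASTQ records with no wrapped sequence or quality lines.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ratatosk.
# Removes a trailing "-1" or "-2" from FASTQ header lines only
# (line 1 of every 4-line record); sequence, "+" and quality lines are
# printed unchanged. Headers that do not end in -1/-2 are left as-is.
# Assumes unwrapped 4-line FASTQ records.
strip_pair_suffix() {
    # With no file arguments, awk reads standard input.
    awk 'NR % 4 == 1 { sub(/-[12]$/, "") } { print }' "$@"
}

# Usage:
# strip_pair_suffix 01_R1.fastq > 01_R1_Ratatosk.fastq
# strip_pair_suffix 01_R2.fastq > 01_R2_Ratatosk.fastq
```

Note that if your headers carry a comment after the name (for example "@read-1 length=150"), the suffix is not at the end of the line and this sketch leaves the header untouched, so check a few records first.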

Then, you can provide both files as input to Ratatosk using the -s argument (i.e., -s 01_R1_Ratatosk.fastq -s 01_R2_Ratatosk.fastq).
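Since reads from the same pair must share a name, it can be worth checking the two cleaned files against each other before starting a long run. The sketch below is a hypothetical pre-flight check (check_pair_names is not a Ratatosk command). It assumes both files are non-empty, use unwrapped 4-line records, and list mates in the same order. It compares the i-th header name (first whitespace-delimited field) of each file and also reports differing record counts.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of Ratatosk.
# Compares the name of the i-th record in two FASTQ files and returns a
# non-zero status if any pair differs or the record counts differ.
# Assumes non-empty files with unwrapped 4-line records in matching order.
check_pair_names() {
    awk '
        BEGIN { bad = 0 }
        # First file: remember the name of each record, by record index.
        FNR == NR { if (FNR % 4 == 1) { n1++; name[n1] = $1 }; next }
        # Second file: compare each record name with its mate.
        FNR % 4 == 1 {
            n2++
            if (name[n2] != $1) {
                printf "record %d: %s vs %s\n", n2, name[n2], $1
                bad = 1
            }
        }
        END {
            if (n1 != n2) {
                printf "record counts differ: %d vs %d\n", n1, n2
                bad = 1
            }
            exit bad
        }' "$1" "$2"
}

# Usage:
# check_pair_names 01_R1_Ratatosk.fastq 01_R2_Ratatosk.fastq && echo "names match"
```

This only verifies the naming requirement stated above; it does not validate anything else about the input.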

Guillaume

jianshu93 commented 3 years ago

Thank you, that's really helpful. I can now run it successfully.

Jianshu