RobinVanSchendel / SIQ

Sequence Interrogation and Qualification

SIQ Merging a very low proportion of paired reads #11

Closed mgruet01 closed 6 months ago

mgruet01 commented 6 months ago

When I attempt to analyse paired-end sequencing files, only a tiny proportion of the reads merge:

[Screenshot 2024-03-26 at 07 16 55]

I have checked that flash gives the correct output, following the troubleshooting section "SIQ does not work on files that need merging of paired-end reads", so this appears to be fine. When I run the example data, the paired-end files are processed without any issue.

Using BWA_MEM2 I checked the paired mapping percentage and this was typically above 90% so I don't think there is an issue with the data.

One thing I noticed: after running samples, three additional R1.fastq.gz files appear.

The same does not happen for the R2.fastq.gz files. I don't know if this means SIQ isn't using the R2 file?

Any help would be appreciated :)

RobinVanSchendel commented 6 months ago

To help you on your way I need a bit more input on your experiment and your desired output of SIQ. How large is your PCR product that you have sequenced? And how long are the reads (assuming Illumina here)?

SIQ also shows UnmergedCorrectPositionFR, which suggests that the reads do indeed start at the provided primer sequences, yet they cannot be merged.

This raises the question of whether your reads overlap at all. If they do not, then perhaps you need to analyse the reads in a slightly different manner.
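Whether R1 and R2 can overlap follows from simple arithmetic on the PCR product size and the two read lengths. The sketch below is only an illustration of that check: the function names and the example numbers are hypothetical, and the minimum overlap of 10 bases is an assumption (I believe it matches the default of flash, but check the settings you actually run with).

```python
def expected_overlap(amplicon_len: int, r1_len: int, r2_len: int) -> int:
    """Number of bases covered by both R1 and R2.

    Assumes R1 starts at one end of the amplicon and R2 at the other,
    reading towards each other. A negative value is the size of the
    gap between the two reads.
    """
    # A read cannot cover more bases than the amplicon contains.
    r1_cov = min(r1_len, amplicon_len)
    r2_cov = min(r2_len, amplicon_len)
    return r1_cov + r2_cov - amplicon_len


def can_merge(amplicon_len: int, r1_len: int, r2_len: int,
              min_overlap: int = 10) -> bool:
    """True if the expected overlap reaches min_overlap (assumed value)."""
    return expected_overlap(amplicon_len, r1_len, r2_len) >= min_overlap


# Hypothetical example: 250 bp product, 2 x 150 bp reads.
print(expected_overlap(250, 150, 150))  # 50
print(can_merge(250, 150, 150))         # True

# Same product, but R2 is only 75 bp long: a 25 bp gap, no merge possible.
print(expected_overlap(250, 150, 75))   # -25
print(can_merge(250, 150, 75))          # False
```

Note that deletions shorten the product and increase the overlap, while insertions do the opposite, so the numbers above apply to the unedited amplicon only.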

mgruet01 commented 6 months ago

Hey Robin, thank you for the speedy response. This was very helpful (I'm still quite new to sequencing). It turns out the issue was with the sequencing itself: the forward reads were the expected size but the reverse reads were not. As a result, the reads couldn't overlap.
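For anyone hitting the same problem, a quick way to spot this is to compare the read length distributions of the R1 and R2 files. This is a minimal sketch, not part of SIQ: the function name and the file names in the usage comment are hypothetical, and it assumes standard four-line FASTQ records (no wrapped sequence lines).

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(fastq_gz_path, max_reads=10000):
    """Count read lengths in the first max_reads records of a gzipped FASTQ.

    Assumes each record is exactly four lines, so the sequence is the
    second line of every record.
    """
    counts = Counter()
    with gzip.open(fastq_gz_path, "rt") as handle:
        # Take lines 1, 5, 9, ... (0-based), i.e. the sequence lines.
        for seq in islice(handle, 1, max_reads * 4, 4):
            counts[len(seq.rstrip("\n"))] += 1
    return counts


# Usage with hypothetical file names:
# for path in ("sample_R1.fastq.gz", "sample_R2.fastq.gz"):
#     print(path, read_length_counts(path).most_common(5))
```

If the most common R2 length is much shorter than expected, the two reads may not reach each other across the product.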

Since this wasn't a SIQ specific issue should I delete the issue?

RobinVanSchendel commented 6 months ago

You're welcome! No problem whatsoever. We also learned some things the hard way.

You can also try using only R1 for the analysis. For example, you can (artificially) extend your right primer and then set 'bases past' to 0 in SIQ, which effectively removes the right primer filter. Of course, that is only useful if your R1 spans your target site (e.g. a CRISPR site). We sometimes do this ourselves for certain target sites.