Open hoelzer opened 4 months ago
Hi! Thank you for reporting this! Have you also done any art_illumina simulation by any chance? Are they also flipped? Since Swampy uses art_illumina amplicon mode for sequencing bit, there is a chance that the bug is inherited from them. Otherwise, it is about how we handle the reads later on and we need to investigate further.
Hi - thanks for your very clear report and investigation! I think this is a bug that might be inherited from art_illumina (which we're using to simulate read errors) - I'll need to investigate this a bit more just to confirm.
One short term fix - though not ideal - might be to use --seqSys MSv1 (i.e. the ART built-in profile for MiSeq version 1) - though this would have higher error rates than the version 3, which just from inspection seemed to be the right way around. If you thought those read error rates were too high, you could also pair this with using --qshift 10 to bump up the quality scores to be more comparible to the MiSeq v3.
I'll take a look at a longer term fix as well - which would be just to pass in custom (reversed) quality score profiles used by ART - and see if I can confirm that this is indeed a bug with the authors of ART, though that might take a little longer.
Hey, thanks for the fast responses and the potential workaround! Yes, we also think that's a problem in art_illumina
. We ran
art_illumina --amplicon --quiet --paired --rndSeed 10 --noALN --maskN 0 --seqSys MSv3 --in genomes.fasta --len 250 --rcount 1000000 --out out_prefix
which results in:
I'll take a look at a longer term fix as well - which would be just to pass in custom (reversed) quality score profiles used by ART - and see if I can confirm that this is indeed a bug with the authors of ART, though that might take a little longer.
Great, thanks!
I've contacted the authors of ART to see if I can confirm that this quality profile is really reversed - which would be the simplest explanation for what's happened.
If you are happy to be a guinea-pig for the above fix, which is simply to take the quality profiles and reverse them, and pass them in as custom profiles, then you could use the following parameters --seqSys custom
and --qprof1
and --qprof2
and point them to these two files (which then just get passed to ART).
MiSeqv3L250R1_reversed.txt MiSeqv3L250R2_reversed.txt
I won't make this the default behaviour until I can get confirmation about this being a real bug in ART - then I'll let you know here. But for now you can pull the latest version of the code and try again using these parameters. Hope this helps!
Great, thx @willboulton !
I forwarded that information to Justin, who also did the previous analysis I shared.
Hey,
thanks for the great simulation tool! We simulated reads using this command:
but when we check the quality pattern with FASTQC we see such profiles for the paired-end reads:
Forward read
Reverse read
Usually, the read quality drops towards the 5' end and not the 3' end (see https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). So it looks flipped? Although the general error profile looks quite good otherwise!
Did we do something wrong in the simulation, or is this a bug?
(we see the same pattern for all reads we simulated)
Thanks!