Open Flying-Doggy opened 8 months ago
For example, the target adapter 'GTGAGTGATGGTTGAGGTAGTGTGGAG' is located at the 5' end of read 2 of E200012434L1C001R00100009337. I expected the trimmed read 2 to be 'CGGGGTTATAGTGTGAGATTTTGTTTTAAGAATAAAAAAATTTTAAAATAAGATAATTTTATTTTTATATAAATTATTTTAGAGTATAATAAAAGGAAAATTTTTAAATTTATTATATAAAGT', but fastp trimmed away the whole of read 2.
---input---
@E200012434L1C001R00100009337/1
ACAACAAACCGAAATCGCGCCACTACACTCCAACCTAAAAAACAACGAAACTCCGTCTCAAAAAAAAAACAAAAAACAAAAAATTAAAATAAAACCAACTTTATATAATAAATTTAAAAATTTTCCTTTTATTATACTCTAAAATAATTT
+
GFGGFGGGFFGFGGGFGFGFGGFGGFGFGFGGGFGGGGGGGGFGGGGGGGGGGFGFGGGFGGGGGGGGGGGGGGGGFGGGGGGFGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGBFGGGGFGGGGGFFFGGGGGGFGCGG
**@E200012434L1C001R00100009337/2**
GTGAGTGATGGTTGAGGTAGTGTGGAGCGGGGTTATAGTGTGAGATTTTGTTTTAAGAATAAAAAAATTTTAAAATAAGATAATTTTATTTTTATATAAATTATTTTAGAGTATAATAAAAGGAAAATTTTTAAATTTATTATATAAAGT
+
FEFFFFFF?GFFFFFFFFFFFFCGFFFEGFFFFFGEGFFFFFFFF>FFFF=FGGGFGFFFFFFFFGFFCFFGFGFFFFGF>GFEFDFGFEFFGFEFFGAFFFG?FFFFFFF>G>GFFGGFGFDFFF.#E@CCFF?<FFGFFGGGFGFGFF
----fastp_output----
@E200012434L1C001R00100009337/1 paired_read_is_failing
ACAACAAACCGAAATCGCGCCACTACACTCCAACCTAAAAAACAACGAAACTCCGTCTCAAAAAAAAAACAAAAAACAAAAAATTAAAATAAAACCAACTTTATATAATAAATTTAAAAATTTTCCTTTTATTATACTCTAAAATAATTT
+
GFGGFGGGFFGFGGGFGFGFGGFGGFGFGFGGGFGGGGGGGGFGGGGGGGGGGFGFGGGFGGGGGGGGGGGGGGGGFGGGGGGFGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGBFGGGGFGGGGGFFFGGGGGGFGCGG
**@E200012434L1C001R00100009337/2 failed_too_short**
+
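To make the expected result explicit, here is a minimal Python sketch of the 5' trimming I had in mind. It is not how fastp works internally; `trim_5prime_adapter` is a hypothetical helper that only removes the adapter when it is an exact prefix of the read, and the sequence and quality strings are read 2 from the input above.

```python
ADAPTER = "GTGAGTGATGGTTGAGGTAGTGTGGAG"

def trim_5prime_adapter(seq, qual, adapter=ADAPTER):
    """Remove `adapter` only when it is an exact prefix of the read.

    Hypothetical helper for illustration: no mismatches are tolerated,
    and reads without the prefix are returned unchanged.
    """
    if seq.startswith(adapter):
        n = len(adapter)
        return seq[n:], qual[n:]
    return seq, qual

# Read 2 of E200012434L1C001R00100009337, copied from the input above.
read2_seq = (
    "GTGAGTGATGGTTGAGGTAGTGTGGAGCGGGGTTATAGTGTGAGATTTTGTTTTAAGAATAAAAAAATTTTAAAA"
    "TAAGATAATTTTATTTTTATATAAATTATTTTAGAGTATAATAAAAGGAAAATTTTTAAATTTATTATATAAAGT"
)
read2_qual = (
    "FEFFFFFF?GFFFFFFFFFFFFCGFFFEGFFFFFGEGFFFFFFFF>FFFF=FGGGFGFFFFFFFFGFFCFFGFGF"
    "FFFGF>GFEFDFGFEFFGFEFFGAFFFG?FFFFFFF>G>GFFGGFGFDFFF.#E@CCFF?<FFGFFGGGFGFGFF"
)

trimmed_seq, trimmed_qual = trim_5prime_adapter(read2_seq, read2_qual)
print(trimmed_seq)
```

With this kind of prefix-only trimming, the insert that follows the adapter is kept instead of the read being emptied.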
The R2 reads are designed to carry GTGAGTGATGGTTGAGGTAGTGTGGAG at the 5' end, so I used
--adapter_sequence_r2 GTGAGTGATGGTTGAGGTAGTGTGGAG
to identify the valid reads, and the command is shown below:
The result suggests that only about 1/6 of the reads (~6000000, R1+R2) remain; the other 5/6 were filtered out because they were too short.
Then I used grep to count the reads with GTGAGTGATGGTTGAGGTAGTGTGGAG at the head of R2 and found that 17606348 R2 reads start with the valid adapter. So I think something else must be causing the missing reads.
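For anyone who wants to reproduce the count without grep, a rough Python equivalent could look like the sketch below. It assumes a standard 4-line-per-record FASTQ (no wrapped sequence lines), optionally gzip-compressed; `count_adapter_prefixed` and `open_fastq` are hypothetical names, not part of fastp.

```python
import gzip
from itertools import islice

ADAPTER = "GTGAGTGATGGTTGAGGTAGTGTGGAG"

def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")

def count_adapter_prefixed(path, adapter=ADAPTER):
    """Return (total_reads, reads_starting_with_adapter).

    Assumes 4 lines per record, so the sequence is the 2nd line of each
    record (line indices 1, 5, 9, ... counting from 0).
    """
    total = 0
    with_prefix = 0
    with open_fastq(path) as handle:
        for seq in islice(handle, 1, None, 4):
            total += 1
            if seq.startswith(adapter):
                with_prefix += 1
    return total, with_prefix
```

Calling `count_adapter_prefixed("R2.fastq.gz")` (the file name is a placeholder) returns the total number of R2 reads and how many of them begin with the adapter, which can then be compared with the number of reads fastp keeps.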
To find out what was causing so few reads to pass the filter, I removed
--adapter_sequence_r2 GTGAGTGATGGTTGAGGTAGTGTGGAG
and ran fastp again. This time many more reads were retained. I want to know why so many valid reads go missing after adding
--adapter_sequence_r2 GTGAGTGATGGTTGAGGTAGTGTGGAG
Also, in the clean output data, many reads still contain the adapter sequence.