guidohooiveld opened 4 years ago
Ok, I will add them.
After a search, I cannot confirm that these two sequences are BGI-Seq adapters.
I will contact the BGI-Seq team to get their official adapter sequences, and update fastp as well.
Great, thanks for your willingness to do this! BTW, out of curiosity, how did you check this, and why could you not confirm it?
I have received a response from the BGI team; they will send me the adapter list in a couple of days.
I will then update fastp and release a new version.
Out of curiosity: was the BGI team able to provide the adapter sequences?
Any update on this? I also just received my first BGISeq data.
Kind reminder: I am about to receive another BGISeq data set. Thanks!
Hi, I just got the sequences from MGI. I will update the built-in adapter sequences.
I just added the MGI/BGI adapter sequences to the known adapters:
knownAdapters["AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA"] = ">MGI/BGI adapter (forward)";
knownAdapters["AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG"] = ">MGI/BGI adapter (reverse)";
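To illustrate how such a sequence-keyed table could be consulted once an adapter sequence has been auto-detected, here is a minimal C++ sketch. It assumes a simple prefix rule for deciding that a detected sequence corresponds to a known adapter; `buildKnownAdapters`, `matchKnownAdapter` and `minLen` are hypothetical names chosen for illustration, not fastp's actual API or matching logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: builds a known-adapter table keyed by sequence,
// using the two MGI/BGI entries quoted in this thread.
std::map<std::string, std::string> buildKnownAdapters() {
    std::map<std::string, std::string> knownAdapters;
    knownAdapters["AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA"] = ">MGI/BGI adapter (forward)";
    knownAdapters["AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG"] = ">MGI/BGI adapter (reverse)";
    return knownAdapters;
}

// Hypothetical helper: returns the name of the first known adapter whose
// leading bases agree with the detected sequence over their common length,
// or an empty string if none does. Detected sequences shorter than minLen
// are rejected. This prefix rule is an illustrative assumption only.
std::string matchKnownAdapter(const std::map<std::string, std::string>& known,
                              const std::string& detected,
                              std::size_t minLen = 12) {
    assert(minLen > 0);
    if (detected.size() < minLen) return "";
    for (const auto& [seq, name] : known) {
        const std::size_t n = std::min(seq.size(), detected.size());
        if (seq.compare(0, n, detected, 0, n) == 0) return name;
    }
    return "";
}
```

For example, `matchKnownAdapter(buildKnownAdapters(), "AAGTCGGAGGCCAAGCGG")` returns `">MGI/BGI adapter (forward)"`. Keying the table by sequence mirrors the two lines above, so supporting another platform's adapters is one entry per sequence.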
Could you please try the latest build, or use the latest prebuilt binary?
If you can upload a small MGI/BGI dataset, I can also give it a try.
Sorry for my delayed reply.
I used the latest version on GitHub (0.21) and compared the results with those of the previous version (0.20.1). To my surprise, both results were exactly the same. Is this expected, even though adapter trimming was likely already done by BGI?
Still, I would have expected some BGI adapters to be found and trimmed, especially now that these sequences are specifically searched for, so that the results of the two versions would differ slightly rather than being identical (at least in the number of bases trimmed due to adapters).
Filtering result:
reads passed filter: 43562268
reads failed due to low quality: 0
reads failed due to too many N: 0
reads failed due to too short: 0
reads failed due to low complexity: 2182
reads with adapter trimmed: 2837340
bases trimmed due to adapters: 14182202
Adapter or bad ligation of read1
The input has little adapter percentage (~0.217030%), probably it's trimmed before.
Adapter or bad ligation of read2
The input has little adapter percentage (~0.217030%), probably it's trimmed before.
fastp run command:
fastp --in1 ./TEST_IN/RNA-1/RNA-1_1.fq.gz --in2 ./TEST_IN/RNA-1/RNA-1_2.fq.gz --out1=./TEST_OUT/RNA-1/RNA-1_1.fq.gz --out2=./TEST_OUT/RNA-1/RNA-1_2.fq.gz --low_complexity_filter --thread=16 --json ./TEST_OUT/RNA-1/RNA-1.fastp.json --html ./TEST_OUT/RNA-1/RNA-1.fastp.html
Since your data is paired-end, fastp can trim the adapters without an adapter sequence being provided, so it already worked before.
Aha, I got it. I was a little confused: I assumed that since adapter sequence auto-detection is disabled by default for PE data, adapter detection by overlap analysis would also be disabled. However, I now understand that these are two separate processes, and that for PE data the latter (adapter detection by per-read overlap analysis) always takes place (and apparently cannot be disabled). Hence, the results of both versions are identical.
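Why per-read overlap analysis needs no adapter sequence can be seen from a simplified C++ sketch of overlap-based trimming. It assumes equal-length reads, only handles inserts shorter than the read length, and uses a plain mismatch count; `detectShortInsert`, `trimPairByOverlap` and the default thresholds (minimum overlap 30, at most 5 mismatches) are illustrative assumptions, not fastp's actual implementation or settings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Reverse complement of a DNA read (A<->T, C<->G; any other base becomes N).
std::string reverseComplement(const std::string& seq) {
    std::string rc(seq.rbegin(), seq.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'T': c = 'A'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            default:  c = 'N'; break;
        }
    }
    return rc;
}

// If the insert is shorter than the reads, read1 = insert + adapter and
// revcomp(read2) = revcomp(adapter) + insert, so read1 starts with the bases
// that revcomp(read2) ends with. Candidate insert lengths are scanned from
// longest to shortest; the first whose overlap has at most maxMismatch
// differences is returned. Returns 0 if no shorter-than-read insert is found.
// Pairs with inserts longer than the reads are not analysed in this sketch.
std::size_t detectShortInsert(const std::string& r1, const std::string& r2,
                              std::size_t minOverlap = 30,
                              std::size_t maxMismatch = 5) {
    assert(r1.size() == r2.size() && minOverlap > 0);
    const std::string rc2 = reverseComplement(r2);
    const std::size_t n = r1.size();
    for (std::size_t len = n; len >= minOverlap; --len) {
        std::size_t mismatches = 0;
        for (std::size_t i = 0; i < len && mismatches <= maxMismatch; ++i) {
            if (r1[i] != rc2[n - len + i]) ++mismatches;
        }
        if (mismatches <= maxMismatch) {
            return len < n ? len : 0;  // len == n: reads overlap fully, no adapter
        }
    }
    return 0;
}

// Trims both reads to the detected insert length, removing adapter
// read-through from read1 and read2 without knowing the adapter sequence.
std::pair<std::string, std::string> trimPairByOverlap(const std::string& r1,
                                                      const std::string& r2,
                                                      std::size_t minOverlap = 30,
                                                      std::size_t maxMismatch = 5) {
    const std::size_t insert = detectShortInsert(r1, r2, minOverlap, maxMismatch);
    if (insert == 0) return {r1, r2};
    return {r1.substr(0, insert), r2.substr(0, insert)};
}
```

The key point is that both reads are cut back to the inferred insert length, so adapter read-through is removed whether or not the adapter sequence is in the built-in list. That is consistent with 0.20.1 and 0.21 producing identical results on this paired-end data.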
Hi. I noticed that a document from BGI listing the sequences of all oligos and primers used for BGISEQ/DNBSEQ/MGISEQ library preparation has been posted on the SEQanswers forum. See here for the thread (2nd post).
On page 7:
Could these two (or perhaps all of the listed) sequences be added to the set of built-in adapters that fastp uses?
Thanks, Guido