MustafaElshani closed this issue 1 year ago.
Hi,
Can you try switching the front and end primer sequences in the command that you used and re-running tailfindr? Like this:
```r
library(tailfindr)
df <- find_tails(fast5_dir = './',
                 save_dir = './',
                 csv_filename = 'tails.csv',
                 dna_datatype = 'custom-cdna',
                 end_primer = "TCTGTTGGTGCTGATATTGC",
                 front_primer = "TTGCCTGTCGCTCTATCTTC",
                 num_cores = 24)
```
Front and end primer refer to FP and EP segments in this picture.
Best, Adnan
Hi,
Thank you for your prompt reply. I have tried both, to no avail: almost all reads remain `INVALID` & `FALSE`. I have also tried different versions of the VBZ plugin, just in case that was causing a problem.
I have tested my environment with another kit I used before, and I get `TRUE` reads.
I sequenced these on PromethION flow cells on the new P2 Solo. I don't think this has anything to do with the issue.
I did not basecall using MinKNOW, which I thought was appropriate because I could run tailfindr on those `fast5` files. However, those files were missing both of the `basecall_group`s. I continued by basecalling with Guppy `v6.4.2+97a7f06` and `--fast5_out`, which added the default `Basecall_1D_000` group.
The QC is good with N1.1, and PCB111.24 is identical to PCS111, so the tails should be there.
Any further advice would be appreciated.
Mustafa
Make a fork of the tailfindr repo (master branch), and then edit the `find-dna-tailtype.R` file in your forked repo. You will find it in the `R` folder of the forked repo.
In lines 128 and 129 of this file, substitute your front and end primer sequences for the sequences that are already there. Save and commit the file to your forked repo. Then install tailfindr from your forked repo.
Once it is installed, run tailfindr like this:
```r
library(tailfindr)
df <- find_tails(fast5_dir = './',
                 save_dir = './',
                 csv_filename = 'tails.csv',
                 dna_datatype = 'cdna',
                 num_cores = 24)
```
With these changes, tailfindr will search for the front and end primers in longer search windows than before. This may increase the chance of finding the primers.
If this does not help, then perhaps your front and end primers are too short to discriminate well between each other in the presence of Nanopore basecalling errors.
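The sketch below illustrates both points: a wider search window and how well short primers can be told apart. It is a minimal illustration, not tailfindr's implementation. It looks for a primer inside the first `window` bases of a read by edit distance, so the primer may start anywhere in the window and still match despite errors. It also measures how far apart the two primers from the `find_tails()` command earlier in this thread are. The function names, window sizes and synthetic read are assumptions made for illustration.

```python
# Sketch only: edit-distance search for a primer in a window at the read start.
# Not tailfindr's code; function names and window sizes are hypothetical.


def edit_distance(a, b):
    """Levenshtein distance between two whole sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (ca != cb),  # match / mismatch
                           prev[j] + 1,               # base of `a` missing in `b`
                           cur[j - 1] + 1))           # extra base in `b`
        prev = cur
    return prev[-1]


def best_match_in_window(read, primer, window):
    """Return (min_edits, end_pos) for the best approximate occurrence of
    `primer` inside read[:window]. Semi-global alignment: the whole primer
    must align, but it may start and end anywhere in the window."""
    region = read[:window]
    m = len(primer)
    prev = list(range(m + 1))  # empty text: primer[:i] costs i deletions
    best = (prev[m], 0)
    for j in range(1, len(region) + 1):
        cur = [0] * (m + 1)    # cur[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            cost = 0 if primer[i - 1] == region[j - 1] else 1
            cur[i] = min(prev[i - 1] + cost,  # match / mismatch
                         prev[i] + 1,         # extra base in the read
                         cur[i - 1] + 1)      # primer base missing from the read
        if cur[m] < best[0]:
            best = (cur[m], j)
        prev = cur
    return best


FRONT = "TTGCCTGTCGCTCTATCTTC"  # front primer used earlier in this thread
END = "TCTGTTGGTGCTGATATTGC"    # end primer used earlier in this thread

# Made-up read with the front primer starting at position 30.
read = "G" * 30 + FRONT + "A" * 30
print(best_match_in_window(read, FRONT, 40))  # window too short: edits > 0
print(best_match_in_window(read, FRONT, 60))  # (0, 50): exact hit ending at 50
print(edit_distance(FRONT, END))              # separation between the primers
```

If the edit distance between the two primers is small compared with the number of errors expected in a read of that length, a noisy match to one primer may score about as well as a match to the other.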
I tried your suggestion, with the same result.
The flanking sequences of PCB111.24 are the same as those of PCB109:
5' - ATCGCCTACCGTGAC - barcode - ACTTGCCTGTCGCTCTATCTTC - 3'
5' - ATCGCCTACCGTGAC - barcode - TTTCTGTTGGTGCTGATATTGC - 3'
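A cDNA read can come from either strand, so each flanking sequence above may appear in a read either as written or as its reverse complement. One rough way to see how the two sequences are oriented in a handful of reads is the minimal sketch below. It uses exact matching only, so it ignores basecalling errors and will undercount real hits. The 40-base window, the function names and the synthetic read are assumptions. The `ACTT`/`TTTC` labels follow the fp/ep naming used in this thread. This is not how tailfindr classifies reads.

```python
# Sketch only: exact-match orientation check for the two flanking sequences.
# Function names and the window size are hypothetical choices.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_hits(read, primers, window=40):
    """For each named primer, report whether it occurs as written (fwd) or as
    its reverse complement (rc) in the first / last `window` bases of `read`."""
    read = read.upper()
    head, tail = read[:window], read[-window:]
    hits = {}
    for name, seq in primers.items():
        fwd, rc = seq.upper(), reverse_complement(seq)
        hits[name] = {
            "fwd_in_head": fwd in head,
            "rc_in_head": rc in head,
            "fwd_in_tail": fwd in tail,
            "rc_in_tail": rc in tail,
        }
    return hits


# 3' flanking parts of the two adapters listed above (barcodes omitted).
PRIMERS = {
    "ACTT": "ACTTGCCTGTCGCTCTATCTTC",
    "TTTC": "TTTCTGTTGGTGCTGATATTGC",
}

# Made-up read: ACTT sequence, insert, poly(A), reverse complement of TTTC.
read = (PRIMERS["ACTT"] + "GATTACA" * 5 + "A" * 30
        + reverse_complement(PRIMERS["TTTC"]))
print(primer_hits(read, PRIMERS))
```

Tallying these flags over a sample of real reads gives an approximate picture of which sequence leads. A more faithful check would use an error-tolerant aligner instead of exact matching.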
The only difference here is the `AC` and `TT` at the 5' end, which have been removed in the ONT PCB111 documentation. However, these are present in all of the PCB111 barcodes, so I think they should remain for PCB111.24. tailfindr still failed to find any valid `TRUE` tails in this particular `fast5`.
When I ran tailfindr with the above flanking sequences on a sample I prepared with PCB109 a long time ago, tails were found. I know this kit doesn't attach at the end of the tail and gives a lot of false positives, but tailfindr found the tails nonetheless.
I tried `compress_fast5` from `ont-fast5-api` just in case. Again, no tails.
This is driving me crazy. I suspect something fishy with Guppy.
Hi @adnaniazi,
As I suspected, Guppy was the issue. When I reverted to `6.0.6+8a98bbc` and basecalled the same `fast5` files with exactly the same parameters, I finally got tails. I have no idea what changed between `6.0.6` and `6.4.2` to cause such a drastic difference. I did notice that they are deprecating `--fast5_out`, so maybe it has something to do with that.
After I got this working, I tried to see which settings gave me the most `TRUE` tails. Using the above flanking sequences, I got the following:

| tailfindr run | Primers supplied via | Primer order | `TRUE` tails |
|---|---|---|---|
| `custom-cdna` | `find_tails()` arguments | fpACTT_epTTTC | 771 |
| `custom-cdna` | `find_tails()` arguments | fpTTTC_epACTT | 1855 |
| default | edited `find-dna-tailtype.R` | fpACTT_epTTTC | 1406 |
| default | edited `find-dna-tailtype.R` | fpTTTC_epACTT | 3144 |
Hence I have a couple of questions:
1) Would it be wise to think that the default was correct, since it detected more tails than `custom-cdna`?
2) From the ONT documentation, it looks as if ACTT... is the `fp` primer. However, when I enter it as `fp`, I get fewer `TRUE` tails than when I enter TTTC.... I just can't seem to orient myself. Should I treat the setting that gives the most `TRUE` tails as closest to the truth?
Your help is appreciated,
Mustafa
If you used the protocol shown in the figure below:
then just use the default settings of tailfindr (my original tailfindr, not your forked one). tailfindr should work out of the box for protocols such as SQK-PCS111 and its barcoding version SQK-PCB111.24 without you having to specify front and end primers.
So here is how I would like you to proceed:
Hope this helps.
Best, Adnan
The plot thickens indeed! Not only is `v6.0.6` probably the last compatible version, it also only works with the default `--chunk_size`, `--chunks_per_runner` and `--num_callers` settings.
I optimised those parameters on an RTX 3090 GPU, where they worked, but the same parameters didn't work well on an RTX 8000, so I had to use the defaults there.
Versions after `v6.0.6` didn't work on either GPU, even with the default settings.
...and yes, your default tailfindr worked fine with PCB111. It has been a roundabout route to the cause of this issue, but it seems Guppy is doing strange things!
Finally happy. I will now proceed with the analysis and see how many days I need to wait for tailfindr to process 12,000 `fast5` files.
Thank you for your help,
Mustafa
Dear @adnaniazi,
Recently I moved to the SQK-PCB111.24 kit, and after basecalling with
as per information here which states that PCB111 has the following flanking sequences
I decided to run tailfindr with the following
I have tried various combinations, including the default, to no avail: the majority (98%) are invalid `FALSE` tails.
Can you see where the issue might be?