gbouras13 closed this issue 2 years ago
Ended up getting this to work after specifying:
dna_datatype = 'custom-cdna', front_primer = "TTTCTGTTGGTGCTGATATTGCT", end_primer = "ACTTGCCTGTCGCTCTATCTTC"
However, my fast5s are not working (the output is all NAs).
These were base called with Guppy 4.4.2 (I have used HDFView to check that they have the fastq, move and trace etc).
i.e.
guppy_basecaller -c dna_r9.4.1_450bps_fast.cfg \
  --recursive \
  --fast5_out \
  --trim_strategy none \
  --disable_pings
I have uploaded an example of 5 single fast5s
George
Hi George,
Thank you for using tailfindr. Your FAST5 files seem to be correct.
tailfindr uses specific front primer (FP) and end primer (EP) sequences to classify reads as polyA or polyT in cDNA. Currently, tailfindr uses, by default, front and end primer sequences that are used in the SQK-PCS111 kit.
So if you are not using SQK-PCS111, then you must specify the correct sequence of FP and EP for your particular kit/protocol for tailfindr to work correctly.
Please refer to this figure when specifying the FP and EP sequences. The FP sequence is the sequence upstream of the 5' end of the mRNA-oriented cDNA strand. The EP sequence is the sequence downstream of the poly(T) tail in the reverse-complement cDNA strand.
Feel free to let me know if you have any other questions.
Best, Adnan
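When working out which primer sits on which strand, it can help to print each primer next to its reverse complement and compare both against the figure. The base R helper below is only a sketch for that bookkeeping: `revcomp` is a hypothetical name, not a tailfindr function, and the two primer strings are the ones quoted earlier in this thread. Whether a primer should be passed as-is or reverse-complemented depends on your kit and protocol.

```r
# Hypothetical helper (not part of tailfindr): reverse complement of a DNA string.
revcomp <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]        # split into single characters
  complemented <- chartr("ACGT", "TGCA", bases)   # complement each base
  paste(rev(complemented), collapse = "")         # reverse and rejoin
}

# Primer sequences quoted in this thread.
front_primer <- "TTTCTGTTGGTGCTGATATTGCT"
end_primer   <- "ACTTGCCTGTCGCTCTATCTTC"

cat("FP:", front_primer, "| reverse complement:", revcomp(front_primer), "\n")
cat("EP:", end_primer,   "| reverse complement:", revcomp(end_primer),   "\n")
```

Applying `revcomp` twice should return the original sequence, which is a quick sanity check on the helper itself.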
Hi Adnan,
Thank you for the rapid response!
I believe that I have specified the correct primers, but the issue seems to be more fundamental.
It seems like tailfindr cannot detect the read ID for each fast5 file (logs attached).
It does not matter whether I specify the primers or not, the reads do not seem to be detected whatsoever.
2022-05-23_19-47-28_tailfinder.log cdna_tails.csv
I would assume if I specified the wrong primers that "tail_is_valid" would be FALSE (and not NA?)
George
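To tell "every read is NA" apart from "reads were processed but judged invalid", the `tail_is_valid` column of the output CSV can be tallied. This is a minimal base R sketch: `count_tail_validity` is a hypothetical helper, and the only column name it relies on is `tail_is_valid`, which is the one mentioned above. The toy data frame stands in for a real result.

```r
# Hypothetical helper: tally TRUE / FALSE / NA in the tail_is_valid column.
count_tail_validity <- function(df) {
  stopifnot("tail_is_valid" %in% names(df))
  v <- as.logical(df$tail_is_valid)
  c(valid   = sum(v, na.rm = TRUE),
    invalid = sum(!v, na.rm = TRUE),
    missing = sum(is.na(v)))
}

# Toy data for illustration only, not real tailfindr output.
toy <- data.frame(tail_is_valid = c(TRUE, FALSE, NA, NA))
print(count_tail_validity(toy))
```

With a real run you would pass in the data frame returned by `find_tails()` or the result of `read.csv()` on the saved CSV. A `missing` count equal to the number of rows points to reads not being processed at all rather than to a primer mismatch.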
Hi,
Thanks for the attachments.
I have tried tailfindr at my end, and I do get correct output. I used the following code:
library(tailfindr)
df <- find_tails(fast5_dir = '/Users/adnaniazi/Downloads/zips',
                 save_dir = '/Users/adnaniazi/Downloads/zips/output',
                 csv_filename = 'tails.csv',
                 save_plots = TRUE,
                 dna_datatype = 'custom-cdna',
                 front_primer = "TTTCTGTTGGTGCTGATATTGCT",
                 end_primer = "ACTTGCCTGTCGCTCTATCTTC",
                 plot_debug_traces = TRUE,
                 plotting_library = 'rbokeh',
                 num_cores = 1)
I would suggest that you uninstall tailfindr, and install it again from the master branch. Also make sure that you have the VBZ plugin installed.
I am attaching the output that I got from running tailfindr on your data.
Absolute legend thank you Adnan!
Good to know it's my environment and not the reads.
George
My pleasure. I am closing the issue now. If you run into any other problems, just let me know.
Best, Adnan
Using the VBZ-plugin v 1.0.1 solved my issue (https://github.com/nanoporetech/vbz_compression/releases/tag/v1.0.1) - I was using v 1.0.2 prior to your comment above, and it did not work!
So something to note for anyone else suffering the same problem.
George
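For anyone checking which VBZ plugin files their R session can see, the sketch below lists candidate files in the directories named by the `HDF5_PLUGIN_PATH` environment variable, which the HDF5 library consults when looking for filter plugins. `find_vbz_plugins` is a hypothetical helper, and matching on "vbz" in the file name is an assumption about how the plugin library is named on your system. It does not report the plugin version; it only shows what is installed where.

```r
# Hypothetical helper: list files whose names contain "vbz" in the HDF5 plugin path.
find_vbz_plugins <- function(plugin_path = Sys.getenv("HDF5_PLUGIN_PATH")) {
  if (!nzchar(plugin_path)) return(character(0))   # variable unset or empty
  # The variable may hold several directories separated by the platform path separator.
  dirs <- strsplit(plugin_path, .Platform$path.sep, fixed = TRUE)[[1]]
  dirs <- dirs[dir.exists(dirs)]
  if (length(dirs) == 0) return(character(0))
  list.files(dirs, pattern = "vbz", ignore.case = TRUE, full.names = TRUE)
}

print(find_vbz_plugins())
```

An empty result means either that the variable is not set in this session or that no matching file exists in the listed directories, in which case HDF5 falls back to its built-in default plugin location.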
Hi,
I am trying to run tailfindr on some SQK-DCS109 data.
Before I did that, I tried to run the tutorial example:
df <- find_tails(fast5_dir = system.file('extdata', 'cdna', package = 'tailfindr'),
                 save_dir = '~/Downloads',
                 csv_filename = 'cdna_tails.csv',
                 num_cores = 2,
                 save_plots = TRUE,
                 plotting_library = 'rbokeh')
but this produced a CSV full of NAs:
cdna_tails.csv 2022-05-23_12-29-05_tailfinder.log
And also no plot (unsurprisingly).
For what it is worth, the RNA example in the tutorial works fine, including when I enable plots (the plots look perfect).
Any idea what could be going wrong?
George