adnaniazi / tailfindr

An R package for estimating poly(A)-tail lengths in Oxford Nanopore RNA and DNA reads.
https://www.cbu.uib.no/valen/
GNU General Public License v3.0

cdna example fast5 not working #28

Closed. gbouras13 closed this issue 2 years ago.

gbouras13 commented 2 years ago

Hi,

I am trying to run tailfindr on some SQK-DCS109 data.

Before I did that, I tried to run the tutorial example:

    library(tailfindr)
    df <- find_tails(fast5_dir = system.file('extdata', 'cdna', package = 'tailfindr'),
                     save_dir = '~/Downloads',
                     csv_filename = 'cdna_tails.csv',
                     num_cores = 2,
                     save_plots = TRUE,
                     plotting_library = 'rbokeh')

but this produced a CSV full of NAs:

Attachments: cdna_tails.csv, 2022-05-23_12-29-05_tailfinder.log

It also produced no plots (unsurprisingly).

For what it's worth, the RNA example in the tutorial works fine, including with plots enabled (the plots look perfect).

Any idea what could be going wrong?

George

gbouras13 commented 2 years ago

I ended up getting the example to work after adding these arguments to find_tails():

    dna_datatype = 'custom-cdna',
    front_primer = "TTTCTGTTGGTGCTGATATTGCT",
    end_primer = "ACTTGCCTGTCGCTCTATCTTC"

However, my own FAST5 files still do not work (the output is all NAs).

These were basecalled with Guppy 4.4.2 (I checked with HDFView that they contain the Fastq, Move and Trace data, etc.) using:

    guppy_basecaller -c dna_r9.4.1_450bps_fast.cfg \
        --recursive \
        --fast5_out \
        --trim_strategy none \
        --disable_pings

I have uploaded an example of five single-read FAST5 files:

fast5_example.zip

George

adnaniazi commented 2 years ago

Hi George,

Thank you for using tailfindr. Your FAST5 files seem to be correct.

tailfindr uses specific front primer (FP) and end primer (EP) sequences to classify cDNA reads as poly(A) or poly(T). By default, tailfindr currently uses the FP and EP sequences of the SQK-PCS111 kit.

So if you are not using SQK-PCS111, you must specify the FP and EP sequences for your particular kit/protocol for tailfindr to work correctly.

Please refer to this figure when specifying the FP and EP sequences. The FP sequence is the sequence upstream of the 5' end of the mRNA-oriented cDNA strand. The EP sequence is the sequence downstream of the poly(T) tail in the reverse-complement cDNA strand.
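To illustrate primer-based classification, the sketch below labels a read by looking for an exact FP or EP match near its start. It is a simplified illustration, not tailfindr's code: `classify_read()` and the 100-nt `window` are hypothetical, and it assumes that a poly(A) read begins with FP while a poly(T) read begins with EP. Real basecalls contain errors, so a practical tool would need error-tolerant alignment rather than exact matching.

```r
# Toy illustration only: NOT tailfindr's implementation.
# Assumption: a poly(A) read begins with the front primer (FP) and a
# poly(T) read begins with the end primer (EP). Exact matching only.
classify_read <- function(read_seq, front_primer, end_primer, window = 100L) {
  # Only search the first `window` bases, where the primer is expected
  head_seq <- substr(read_seq, 1L, window)
  has_fp <- grepl(front_primer, head_seq, fixed = TRUE)
  has_ep <- grepl(end_primer, head_seq, fixed = TRUE)
  if (has_fp && !has_ep) return("polyA")
  if (has_ep && !has_fp) return("polyT")
  NA_character_  # neither primer (or both) found: orientation unknown
}

# Primer sequences used for SQK-DCS109 in this thread
fp <- "TTTCTGTTGGTGCTGATATTGCT"
ep <- "ACTTGCCTGTCGCTCTATCTTC"

classify_read(paste0("GG", fp, "ACGTACGT"), fp, ep)  # "polyA"
classify_read(paste0(ep, "TTTTTTTTTT"), fp, ep)       # "polyT"
classify_read("ACGTACGTACGT", fp, ep)                 # NA
```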

Feel free to let me know if you have any other questions.

Best, Adnan

gbouras13 commented 2 years ago

Hi Adnan,

Thank you for responding so quickly!

I believe I have specified the correct primers, but the issue seems to be more fundamental.

It seems that tailfindr cannot detect the read ID in each FAST5 file (logs attached).

Whether or not I specify the primers, the reads do not seem to be detected at all.

Attachments: 2022-05-23_19-47-28_tailfinder.log, cdna_tails.csv

I would assume that if I had specified the wrong primers, tail_is_valid would be FALSE (and not NA)?
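Following George's reasoning, one quick check is to tabulate `tail_is_valid` in the output CSV and see whether rows are FALSE (tails rejected) or NA (apparently not processed at all). A minimal base-R sketch; `summarise_tail_validity()` is a hypothetical helper, and the only column it assumes is `tail_is_valid`.

```r
# Hypothetical helper: count TRUE / FALSE / NA values of tail_is_valid.
summarise_tail_validity <- function(tails) {
  # as.logical() maps "TRUE"/"FALSE" to logicals; anything else becomes NA
  valid <- as.logical(tails$tail_is_valid)
  c(valid   = sum(valid %in% TRUE),   # NA %in% TRUE is FALSE, so NAs are not counted here
    invalid = sum(valid %in% FALSE),
    missing = sum(is.na(valid)))
}

# Usage on tailfindr output (path as in the tutorial call above):
# tails <- read.csv("~/Downloads/cdna_tails.csv")
# summarise_tail_validity(tails)

demo <- data.frame(tail_is_valid = c(NA, NA, NA))
summarise_tail_validity(demo)
```

If every row lands in `missing`, as in the CSVs attached here, that hints at a problem upstream of primer matching, such as reading the FAST5 files, rather than at the primer sequences themselves.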

George

adnaniazi commented 2 years ago

Hi,

Thanks for the attachments.

I have tried tailfindr on my end, and I get the correct output. I used the following code:

    library(tailfindr)
    df <- find_tails(fast5_dir = '/Users/adnaniazi/Downloads/zips',
                     save_dir = '/Users/adnaniazi/Downloads/zips/output',
                     csv_filename = 'tails.csv',
                     save_plots = TRUE,
                     dna_datatype = 'custom-cdna',
                     front_primer = "TTTCTGTTGGTGCTGATATTGCT",
                     end_primer = "ACTTGCCTGTCGCTCTATCTTC",
                     plot_debug_traces = TRUE,
                     plotting_library = 'rbokeh',
                     num_cores = 1)

I would suggest that you uninstall tailfindr, and install it again from the master branch. Also make sure that you have the VBZ plugin installed.
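As a sketch of how to check the VBZ plugin from R: HDF5 looks for filter plugins in the directories listed in the `HDF5_PLUGIN_PATH` environment variable, and it is safest to set that variable before starting R. `find_vbz_plugin()` is hypothetical and assumes the plugin's file name contains "vbz" (e.g. `libvbz_hdf_plugin.so` on Linux). It only checks that a plugin file is present, not which version it is.

```r
# Hypothetical helper: list files that look like a VBZ plugin in HDF5_PLUGIN_PATH.
# Assumption: the plugin library's file name contains "vbz".
find_vbz_plugin <- function(plugin_path = Sys.getenv("HDF5_PLUGIN_PATH")) {
  if (!nzchar(plugin_path)) {
    message("HDF5_PLUGIN_PATH is not set")
    return(character(0))
  }
  # The variable may list several directories, separated by ":" (Unix) or ";" (Windows)
  dirs <- strsplit(plugin_path, .Platform$path.sep, fixed = TRUE)[[1]]
  hits <- list.files(dirs, pattern = "vbz", full.names = TRUE, ignore.case = TRUE)
  if (length(hits) == 0L) message("No VBZ plugin found in: ", plugin_path)
  hits
}

find_vbz_plugin()
```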

I am attaching the output that I got from running tailfindr on your data.

output.zip

gbouras13 commented 2 years ago

Absolute legend, thank you Adnan!

Good to know it's my environment and not the reads.

George

adnaniazi commented 2 years ago

My pleasure. I am closing the issue now. If you run into any other problems, just let me know.

Best, Adnan

gbouras13 commented 2 years ago

Using VBZ plugin v1.0.1 (https://github.com/nanoporetech/vbz_compression/releases/tag/v1.0.1) solved my issue. Before your comment above I was using v1.0.2, and it did not work!

Something to note for anyone else having the same problem.

George