Hi,
It seems like you live-basecalled your data during sequencing. Tailfindr cannot work on live-basecalled data.
Please basecall your FAST5 files again with Guppy to produce a new set of basecalled FAST5 files. Then run tailfindr on these newly basecalled FAST5 files, and remember to specify basecall_group = 'Basecall_1D_001' in the tailfindr command.
Best, Adnan
Hi Adnan,
I did rebasecall the data using Guppy in the mop-preprocessing workflow (https://biocorecrg.github.io/MOP2/docs/mop_preprocess.html). Is there a way to check the FAST5 files?
Thanks, Magdy
Yes, you can check your rebasecalled files in the HDFView software (https://www.hdfgroup.org/downloads/hdfview/). In HDFView, check whether these rebasecalled files have a Basecall_1D_001 group.
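If you would rather check from R than open HDFView, here is a minimal sketch. It assumes the rhdf5 Bioconductor package is installed and that reads follow the usual multi-fast5 layout `/read_<id>/Analyses/Basecall_1D_00x`. The helper name `has_basecall_group` and the example path are hypothetical.

```r
library(rhdf5)

# Hypothetical helper: returns TRUE if any HDF5 group named `group_name`
# exists anywhere in the file (e.g. /read_<id>/Analyses/Basecall_1D_001).
has_basecall_group <- function(fast5_file, group_name = "Basecall_1D_001") {
  # datasetinfo = FALSE skips type/dimension lookups, which speeds up
  # listing large multi-fast5 files; recursive = TRUE walks the whole tree.
  listing <- h5ls(fast5_file, recursive = TRUE, datasetinfo = FALSE)
  any(listing$otype == "H5I_GROUP" & listing$name == group_name)
}

# Example (hypothetical path):
# has_basecall_group("./fast5_pass/example.fast5")
```

A full recursive listing can take a while on a ~400 MB file with thousands of reads, but it avoids guessing at the exact depth of the basecall group.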
Hi Adnan, they seem to have the Basecall_1D_001 group (see the attached image).
tailfindr should have worked then. Can you email me (adnaniazi[AT]gmail.com) one of these basecalled files? I need to inspect it and run tailfindr on my end to debug the issue.
Sure. These files are large and won't go through regular email. Can you use Globus?
I don't know what Globus is, but you can also use wetransfer.com to send large files for free. I only need one of these big files.
OK. I found that we have access to SENDFILE, which allows sending large files. I emailed you a ~400 MB FAST5 file; you should receive the email shortly. I had to change the extension from .fast5 to .h5 to view the file in HDFView before sending, so you can change it back to .fast5 if needed. Thanks for the help.
Your data seems to work fine on my end. Can you please send me the latest error you get after running tailfindr with basecall_group = 'Basecall_1D_001'?
Hi Adnan, I also got tailfindr to work on my data here. The run has been going since earlier today and is processing all 576 FAST5 files (see below). Could you take a look at my command and tell me whether it is sufficient or whether I need to add any parameters?
```
df <- find_tails(fast5_dir = './fast5_pass/',
                 save_dir = './out-tailfinder/',
                 csv_filename = 'rna_tails.csv',
                 num_cores = 24,
                 basecall_group = 'Basecall_1D_001',
                 save_plots = TRUE,
                 plotting_library = 'rbokeh')

────────────────────────────────────────────────────────────
── Started tailfindr (version 1.3) ─────────────────────────
────────────────────────────────────────────────────────────
☰ You have configured tailfindr as following:
    fast5_dir: ./fast5_pass/
    save_dir: ./out-tailfinder/
    csv_filename: rna_tails.csv
    num_cores: 24
    basecall_group: Basecall_1D_001
    save_plots: TRUE
    plot_debug_traces: FALSE
    plotting_library: rbokeh
── Processing started at 2022-03-24 08:21:20 ───────────────
• Creating a sub-directory to save the plots in. Done! All plots will be saved in the following direcotry: ./out-tailfinder//plots
• Searching for all Fast5 files... Done! Found 576 Fast5 files.
• Analyzing a single Fast5 file to assess if your data is in an acceptable format...
    ✓ The data has been basecalled using Guppy.
    ✓ Flipflop model was used during basecalling.
    ✓ The reads are packed in multi-fast5 file(s).
    ✓ The experiment type is RNA, so we will search for poly(A) tails.
    ✓ The reads are 1D reads.
• Starting a parallel compute cluster... Done!
• Discovering reads in the 576 multifast5 files...
```
It seems fine, but you have set save_plots to TRUE. For 576 multi-fast5 files that is going to take a lot of time, and all 576 × 4000 (about 2.3 million) plots will be saved in a single folder. Your OS will likely hang when you try to open this folder. It is therefore recommended to run tailfindr with the save_plots option only on a small subset of the data, for debugging purposes.
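One way to follow this advice is to copy a few FAST5 files into a separate folder, run tailfindr with plots enabled only there, and then run the full dataset with save_plots = FALSE. A minimal base-R sketch under that assumption; the helper `make_debug_subset` and the `./fast5_debug/` and `./out-tailfinder-debug/` paths are hypothetical, and the `find_tails` arguments are the ones already used in the command above.

```r
# Hypothetical helper: copy the first `n_files` .fast5 files from `fast5_dir`
# into `debug_dir` and return the paths of the copies.
make_debug_subset <- function(fast5_dir, debug_dir, n_files = 1) {
  fast5_files <- sort(list.files(fast5_dir, pattern = "\\.fast5$",
                                 full.names = TRUE))
  if (length(fast5_files) == 0) stop("No .fast5 files found in ", fast5_dir)
  dir.create(debug_dir, showWarnings = FALSE, recursive = TRUE)
  selected <- head(fast5_files, n_files)
  copied <- file.copy(selected, debug_dir, overwrite = TRUE)
  if (!all(copied)) stop("Failed to copy some files into ", debug_dir)
  file.path(debug_dir, basename(selected))
}

# Usage (hypothetical paths):
# make_debug_subset('./fast5_pass/', './fast5_debug/', n_files = 1)
# df_debug <- tailfindr::find_tails(fast5_dir = './fast5_debug/',
#                                   save_dir = './out-tailfinder-debug/',
#                                   csv_filename = 'rna_tails_debug.csv',
#                                   num_cores = 24,
#                                   basecall_group = 'Basecall_1D_001',
#                                   save_plots = TRUE,
#                                   plotting_library = 'rbokeh')
```

Keep in mind that a single multi-fast5 file can still hold around 4000 reads, so even one file produces thousands of plots; the full run on './fast5_pass/' would then use save_plots = FALSE.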
Hi,
I am running the mop_tail workflow, which uses tailfindr. The run failed with the error posted below. I used mop_preprocessing, which uses Guppy to rebasecall, so the FAST5 files given to tailfindr were basecalled with standalone Guppy, yet I am still getting the following error.
Any suggestions as to what went wrong?
Much appreciated.