adnaniazi / tailfindr

An R package for estimating poly(A)-tail lengths in Oxford Nanopore RNA and DNA reads.
https://www.cbu.uib.no/valen/
GNU General Public License v3.0
48 stars 15 forks

Empty rna_tails.csv #61

Closed Aimepicornell closed 9 months ago

Aimepicornell commented 9 months ago

Dear adnaniazi,

I am so close to finishing my analysis! Unfortunately, I ran into some difficulties just before the finish line. I would greatly appreciate your help once again:

Here is the content of my rna_tails.csv. Rows 1, 2, and 3 are your example RNA Fast5 files, and they seem to work perfectly fine. The last file is mine, and it seems to contain no information except the file location.

read_id,tail_start,tail_end,samples_per_nt,tail_length,file_path
0ae5f030-3d88-4c57-9c0e-c2bf5dbb5901,9063,10788,27.98,56.65,/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.1/tailfindr/extdata/rna/1.fast5
5af5c783-a82a-41fe-bf67-e4be5f737ba2,14061,17111,37.15,77.11,/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.1/tailfindr/extdata/rna/2.fast5
0b063214-930d-4522-8f5d-59d1a55cca83,9969,12194,34.47,59.55,/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.1/tailfindr/extdata/rna/3.fast5
,,,,,/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.1/tailfindr/extdata/rna/Fast5BasecalledDEMF.fast5

I basecalled my files using Guppy 6.2.1 (following your advice) with the command:

"C:\bla\bla\guppy_basecaller.exe" --input_path "C:\blabla" --save_path "C:\Change\Output " --num_callers 8 --config rna_r9.4.1_70bps_hac.cfg --fast5_out

Is there anything I messed up? I simply used one of my Fast5 files from the initial Fast5 Pass folder.

Other than that, I get this at the end of the tailfindr run:

.......
── Processing ended at 2023-09-13 12:32:55 ─────────────────────────────────────────────
✔ tailfindr finished successfully!
Warning message:
cols is now required when using unnest().
ℹ Please use cols = c(read_id, tail_start, tail_end, samples_per_nt, tail_length, polya_fastq, file_path).

But since your example files seem to work, I don't think it should be an issue?

Thank you in advance

Aimé Picornell

adnaniazi commented 9 months ago

These reads with missing information are reads that fatally crash tailfindr, most probably because tailfindr could not find a tail-like region in the signal. You can simply filter these reads out of the CSV file and analyze only the reads that do have a tail prediction.
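The filtering step suggested here can be sketched in a few lines of shell. This is only an illustration, not part of tailfindr: the function name `filter_tails` and the file names are hypothetical, and it assumes that a failed read is one whose first column (`read_id`) is empty, as in the rows shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: copy the header plus every row that has a
# non-empty read_id (first CSV field) from "$1" into "$2".
# Assumption: rows for failed reads look like ",,,,,/path/to/file.fast5".
filter_tails() {
    awk -F',' 'NR == 1 || $1 != ""' "$1" > "$2"
}

# Example call (file names are placeholders):
#   filter_tails rna_tails.csv rna_tails_filtered.csv
```

Because only the first field is tested, commas that might appear later in a file path do not affect which rows are kept.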

Aimepicornell commented 9 months ago

The ones that work are the Fast5 files provided by you in the extdata folder (the example files). My Fast5 files don't seem to work properly! I only get an rna_tails.csv filled with 4000 entries like this:

,,,,,/media/ubuntu/c087a6db-c469-4074-a343-9d608f6b2274/Aime/AnalyzeFAAR_09_2023/Tailfindr/Fast5BasecalledDEMF.fast5
,,,,,/media/ubuntu/c087a6db-c469-4074-a343-9d608f6b2274/Aime/AnalyzeFAAR_09_2023/Tailfindr/Fast5BasecalledDEMF.fast5
,,,,,/media/ubuntu/c087a6db-c469-4074-a343-9d608f6b2274/Aime/AnalyzeFAAR_09_2023/Tailfindr/Fast5BasecalledDEMF.fast5
,,,,,/media/ubuntu/c087a6db-c469-4074-a343-9d608f6b2274/Aime/AnalyzeFAAR_09_2023/Tailfindr/Fast5BasecalledDEMF.fast5
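To see how widespread the empty rows are, a quick count of empty versus total data rows can help. The sketch below is an illustration only: `count_tail_rows` is a hypothetical helper name, and it assumes an "empty" row is one with no `read_id` in the first column.

```shell
#!/bin/sh
# Hypothetical helper: print "<empty> <total>" for the data rows of a
# tailfindr-style CSV, where "empty" means the read_id field is blank.
count_tail_rows() {
    awk -F',' 'NR > 1 { total++; if ($1 == "") empty++ }
               END { printf "%d %d\n", empty, total }' "$1"
}

# Example call (file name is a placeholder):
#   count_tail_rows rna_tails.csv
```

If the two numbers are equal, no read in the output received a tail prediction, which suggests the problem may lie with the input file or the environment rather than with individual reads.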

Aimepicornell commented 9 months ago

[Screenshot]

Is it possible that my Fast5 File is not correctly analyzed? I used the fast5 file generated in the workspace folder next to the raw fast5 file.

Edit/Update: the view with HDF is now working. [Screenshot]

Here is how it looks inside R: [Screenshot]

Aimepicornell commented 9 months ago

Setting the HDF5 Path was the solution...
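The comment does not say exactly which setting was changed. One plausible reading, and it is only an assumption, is the `HDF5_PLUGIN_PATH` environment variable, which the HDF5 library uses to locate compression filter plugins (for example the VBZ plugin used by newer Fast5 files). A minimal sketch of setting it before starting R follows; the directory shown is a placeholder and must be replaced with wherever the plugin is actually installed.

```shell
#!/bin/sh
# Assumption: "HDF5 Path" refers to HDF5_PLUGIN_PATH.
# The directory below is a placeholder, not a path taken from this thread.
HDF5_PLUGIN_PATH="/usr/local/hdf5/lib/plugin"
export HDF5_PLUGIN_PATH

# Confirm the value that child processes (such as an R session started
# from this shell) will inherit.
echo "HDF5_PLUGIN_PATH=$HDF5_PLUGIN_PATH"
```

The variable only affects processes started from the same shell afterwards, so R would need to be launched after the export.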