adnaniazi / tailfindr

An R package for estimating poly(A)-tail lengths in Oxford Nanopore RNA and DNA reads.
https://www.cbu.uib.no/valen/
GNU General Public License v3.0

pod5 support? #41

Open itslittman opened 2 years ago

itslittman commented 2 years ago

Will the POD5 format be supported?

adnaniazi commented 2 years ago

Maybe. I have to check if it is feasible.

MustafaElshani commented 1 year ago

Hi @adnaniazi, --fast5-out is now deprecated in the latest Guppy, and the talk is that raw signal data will be stored in POD5. I will not be switching to POD5 anytime soon unless I can find my tails.

hasindu2008 commented 1 year ago

@adnaniazi Could you get the move table from the BAM file and read the raw signal from BLOW5 format? I could provide some R bindings to slow5lib if you are interested.

One could convert FAST5 or POD5 (including any changes ONT makes to POD5, which has had 3 compatibility-breaking changes to date, or any future formats) to BLOW5 and then seamlessly run your tool. We have decided to provide an updated POD5<->BLOW5 converter.

adnaniazi commented 1 year ago

Hi @hasindu2008,

Thank you for your suggestion. I will take a look at it when I get some time and ask for your help. Thanks.

Adnan

hasindu2008 commented 1 year ago

@adnaniazi

Here is an R library wrapper for accessing BLOW5 files https://github.com/hasindu2008/rslow5 that I wrote in my spare time [example: https://github.com/hasindu2008/rslow5/blob/master/demo/example.R].

I am not an R user, so the API and the structure of the data I return may not be very R-friendly.

Any comments and suggestions on how the API and the data structures should look, from the perspective of an R user, would be useful.

In tailfindr, what are the FAST5 attributes and datasets that you are accessing at the moment?

adnaniazi commented 1 year ago

Thanks.

I fetch signal and move tables and FASTQ from the FAST5 file. As for the attributes, I use called_events, block_stride, first_sample_template from the FAST5 file.
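For readers following along, here is a minimal sketch of how a move table, `block_stride`, and `first_sample_template` can be combined to locate each basecalled base in the raw signal. It is not tailfindr's actual code. The function name `base_start_samples` and the toy values are hypothetical, and it assumes a move table of 0/1 entries in which each entry covers `block_stride` raw samples.

```python
from typing import List


def base_start_samples(
    moves: List[int], block_stride: int, first_sample_template: int
) -> List[int]:
    """Return the raw-signal sample index at which each basecalled base starts.

    Assumption: moves[i] is 1 when a new base was emitted at block i, else 0.
    Block i begins at first_sample_template + i * block_stride.
    """
    if block_stride <= 0:
        raise ValueError("block_stride must be positive")
    return [
        first_sample_template + i * block_stride
        for i, move in enumerate(moves)
        if move == 1
    ]


# Toy values for illustration only, not taken from a real read.
moves = [1, 0, 0, 1, 1, 0, 1]
starts = base_start_samples(moves, block_stride=5, first_sample_template=100)
print(starts)  # [100, 115, 120, 130]
```

The same arithmetic should carry over if the move table and stride come from a BAM record instead of a FAST5 file, as suggested earlier in this thread.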

patbohn commented 1 year ago

A quick note: with the switch to 5 kHz sampling for Kit 14 DNA sequencing, the workaround of basecalling data with guppy_6_2_1 no longer works, because that version only contains the basecalling model for 4 kHz Kit 14 data. As far as I can tell, there is currently no way to run tailfindr on 5 kHz Kit 14 data.

francops1722 commented 1 year ago

Hi, any news on POD5 support??

adnaniazi commented 1 year ago

Hi, it is a work in progress. Hopefully, I will have a fix for it 1-3 months from today.

jfallmann commented 8 months ago

Hi @adnaniazi, thanks for your software. We'd really like to use it for Nano3P-Seq, but the switch to POD5 is breaking the workflow. Any news on this request? We tried to use Dorado for the same task, but we'd rather stick to your established tool if that is possible and feasible for you.

adnaniazi commented 8 months ago

Hi @jfallmann,

Apologies for the late reply.

You can use fast5_rekindler (https://github.com/adnaniazi/fast5_rekindler) to combine BAM and POD5 outputs of Dorado into a basecalled FAST5 file which can then be used as input to tailfindr.

Adnan

jfallmann commented 8 months ago

Hi, no worries, thanks for the suggestion, I'll give it a try today

jfallmann commented 8 months ago

Hi @adnaniazi, I finally found time to give your approach a try, and I could convert the POD5 files to FAST5 using fast5_rekindler. Unfortunately, when I now try to run tailfindr on those files, I end up with an error similar to the one for non-basecalled FAST5 files.

Error in explore_basecaller_and_fast5type(fast5_files_list[1], basecall_group = basecall_group) : 
  object 'model' not found

I also tried running on only the pass FAST5 files, but nothing changed. Any ideas what I might be missing?

adnaniazi commented 8 months ago

Hi,

Did fast5_rekindler run successfully? Can you send me the log file for fast5_rekindler and one of the output FAST5 files from fast5_rekindler?

jfallmann commented 8 months ago

Sorry, absolutely my mistake. The end of the log clearly states that there was an issue creating the BAM index.