Psy-Fer / SquiggleKit

SquiggleKit: A toolkit for manipulating nanopore signal data
MIT License

Using MotifSeq coordinates within single reads to segment fast5 file at those positions. #44

Open gallardo-seq opened 3 years ago

gallardo-seq commented 3 years ago

Thanks for developing this great tool. This is an enhancement/question type issue. We have some CCS-type reads that contain a repetitive unit that we can search for using MotifSeq. My question is whether SquiggleKit can output the positions of the repetitive unit within single reads, and then use these coordinates to segment each fast5 file (kind of like Porechop, but at the signal level). Our ultimate goal is to do pair consensus decoding with Bonito specifically, or facilitate multidimensional basecalling in general. Does SquiggleKit already have similar functionality that I'm perhaps missing in the documentation?

Psy-Fer commented 3 years ago

Hey,

So you want the base positions of what you have found in the signal? Or the signal positions of what you found in the basecall?

If either of those is what you want, you are in luck as we have an upgrade being worked on at the moment for doing this with a new library I built along with Hasindu in our lab.

Happy to make this one of the use case examples.

I'll talk to the people involved this week and see if I can get it moved forward.

Currently motifseq will give you the signal positions where it finds something. Then you would have to cut the signal at those sites from the array. Probably not as useful as the new method we have.
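For illustration, cutting a signal array at a set of positions can be sketched in a few lines of NumPy. This is only a sketch under assumptions: the helper name `cut_signal` and the example positions are hypothetical and not part of SquiggleKit, and it presumes the raw signal is already loaded as a 1-D array and the positions are sample indices such as those reported by MotifSeq.

```python
import numpy as np

def cut_signal(signal, boundaries):
    """Split a 1-D raw signal array at the given sample indices (hypothetical helper)."""
    signal = np.asarray(signal)
    # Keep only boundaries strictly inside the array, sorted and de-duplicated
    cuts = sorted({int(b) for b in boundaries if 0 < int(b) < signal.size})
    # np.split returns one sub-array per segment between consecutive cut points
    return np.split(signal, cuts)

# Toy signal of 10 samples; the positions 3 and 7 are made up for illustration
segments = cut_signal(np.arange(10), [3, 7])
for seg in segments:
    print(seg.tolist())
```

Each returned segment could then be written out or passed on separately; how it is stored back into fast5 depends on the tooling in use.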

James

gallardo-seq commented 3 years ago

Thanks for getting back to me so quickly. To clarify, I have a list of positions for each basecall/fastq file that I want to use to segment the originating signal/fast5. Specifically, each read is a concatemer containing a single sequence that is repeated over and over (like CCS in PacBio). We already use these repetitions to obtain error-corrected reads in the base space, but with the release of a pair-consensus decoding option for bonito (and multi-dimensional basecalling in the works for ONT in general), I think using these repetitive units for signal space error correction would be a timely development.
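As a rough sketch of the coordinate conversion described above, base-space positions can be turned into signal-space intervals if a per-base signal index is available. Everything here is an assumption for illustration: `base_starts` is a hypothetical array giving the sample index at which each base begins (it would have to come from something like a basecaller's move table or a signal-to-sequence alignment), and `base_to_signal_intervals` is not an existing SquiggleKit function.

```python
import numpy as np

def base_to_signal_intervals(base_intervals, base_starts, signal_length):
    """Convert half-open base intervals [start, end) into signal sample intervals.

    base_starts[i] is the (assumed) sample index at which base i begins.
    """
    base_starts = np.asarray(base_starts)
    out = []
    for b_start, b_end in base_intervals:
        s_start = int(base_starts[b_start])
        # A unit that ends at the last base runs to the end of the signal
        s_end = int(base_starts[b_end]) if b_end < base_starts.size else signal_length
        out.append((s_start, s_end))
    return out

# Toy data: 6 bases, each starting at these made-up sample indices
base_starts = [0, 5, 9, 14, 20, 26]
# Two repeat units covering bases 0-2 and 3-5
intervals = base_to_signal_intervals([(0, 3), (3, 6)], base_starts, signal_length=30)
print(intervals)
```

The resulting sample intervals are what would be used to slice the raw signal of each read into one segment per repeat unit.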

That is quite exciting about the upgrade that you have in the works. From your description it sounds like it'd be highly complementary to what we are looking for. I have some test fast5 and fastq files that I'd be happy to share as a use case example or for development purposes (though I'd be happy to coordinate over e-mail or another more suitable medium). Let me know if the people involved in your collaboration are interested in moving forward on this angle.

Psy-Fer commented 3 years ago

Hey,

That actually sounds rad.

Wanna send me an email at j.ferguson[at]garvan.org.au?

I think this would be worth including in the development to ensure we deliver in a way that would make something like this work. What you need sounds exactly like what we have made, so I think this could work out well.

Talk soon.

James

gallardo-seq commented 3 years ago

Hi, I sent you an e-mail to get the ball rolling. Talk soon. CG

Psy-Fer commented 3 years ago

Hey,

Yep, I got it. I have been talking with the relevant people. Looks like we will be going ahead. I'll be in touch soon.