Filtering parameters for input files

sq101 commented 1 year ago

Hi there! I wanted to kindly ask the following questions regarding filtering criteria for input files:

a) Sometimes control sequences are added during runs (e.g. the DCS DNA control). Can such controls influence the polyA length determination? Should they be removed from the fast5 files somehow, and if so, do you have any recommendations?

b) Do you recommend any fast5 filtering to obtain more accurate polyA lengths (e.g. by length or quality)? If so, which tools or approaches would you suggest?

Thank you very much for tailfindr! Regards : )

adnaniazi commented 1 year ago

Hi,

a) You can align your nanopore reads to the DCS control sequence using minimap2 and generate a PAF file instead of a SAM file. The PAF file tells you which reads (by read_id) aligned to the DCS control sequence. Filter these read_ids out of the tailfindr output CSV file using dplyr functions in R.
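For anyone who prefers to do this filtering step outside R, here is a minimal Python sketch of the same idea (it is not part of tailfindr). It assumes the PAF file was produced by aligning the reads to the control with something like `minimap2 -x map-ont dcs.fasta reads.fastq > dcs.paf`, that the query name in the PAF matches the `read_id` column of the tailfindr CSV, and that the file names are placeholders. The function names are hypothetical.

```python
import csv


def read_ids_from_paf(paf_lines):
    """Collect the query names (PAF column 1) of reads aligned to the control."""
    ids = set()
    for line in paf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # PAF is tab-separated; the first field is the query (read) name.
        ids.add(line.split("\t")[0])
    return ids


def drop_control_reads(rows, control_ids, id_column="read_id"):
    """Keep only rows whose read id did NOT align to the control."""
    return [row for row in rows if row[id_column] not in control_ids]


def filter_csv(paf_path, csv_in, csv_out):
    """Write a copy of the tail-length CSV without control reads.

    Returns the number of rows kept.
    """
    with open(paf_path) as paf:
        control_ids = read_ids_from_paf(paf)
    with open(csv_in, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        kept = drop_control_reads(reader, control_ids)
    with open(csv_out, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)
```

Usage would look like `filter_csv("dcs.paf", "tails.csv", "tails_no_dcs.csv")`. Note that this drops every read with any alignment to the control; if spurious short hits are a concern, you may want to additionally require a minimum mapping quality (PAF column 12) before adding a read to the set.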

b) No, we currently don't have a method to filter polyA lengths by any criteria. Keep in mind that polyA estimates are not point estimates; you will most likely get a distribution of tail lengths. So if you want to estimate the polyA tail length of a particular transcript isoform, collect at least 15-20 reads covering this isoform to get an idea of the tail length distribution.
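As an illustration of the "at least 15-20 reads per isoform" advice, here is a rough Python sketch that groups tail lengths by isoform and only reports isoforms with enough reads. It assumes you have already assigned each read to a transcript isoform yourself (for example from an alignment) and paired that assignment with the read's estimated tail length. The function name, the `(transcript_id, tail_length)` input shape, and the default threshold of 15 are assumptions for the example, not tailfindr features.

```python
from collections import defaultdict
from statistics import median


def summarize_tails(records, min_reads=15):
    """Summarize tail-length distributions per isoform.

    records: iterable of (transcript_id, tail_length) pairs; a tail_length
             of None (no estimate for that read) is skipped.
    min_reads: isoforms with fewer reads than this are left out.
    """
    by_transcript = defaultdict(list)
    for transcript_id, tail_length in records:
        if tail_length is None:
            continue
        by_transcript[transcript_id].append(float(tail_length))

    summary = {}
    for transcript_id, lengths in by_transcript.items():
        if len(lengths) >= min_reads:
            summary[transcript_id] = {
                "n": len(lengths),
                "median": median(lengths),
                "min": min(lengths),
                "max": max(lengths),
            }
    return summary
```

The summary reports the median and range rather than a single mean so that the spread of the distribution stays visible.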

Best, Adnan

sq101 commented 1 year ago

Hi @adnaniazi !

Thank you very much for your speedy reply and the suggestions! We'll give them a try in the future! Thank you very much : ) Regards,

adnaniazi / tailfindr

Filtering parameters for input files #54