Best way to prefix PAF feature from gencode

Psy-Fer / SquiggleKit

SquiggleKit: A toolkit for manipulating nanopore signal data

MIT License


Best way to prefix PAF feature from gencode #15

Closed callumparr closed 5 years ago

callumparr commented 5 years ago

If I have a feature from GENCODE, for instance, and want to pull all fast5 files relating to the EEF1A1-201 transcript:

Would I filter using something like

or just use the transcript symbol?

-x EEF1A1-201

I was trying to figure it out from the examples given but wasn't 100% sure.

Psy-Fer commented 5 years ago

Hello,

If you have a PAF file from minimap2, it should be as simple as doing a grep on column 6 for the "Target sequence name" of what you want, then using that as your input filter file with the flag -p, --paf.
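A minimal sketch of the column-6 filter described above, written in Python instead of grep so that the match is restricted to the target name column and is exact. It assumes a standard tab-separated PAF with 12 mandatory columns. It also assumes that, if the reads were aligned to a GENCODE transcript FASTA, the target name may be the full pipe-delimited header, so the transcript name is checked against each pipe-separated field as well. The function name, read names, and placeholder IDs are hypothetical.

```python
import io


def filter_paf_by_target(paf_lines, target):
    """Yield PAF lines whose target sequence name (column 6) matches `target`.

    A line is kept if column 6 equals `target`, or if `target` is one of the
    pipe-delimited fields of column 6 (assumed GENCODE-style FASTA header).
    """
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # PAF has 12 mandatory columns; skip anything shorter
        target_name = fields[5]  # column 6, 0-based index 5
        if target == target_name or target in target_name.split("|"):
            yield line


def paf_line(read, target):
    """Build an illustrative PAF line; all numeric fields are made up."""
    cols = [read, "1000", "0", "1000", "+", target,
            "1800", "0", "1000", "950", "1000", "60"]
    return "\t".join(cols) + "\n"


# Hypothetical input: three reads, two of which hit the transcript of interest.
example_paf = io.StringIO(
    paf_line("read_a", "EEF1A1-201")
    + paf_line("read_b", "ENST00000000000.1|ENSG00000000000.1|-|-|EEF1A1-201|EEF1A1|1800|protein_coding|")
    + paf_line("read_c", "ACTB-201")
)

kept = list(filter_paf_by_target(example_paf, "EEF1A1-201"))
kept_reads = [line.split("\t")[0] for line in kept]
print(kept_reads)
```

The kept lines can then be written to a file and used as the filter file for the -p, --paf flag.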

If you have a SAM file, you can select the reads overlapping regions of your choice with, say, a BED file and the samtools view -hL selection.bed ... command, then simply extract the read IDs into a flat file using something like grep -v ^@ filtered.sam | cut -f1 > my.flat.file.txt and use the -f, --flat flag for fast5_fetcher.
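The same read ID extraction can be sketched in Python, for cases where it is more convenient than the grep and cut pipeline. This is an illustration under stated assumptions, not part of SquiggleKit: the function name and example records are hypothetical, and the de-duplication step is an addition that the shell one-liner does not perform (a read with secondary or supplementary alignments appears on several SAM lines).

```python
def read_ids_from_sam(sam_lines):
    """Return unique read IDs (SAM column 1) in first-seen order.

    Header lines start with '@' and are skipped, mirroring `grep -v ^@`.
    """
    seen = set()
    ids = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        read_id = line.split("\t", 1)[0]  # column 1, as in `cut -f1`
        if read_id not in seen:  # de-duplication: not done by the one-liner
            seen.add(read_id)
            ids.append(read_id)
    return ids


# Hypothetical filtered SAM content; alignment fields are made up.
example_sam = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:chr6\tLN:170805979\n",
    "read_a\t0\tchr6\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n",
    "read_b\t16\tchr6\t200\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n",
    "read_a\t2048\tchr6\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n",
]

read_ids = read_ids_from_sam(example_sam)
flat_file_text = "\n".join(read_ids) + "\n"  # one read ID per line
print(flat_file_text, end="")
```

Writing flat_file_text to a file such as my.flat.file.txt gives one read ID per line, which is the kind of flat file the -f, --flat flag is described as taking.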

The -x flag is for use with the trim option for easy naming of trimmed file output.

Psy-Fer commented 5 years ago

If you would like some more specific help, let me know what files you have and are working with, and I can give you some more specific examples.

callumparr commented 5 years ago

Ah OK, thank you, I understand. First we should filter the fastq or paf down, and then use that to fetch the fast5.

Psy-Fer commented 5 years ago

Yep, that is correct. I'm glad that explanation helped.