Closed callumparr closed 5 years ago
Hello,
If you have a PAF file from minimap2, it should be as simple as filtering on column 6, the "Target sequence name", for the target you want (with grep or similar), then using the filtered file as your input filter file with the -p, --paf flag.
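For anyone who would rather do the column 6 filter in a script than with grep, here is a minimal Python sketch. It assumes a standard tab-separated PAF file; the function names and file paths are made up for illustration and are not part of fast5_fetcher. Unlike a plain grep, it only keeps records where column 6 matches the target name exactly.

```python
def filter_paf_by_target(paf_lines, target_name):
    """Yield PAF records whose column 6 (target sequence name) equals target_name."""
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        # PAF is tab-separated; column 6 (index 5) is the target sequence name.
        if len(fields) >= 6 and fields[5] == target_name:
            yield line


def write_filtered_paf(in_path, out_path, target_name):
    """Write matching records to out_path and return how many were kept.

    in_path and out_path are hypothetical file names chosen by the caller.
    """
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in filter_paf_by_target(src, target_name):
            dst.write(line)
            kept += 1
    return kept
```

The file written by write_filtered_paf would then be the one passed with -p, --paf.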
If you have a SAM file and, say, a BED file with the regions you want overlaps for, filter the alignments with the samtools view -hL selection.bed ... command, then simply extract the read IDs into a flat file using something like grep -v ^@ filtered.sam | cut -f1 > my.flat.file.txt and use the -f, --flat flag for fast5_fetcher.
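The same read ID extraction can be sketched in Python if that is easier to drop into a pipeline. This is only an illustration of the grep -v ^@ | cut -f1 step: the function names and paths are hypothetical, and the de-duplication of read IDs is my own addition (a read can appear on several alignment lines), not something the shell one-liner does.

```python
def read_ids_from_sam(sam_lines):
    """Yield each read ID (QNAME, column 1) once, skipping SAM header lines."""
    seen = set()
    for line in sam_lines:
        # Header lines start with '@'; blank lines carry no record.
        if not line.strip() or line.startswith("@"):
            continue
        read_id = line.split("\t", 1)[0].strip()
        # Assumption: one entry per read is enough for the flat file,
        # so repeated IDs (secondary/supplementary alignments) are dropped.
        if read_id not in seen:
            seen.add(read_id)
            yield read_id


def write_flat_file(sam_path, out_path):
    """Write one read ID per line to out_path and return the count written.

    sam_path and out_path are hypothetical file names chosen by the caller.
    """
    count = 0
    with open(sam_path) as src, open(out_path, "w") as dst:
        for read_id in read_ids_from_sam(src):
            dst.write(read_id + "\n")
            count += 1
    return count
```

The resulting one-ID-per-line file is what would be handed over with -f, --flat.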
The -x flag is for use with the trim option for easy naming of trimmed file output.
If you would like some more specific help, let me know what files you have and are working with, and I can give you some more specific examples.
:)
Ah OK, thank you, I understand. First we should filter the fastq or PAF down, and then use that to fetch the fast5.
Yep, that is correct. I'm glad that explanation helped.
If I have a feature from GENCODE, for instance, and want to pull all fast5 files relating to the EEF1A1-201 transcript:
ENST00000309268.10|ENSG00000156508.17|OTTHUMG00000015031.6|OTTHUMT00000128718.1|EEF1A1-201|EEF1A1|2303|protein_coding|
Would I filter using something like
-x "ENST00000309268.10|ENSG00000156508.17|OTTHUMG00000015031.6|OTTHUMT00000128718.1|EEF1A1-201|EEF1A1|2303|protein_coding|"
or just use the transcript symbol?
-x EEF1A1-201
I was trying to figure it out from the examples given but wasn't 100% sure.