Open ufaroooq opened 3 months ago
In response to your last question: yes, these are all just different ways to chain the steps together.
In your example:
# Index the draft genome assembly
samtools faidx ref.fa
# Map ONT reads to reference with minimap2
# Filter out secondary alignments with samtools
# Find reads that overhang contig ends AND contain TTAGGG motif with teloclip
# Write the overhang reads to fasta files with teloclip-extract
minimap2 -ax map-ont ref.fa nanopore.fq | samtools view -h -F 0x100 | teloclip --ref ref.fa.fai --motifs TTAGGG | teloclip-extract --refIdx ref.fa.fai --extractReads --extractDir SplitOverhangs
This will create a directory called SplitOverhangs that has one FASTA file of overhang reads for each of your contigs/chromosomes. The overhanging section of each read will be in lower case. You can manually select the best alignment (a long, high-identity anchor on the contig end plus the longest run of telomeric repeats in the overhang) for each contig and append the overhang.
Chaining all the steps together like this means that you do not write any of the intermediate files to disk.
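For comparison, here is a rough sketch of the same steps run one at a time, keeping each intermediate file on disk. It uses only the commands and flags from the one-line pipeline above. The function name `teloclip_stepwise` and the file names `mapped.sam`, `filtered.sam` and `overhangs.sam` are illustrative assumptions, not names the tools require.

```shell
# Same steps as the one-line pipeline, but each intermediate file is kept on disk.
# File names (mapped.sam, filtered.sam, overhangs.sam) are illustrative only.
teloclip_stepwise() {
  ref=$1      # draft assembly, e.g. ref.fa
  reads=$2    # ONT reads, e.g. nanopore.fq

  # Step 1: index the assembly (writes "$ref.fai")
  samtools faidx "$ref" || return 1

  # Step 2: map reads, then drop secondary alignments
  minimap2 -ax map-ont "$ref" "$reads" > mapped.sam || return 1
  samtools view -h -F 0x100 mapped.sam > filtered.sam || return 1

  # Step 3: keep overhanging alignments that contain the motif
  teloclip --ref "$ref.fai" --motifs TTAGGG < filtered.sam > overhangs.sam || return 1

  # Step 4: split the overhang reads into one FASTA file per contig
  teloclip-extract --refIdx "$ref.fai" --extractReads --extractDir SplitOverhangs < overhangs.sam
}
```

Calling `teloclip_stepwise ref.fa nanopore.fq` should give the same SplitOverhangs directory as the piped version. The input to each step is simply the output of the step before it.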
However, I generally recommend that you run the initial teloclip step both with and without the --motifs filter and manually inspect the overhanging alignments in IGV.
You will see that there are different kinds of overhangs depending on how complete your assembly is.
--motifs option and manually select the overhangs that contain telomeres.
Re the --fuzzy option: this is still in development and will replace the --noPoly option. Ignore it for now.
Please use the stable release that is on PyPI, installed via pip install teloclip.
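As a rough aid for the manual selection step, the sketch below ranks the reads in one of the per-contig FASTA files by the length of their lower-case overhang and counts telomeric repeats within it. It is not part of teloclip. The function name `rank_overhangs` is hypothetical, and the sketch assumes that the read ID is the first word of each header line and that only the overhang is in lower case. It counts both ttaggg and its reverse complement ccctaa.

```shell
# rank_overhangs FASTA
# Prints "read_id<TAB>overhang_length<TAB>motif_count", longest overhang first.
rank_overhangs() {
  awk '
    function report(   low, n) {
      if (id == "") return
      low = seq
      gsub(/[^a-z]/, "", low)               # keep only lower-case (overhang) bases
      n = gsub(/ttaggg|ccctaa/, "&", low)   # count non-overlapping motif hits
      printf "%s\t%d\t%d\n", id, length(low), n
    }
    /^>/ { report(); id = substr($1, 2); seq = ""; next }
    { seq = seq $0 }                        # join wrapped sequence lines
    END { report() }
  ' "$1" | sort -t "$(printf '\t')" -k2,2nr
}
```

For example, `rank_overhangs SplitOverhangs/<contig file>` gives a shortlist to check against the alignments in IGV. The anchor quality still has to be judged by eye.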
Dear Adam,
Thank you for your detailed response. It clarifies a lot of things.
Dear Adam,
I hope you are doing well. I just got to know about this tool for finding telomeres in assembled genomes. I am trying to run it on one of my assembled genomes, but the documentation in the README file is confusing me. I also checked a previously opened issue, #20 by @cyycyj, but that added to the confusion, so I will try to put all my questions in one go so you can help. From the documentation I see 3 major steps to follow:
Step 1: First index the reference assembly
Step2: Streaming SAM records from aligner
Step by step as mentioned by @cyycyj in #20
Step 3: Report clipped alignments containing target motifs
Step by step as mentioned by @cyycyj in #20
QUESTION: This in.bam is causing confusion, as @cyycyj already asked in #20.
Will in.bam be the output from step 2, i.e. the FILTERED.SAM file (after sam2bam conversion)?
Or will in.bam be the final output from step 2, i.e. the TELOCLIP_FILTERED.BAM file?
QUESTION regarding **Matching noisy target motifs**: when using the --fuzzy option with teloclip, I get the error:
teloclip: error: unrecognized arguments: --fuzzy
Step 4: Extract clipped reads
QUESTION: Which of the following will in.bam be here?
in.bam will be the output from step 2, i.e. the FILTERED.SAM file (after sam2bam conversion)?
in.bam will be the final output from step 2, i.e. the TELOCLIP_FILTERED.BAM file?
in.bam will be the final output from step 3, i.e. the TELOCLIP_TTAGGG_OVERHANGS.BAM file?
Major Confusion 1
In #20 you answered:
"The bam or sam that gets passed to teloclip should always be either raw alignments from an alignment tool like minimap2 OR alignments that have been filtered with samtools view to remove low quality alignments."
This is the point of confusion. So in steps 2, 3 and 4, should the in.bam file be either the raw alignment file mapped.sam or the filtered alignment file FILTERED.SAM, both of which are generated in step 2? Can you please shed some light on this?
Major Confusion 2
Should all these steps be executed to run teloclip?
Or are steps 2, 3 and 4 just different ways to run teloclip that can be combined into one step, as below?
If you are able to help with the questions above, it will be a great help. Best regards