fmfi-genomika / genomikaMalGlo

Malassezia globosa
0 stars 0 forks source link

(F) Self-alignment (medium/slow needs A) #6

Closed mrshu closed 6 years ago

mrshu commented 6 years ago
lastdb genome.fa genome.fa 
lastal genome.fa genome.fa -E 1e-20 > self.maf #slow part
maf-convert psl self.maf > tmpC.psl

# filter out trivial self-alignments as well as alignments shorter than 100bp in one of the two sequences or with identity less than 0.9
perl -lane 'die unless @F==21; $s=($F[9] eq $F[13] && $F[10]==$F[12] && $F[11]==0); $s = $s || $F[12]-$F[11]<100 || $F[16]-$F[15]<100 || $F[0]<0.9*($F[0]+$F[1]+$F[2]+$F[3]+$F[5]+$F[7]); print unless $s' tmpC.psl > tmpC100_90.psl
pslToChain tmpC100_90.psl tmpC100_90.chain # kent tools binary, available on genomika
# fix bad coordinates on reverse strand 
perl -lane 'if ($F[0] eq "chain" && $F[9] eq "-") { ($F[10],$F[11]) = ($F[8]-$F[11], $F[8]-$F[10]); print join(" ", @F) } else { print }' tmpC100_90.chain > self100_90.chain

# another chain for alignments with at least 70% identity and length at least 300bp
perl -lane 'die unless @F==21; $s=($F[9] eq $F[13] && $F[10]==$F[12] && $F[11]==0); $s = $s || $F[12]-$F[11]<300 || $F[16]-$F[15]<300 || $F[0]<0.7*($F[0]+$F[1]+$F[2]+$F[3]+$F[5]+$F[7]); print unless $s' tmpC.psl > tmpC300_70.psl
/projects2/dipMag/magCap-2017/assembly/magCapA/seq-tracks/pslToChain tmpC300_70.psl tmpC300_70.chain # kent tools binary copied from genome-dev
# fix bad coordinates on reverse strand 
perl -lane 'if ($F[0] eq "chain" && $F[9] eq "-") { ($F[10],$F[11]) = ($F[8]-$F[11], $F[8]-$F[10]); print join(" ", @F) } else { print }' tmpC300_70.chain > self300_70.chain

Parts of trackDb.ra (replace magCap5 with your genome name):

track selfChain100_90
shortLabel Self aln >90%id
longLabel Self alignments with length >100bp, identity >90%
group varRep
type chain magCapA5

track selfChain300_70
shortLabel Self aln >70%id
longLabel Self alignments with length >300bp, identity >70%
group varRep
type chain magCapA5
mrshu commented 6 years ago

@pecox anything I can help with here?

pecox commented 6 years ago

I believe, this task is done. Please check genomika.compbio.fmph.uniba.sk tracks in group Variation and Repeats.

@mrshu Is there any place, where I can add my guide for this task and where I can store my script for filtering out data?

mrshu commented 6 years ago

Thanks @pecox, the best place for the guide seems to be the wiki (https://github.com/fmfi-genomika/genomikaMalGlo/wiki) -- feel free to create a new page there.

With regards to the script, I suggest you send a pull request adding it to this repository (the easiest way seems to be by uploading it via this interface: https://github.com/fmfi-genomika/genomikaMalGlo/upload/master).

Do let me know if I can be of any help!

pecox commented 6 years ago

Guide was added here: https://github.com/fmfi-genomika/genomikaMalGlo/wiki/Self-alignment Script uploaded via interface.

pecox commented 6 years ago

Added simple description.