MtbEvolution / resR_Project

Unfixed pipeline: issues with filtering #6

Closed by weiju 7 months ago

weiju commented 8 months ago

Hi, we have some problems with the filtering in the unfixed pipeline.

When the pipeline reaches this line in Unfixed_SNPs_Calling/0_Unfixed_SNPs_Calling.sh:

awk '$4>=5' merge_kept_mix_ratio.txt |awk '$6>0.6'|cut -f1|while read i;do echo $i > $i.per5up.txt;grep -w $i all_KEPT.txt|cut -f12 >> $i.per5up.txt;done

the column tested by $4>=5 in merge_kept_mix_ratio.txt never holds any value other than "1" for us, so no *.per5up.txt files are generated. If we replace the 5 with a 1, we do get files, but that obviously defeats the purpose of the filter. We also noticed that in lines 28-33

perl ~/script/info_mark.pl $mixfor > $mixmark;
perl ~/script/redepin_filt.pl Excluded_loci_mask.list $dep $mixmark
#filter list of highly repeated mutations with similar mutational frequency
#for those unfixed mutations that arise >=5 times in the 50K isolates, further check their reliability based on 1) the ratio in "markkept"; 2) the distribution of the mutational frequency.
cat *mixmarkkept > all_KEPT.txt; perl ~/script/loci_freq_count.pl all_KEPT.txt >kept_repeat.txt
cat *mixmark > all_MIX.txt;perl ~/script/loci_freq_count.pl all_MIX.txt > mix_repeat.txt

there are wildcards for mixmark and mixmarkkept (cat *mixmarkkept and cat *mixmark), but we don't see where multiple files of those types would be generated. Could that be related to the problem we are seeing?
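
Both observations can be checked with a short sketch like the one below. The helper names `col_hist` and `count_suffix` are hypothetical and not part of the pipeline. It assumes the files sit in the current working directory and that columns are whitespace-separated, as the `awk '$4>=5'` filter implies.

```shell
# Hypothetical helpers for diagnosing the filter; not part of resR_Project.

# col_hist FILE COL
# Print each distinct value found in column COL of FILE together with
# how many rows carry it, as "value<TAB>count", sorted numerically.
col_hist() {
    awk -v c="$2" '{ n[$c]++ } END { for (v in n) print v "\t" n[v] }' "$1" | sort -n
}

# count_suffix DIR SUFFIX
# Count the entries in DIR whose names end in SUFFIX, i.e. what a glob
# such as *SUFFIX would expand to. The body runs in a subshell so the
# loop variables do not leak into the caller.
count_suffix() (
    n=0
    for f in "$1"/*"$2"; do
        # An unmatched glob stays literal, so test that the path exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
)

# Example usage (assumes the pipeline outputs are in the current directory):
#   col_hist merge_kept_mix_ratio.txt 4
#   count_suffix . mixmarkkept
#   count_suffix . mixmark
```

If `col_hist` reports only the value 1 and each `count_suffix` call reports 1, then `all_KEPT.txt` and `all_MIX.txt` were each built from a single sample's file. Note that `*mixmark` only matches names ending in `mixmark`, so `mixmarkkept` files are counted separately.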

Thank you!