MtbEvolution / resR_Project

2 stars 0 forks source link

Unfixed pipeline script Unfixed_SNP_calling.sh: sample-specific and global parts ? #7

Open weiju opened 7 months ago

weiju commented 7 months ago

Hi, while trying to reproduce the published results for the Unfixed pipeline, we are coming across some parts that don't quite work when running the script Unfixed_SNP_calling.sh on individual samples. There is this code section in the script

#filter list of highly repeated mutations with similar mutational frequency
#for those unfixed mutations that arise >=5 times in the 50K isolates, further check their reliability based on 1) the ratio in "markkept"; 2) the distribution of the mutational frequency.
cat *mixmarkkept > all_KEPT.txt; perl ~/script/loci_freq_count.pl all_KEPT.txt >kept_repeat.txt
cat *mixmark > all_MIX.txt;perl ~/script/loci_freq_count.pl all_MIX.txt > mix_repeat.txt
perl ~/script/repeat_number_merge.pl mix_repeat.txt kept_repeat.txt > merge_kept_mix.txt
perl ~/script/ratio.pl merge_kept_mix.txt > merge_kept_mix_ratio.txt
awk '$4>=5' merge_kept_mix_ratio.txt |awk '$6>0.6'|cut -f1|while read i;do echo $i > $i.per5up.txt;grep -w $i all_KEPT.txt|cut -f12 >> $i.per5up.txt;done
paste *per5up.txt > 5up_0.6_paste.txt
perl ~/script/stdv.pl 5up_0.6_paste.txt |awk '$2<0.25'|cut -f1 > 5up_0.6_0.25.list
perl ~/script/freq_extract.pl 5up_0.6_0.25.list 5up_0.6_paste.txt > 5up_0.6_0.25.txt
awk '$4>=5' merge_kept_mix_ratio.txt|cut -f1 > 5up.list
perl ~/script/repeat_loci.pl 5up_0.6_0.25.list 5up.list > 5up_remove_loc.list

and if we try to run this on a per-sample basis, the script will fail after the first awk line. In our opinion this part only makes sense if it is run on the results of the entire set of isolates, so this script has both per-sample parts and per all sample parts ? This part

perl ~/script/unfix_scripts/repeatloci_filter.pl 5up_remove_loc.list $markkept > $keptfilt
#annotation of unfixed SNPs
cut -f9-11 $keptfilt > $keptsnp
perl ~/script/1_MTBC_Annotation_mtbc_4411532.pl $keptsnp > $keptanofilt

seems to be per individual sample again ? Would that happen to be the case or are we missing something here ? Appreciate your help !!