Hi,
while trying to reproduce the published results for the Unfixed pipeline, we are coming across some parts that don't quite work when running the script Unfixed_SNP_calling.sh on individual samples.
There is this code section in the script
#filter list of highly repeated mutations with similar mutational frequency
#for those unfixed mutations that arise >=5 times in the 50K isolates, further check their reliability based on 1) the ratio in "markkept"; 2) the distribution of the mutational frequency.
cat *mixmarkkept > all_KEPT.txt; perl ~/script/loci_freq_count.pl all_KEPT.txt >kept_repeat.txt
cat *mixmark > all_MIX.txt;perl ~/script/loci_freq_count.pl all_MIX.txt > mix_repeat.txt
perl ~/script/repeat_number_merge.pl mix_repeat.txt kept_repeat.txt > merge_kept_mix.txt
perl ~/script/ratio.pl merge_kept_mix.txt > merge_kept_mix_ratio.txt
awk '$4>=5' merge_kept_mix_ratio.txt |awk '$6>0.6'|cut -f1|while read i;do echo $i > $i.per5up.txt;grep -w $i all_KEPT.txt|cut -f12 >> $i.per5up.txt;done
paste *per5up.txt > 5up_0.6_paste.txt
perl ~/script/stdv.pl 5up_0.6_paste.txt |awk '$2<0.25'|cut -f1 > 5up_0.6_0.25.list
perl ~/script/freq_extract.pl 5up_0.6_0.25.list 5up_0.6_paste.txt > 5up_0.6_0.25.txt
awk '$4>=5' merge_kept_mix_ratio.txt|cut -f1 > 5up.list
perl ~/script/repeat_loci.pl 5up_0.6_0.25.list 5up.list > 5up_remove_loc.list
and if we try to run this on a per-sample basis, the script will fail after the first awk line. In our opinion this part only makes sense if it is run on the results of the entire set of isolates, so this script has both per-sample parts and per all sample parts ?
This part
Hi, while trying to reproduce the published results for the Unfixed pipeline, we are coming across some parts that don't quite work when running the script
Unfixed_SNP_calling.sh
on individual samples. There is this code section in the scriptand if we try to run this on a per-sample basis, the script will fail after the first awk line. In our opinion this part only makes sense if it is run on the results of the entire set of isolates, so this script has both per-sample parts and per all sample parts ? This part
seems to be per individual sample again ? Would that happen to be the case or are we missing something here ? Appreciate your help !!