The alignment filtering step should sort the SAM file by read ID and alignment score before filtering. Right now, the sorting is performed by the following command:
2019-07-29 13:47:23,161 - root - DEBUG - CMD: cat /media/data/test_matam/matam_dev_assembly/workdir/16sp.art_HS25_pe_100bp_50x.sortmerna_vs_SILVA_128_SSURef_NR95_b10_m10.sam | grep -v "^@" | sort -T /media/data/test_matam/matam_dev_assembly/workdir -S 10000M --parallel 6 -k 1,1V -k 12,12nr | /media/data/matam_dev/scripts/filter_score_multialign.py -t 0.9 --geometric > /media/data/test_matam/matam_dev_assembly/workdir/16sp.art_HS25_pe_100bp_50x.sortmerna_vs_SILVA_128_SSURef_NR95_b10_m10.scr_filt_geo_90pct.sam
However, `-k 12,12nr` doesn't work correctly on fields like `AS:i:195`: the key starts with non-numeric characters, so the numeric comparison finds no leading number and treats every key as 0. As a result, the output file is not correctly sorted (at least on Ubuntu 18.04).
To sort correctly, we should be using the version sort `-k 12,12Vr`. However, I suspect this won't work for all `sort` versions, and it might also depend on the `locale` variables.
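One possible way to avoid depending on the behaviour of `sort` at all would be to do the ordering in Python before filtering. The following is only a minimal sketch of the intended order (read ID ascending, alignment score descending). The function names are hypothetical and are not part of MATAM, and it assumes tab-separated SAM lines that carry the score as an `AS:i:` tag among the optional fields. The natural ordering of read IDs is an approximation and will not match `sort -V` in every corner case.

```python
import re


def alignment_score(sam_line):
    """Return the integer value of the AS:i: tag, or None if it is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional SAM fields start at column 12
        if tag.startswith("AS:i:"):
            return int(tag[len("AS:i:"):])
    return None


def natural_key(text):
    """Split text into alternating string / integer chunks (read10 > read2)."""
    parts = re.split(r"(\d+)", text)
    # re.split with a capturing group puts the digit runs at odd indices
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def sort_key(sam_line):
    """Key: read ID ascending, then alignment score descending."""
    read_id = sam_line.split("\t", 1)[0]
    score = alignment_score(sam_line)
    # Assumption: alignments without an AS tag go last within their read
    neg_score = -score if score is not None else float("inf")
    return (natural_key(read_id), neg_score)


def sort_alignments(lines):
    """Drop header lines (starting with '@') and sort the alignments."""
    return sorted((l for l in lines if not l.startswith("@")), key=sort_key)
```

This holds the whole file in memory, unlike `sort -T ... -S ...`, so it is only meant to illustrate the ordering, not to replace the external sort for large inputs.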