Closed bcraig110 closed 5 years ago
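Thanks for posting this. The change below should fix the shift and handle all 4 conditions (X and Y both present, X only, Y only, neither present).
Basically, the fix is to write out the chromosome label alongside the data and then test whether X and Y are each present. I will push it later, when the script is not in use.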
awk 'BEGIN {OFS="\t"} $2=="X"&&$3=="whole" {print "X",$6,$7} $2=="Y"&&$3=="whole" {print "Y",$6,$7}' \
$CORE_PATH/$PROJECT/REPORTS/ANEUPLOIDY_CHECK/$SM_TAG".chrom_count_report.txt" \
| paste - - \
| awk 'BEGIN {OFS="\t"} END {if ($1=="X"&&$4=="Y") print $2,$3,$5,$6 ; \
else if ($1=="X"&&$4=="") print $2,$3,"NaN","NaN" ; \
else if ($1=="Y"&&$4=="") print "NaN","NaN",$2,$3 ; \
else print "NaN","NaN","NaN","NaN"}' \
| $DATAMASH_DIR/datamash transpose \
>> $CORE_PATH/$PROJECT/TEMP/$SM_TAG".QC_REPORT_TEMP.txt"
done
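For anyone who wants to check the four cases without touching real reports, here is a minimal sketch. The sample row and the `xy_depths` name are made up. The only things taken from the command above are the column positions ($2 is the chromosome, $3 is "whole", $6 and $7 are the values that get printed). It skips the datamash transpose step, and it copies the fields into variables instead of reading them inside END, so it does not depend on the last record still being available there. Note that when only Y is present, paste leaves the lone Y row in the first three columns, so this sketch reads the Y values from $2 and $3.

```shell
#!/usr/bin/env bash
# Sketch only: exercises the X/Y presence cases on made-up report rows.
# Assumed layout: $2 = chromosome, $3 = "whole", $6 and $7 = reported values.

xy_depths() {
  # $1: path to a whitespace-delimited report file (hypothetical sample data)
  awk 'BEGIN {OFS="\t"} $2=="X"&&$3=="whole" {print "X",$6,$7} $2=="Y"&&$3=="whole" {print "Y",$6,$7}' "$1" \
  | paste - - \
  | awk 'BEGIN {OFS="\t"}
      # Remember the fields of the (single) pasted line, if there is one.
      { c1=$1; a=$2; b=$3; c2=$4; c=$5; d=$6 }
      END {
        if (c1=="X" && c2=="Y")      print a, b, c, d           # both present
        else if (c1=="X" && c2=="")  print a, b, "NaN", "NaN"   # X only
        else if (c1=="Y" && c2=="")  print "NaN", "NaN", a, b   # Y only
        else                         print "NaN", "NaN", "NaN", "NaN"  # neither
      }'
}

# Demo: a report that has an X row but no Y row.
demo=$(mktemp)
printf 's1 X whole 10 20 0.98 1.01\n' > "$demo"
xy_depths "$demo"   # prints 0.98, 1.01, NaN, NaN separated by tabs
rm -f "$demo"
```

Whatever the input, the function emits exactly four tab-separated fields, which is what keeps the QC report columns from shifting.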
FYI: I spotted a column shift in the QC report while running some tests for Beth and MDL.
If the ($2=="Y"&&$3=="whole") row is missing completely from an ANEUPLOIDY_CHECK report, the awk command prints the X depths correctly as numbers but prints no value for the Y depths, which shifts the QC report output. Changing the regex at line 128 from $1!~/[0-9]/ to $1!~/[0-9]/ || $3!~/[0-9]/ corrects the shift, but it prints NaN for all 4 values even though values exist for the first 2.
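To find which reports are affected, a quick check along these lines might help. This is a rough sketch: `has_whole_row` is a made-up helper name, and it only assumes the same column positions as the test above ($2 is the chromosome, $3 is "whole").

```shell
#!/usr/bin/env bash
# Sketch only: report whether a file contains a "whole" row for a chromosome.

has_whole_row() {
  # $1: report file, $2: chromosome label such as X or Y
  # Exit status is 0 if a matching row exists, 1 otherwise.
  awk -v chrom="$2" '$2==chrom && $3=="whole" {found=1} END {exit !found}' "$1"
}

# Hypothetical usage: list the reports passed as arguments that lack a Y row.
for f in "$@"; do
  has_whole_row "$f" Y || echo "no whole-Y row: $f"
done
```

Run against the files under ANEUPLOIDY_CHECK, this would list the samples whose QC report line is at risk of shifting.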