Kurt-Hetrick / CIDR_WES

CIDR's production pipeline for WES and other targeted DNA sequencing projects.

X.01-QC_REPORT_PREP: ANEUPLOIDY_CHECK columns shift #78

Closed: bcraig110 closed this issue 5 years ago.

bcraig110 commented 5 years ago

FYI: I spotted a shift in the QC report when running some tests for Beth and MDL.

If the row matching ($2=="Y"&&$3=="whole") is missing entirely from the ANEUPLOIDY_CHECK report, the awk command prints the X depths correctly as numbers but prints no value for the Y depths, which shifts the QC report output. Changing the regex test at line 128 from $1!~/[0-9]/ to $1!~/[0-9]/ || $3!~/[0-9]/ corrects the shift, but it prints NaN for all 4 values even though values exist for the first 2.
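
A minimal reproduction of the shift, assuming the pre-fix command collected columns 6 and 7 of every X/Y "whole" row onto one line before transposing it. The sample rows, their values, and the naive_extract helper are hypothetical, and tr stands in for datamash transpose on a single line.

```shell
#!/usr/bin/env bash
# Hypothetical tab-separated report rows. Layout assumed from the issue:
# $2 = chromosome, $3 = "whole", $6 and $7 = the two depth values.
x_row=$'S1\tX\twhole\t.\t.\t30.1\t0.95'
y_row=$'S1\tY\twhole\t.\t.\t12.4\t0.41'

# Assumed pre-fix behaviour: join the depth pairs of whichever rows match
# onto one line, then put one value per line ("tr" is a stand-in for
# "datamash transpose" on a single line).
naive_extract() {
  awk '($2=="X" || $2=="Y") && $3=="whole" {
         printf "%s%s\t%s", (n++ ? "\t" : ""), $6, $7
       }
       END { print "" }' \
    | tr '\t' '\n'
}

printf '%s\n%s\n' "$x_row" "$y_row" | naive_extract   # 4 lines
printf '%s\n' "$x_row" | naive_extract                # only 2 lines: Y row missing
```

With both rows the helper emits 4 values; with the Y row missing it emits only 2, so every field appended after it moves up two places in the QC report.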

Kurt-Hetrick commented 5 years ago

Thanks for posting this. The change below should fix it and handle all 4 conditions:

  1. Has both X and Y
  2. Has only X
  3. Has only Y
  4. Has nothing

Basically, the fix is to write out the chromosome name alongside its data and then test whether each chromosome is present. Will push later, when it is not in use.

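# Tag each X/Y "whole" row with its chromosome name and keep its depth
# values ($6, $7); paste then joins the (at most two) rows onto one line.
# The second awk checks which chromosomes are present, pads any missing
# pair with NaN, and always emits 4 values so the report columns stay aligned.
# A lone Y row lands in fields 1-3 after paste, so its depths are $2 and $3.
awk 'BEGIN {OFS="\t"} $2=="X"&&$3=="whole" {print "X",$6,$7} $2=="Y"&&$3=="whole" {print "Y",$6,$7}' \
    "$CORE_PATH/$PROJECT/REPORTS/ANEUPLOIDY_CHECK/${SM_TAG}.chrom_count_report.txt" \
        | paste - - \
        | awk 'BEGIN {OFS="\t"} END {if ($1=="X"&&$4=="Y") print $2,$3,$5,$6 ; \
            else if ($1=="X"&&$4=="") print $2,$3,"NaN","NaN" ; \
            else if ($1=="Y"&&$4=="") print "NaN","NaN",$2,$3 ; \
            else print "NaN","NaN","NaN","NaN"}' \
        | "$DATAMASH_DIR/datamash" transpose \
    >> "$CORE_PATH/$PROJECT/TEMP/${SM_TAG}.QC_REPORT_TEMP.txt"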
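
To exercise the fix against the 4 conditions, here is a minimal sketch that wraps the two awk stages in a hypothetical xy_depths function (file paths and datamash dropped) and feeds it made-up rows whose column layout is assumed from the awk code. In the Y-only branch it reads the depths from $2 and $3: after paste - -, a lone Y row occupies fields 1 to 3, so $5 and $6 would be empty.

```shell
#!/usr/bin/env bash
# Hypothetical tab-separated report rows (assumed layout:
# $2 = chromosome, $3 = "whole", $6 and $7 = depth values).
x_row=$'S1\tX\twhole\t.\t.\t30.1\t0.95'
y_row=$'S1\tY\twhole\t.\t.\t12.4\t0.41'

# The two awk stages from the fix, reading report rows on stdin and
# printing the 4 depth values on one tab-separated line (no datamash).
xy_depths() {
  awk 'BEGIN {OFS="\t"} $2=="X"&&$3=="whole" {print "X",$6,$7} $2=="Y"&&$3=="whole" {print "Y",$6,$7}' \
    | paste - - \
    | awk 'BEGIN {OFS="\t"} END {
        if ($1=="X" && $4=="Y") print $2, $3, $5, $6
        else if ($1=="X" && $4=="") print $2, $3, "NaN", "NaN"
        else if ($1=="Y" && $4=="") print "NaN", "NaN", $2, $3
        else print "NaN", "NaN", "NaN", "NaN"
      }'
}

printf '%s\n%s\n' "$x_row" "$y_row" | xy_depths   # 1. has both X and Y
printf '%s\n' "$x_row" | xy_depths                 # 2. has only X
printf '%s\n' "$y_row" | xy_depths                 # 3. has only Y
xy_depths </dev/null                               # 4. has nothing
```

Each call prints exactly 4 tab-separated values, so datamash transpose always yields 4 lines for the QC report regardless of which chromosomes are present.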

Kurt-Hetrick commented 5 years ago

done