Gleeson-Lab / wxs_pipeline

Starting with BAMs and FASTQs, follow GATK 4.0 Best Practices up to generating a joint-genotyped VCF

FixMateInformation (Picard) #14

Closed JiaweiShen1116 closed 2 months ago

JiaweiShen1116 commented 3 years ago

Add a FixMateInformation rule for paired-end data to ensure that the mate-pair information stays in sync between each read and its mate.
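To make "in sync" concrete, here is a minimal sketch in plain Python (standard library only) that flags read pairs whose RNEXT/PNEXT fields do not point at the mate's RNAME/POS. It is an illustration, not what Picard does internally: it checks only those two fields, whereas FixMateInformation also reconciles mate-related flags and template length. The function names and the toy SAM records are made up for this example.

```python
from collections import defaultdict


def parse_sam_line(line):
    """Extract the mate-related fields from one SAM alignment line."""
    f = line.rstrip("\n").split("\t")
    return {
        "qname": f[0],
        "flag": int(f[1]),
        "rname": f[2],
        "pos": int(f[3]),
        "rnext": f[6],
        "pnext": int(f[7]),
    }


def mates_in_sync(a, b):
    """True if each read's RNEXT/PNEXT matches its mate's RNAME/POS."""
    def points_at(read, mate):
        # "=" in RNEXT means "same reference as this read".
        rnext = read["rname"] if read["rnext"] == "=" else read["rnext"]
        return rnext == mate["rname"] and read["pnext"] == mate["pos"]

    return points_at(a, b) and points_at(b, a)


def find_out_of_sync_pairs(sam_lines):
    """Return sorted read names whose two primary records disagree."""
    by_name = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        rec = parse_sam_line(line)
        if rec["flag"] & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800)
        by_name[rec["qname"]].append(rec)
    return sorted(
        name
        for name, recs in by_name.items()
        if len(recs) == 2 and not mates_in_sync(*recs)
    )


# Toy records (hypothetical): pairB's first read says its mate starts at 700,
# but the mate is actually placed at 720.
sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "pairA\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "pairA\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
    "pairB\t99\tchr1\t500\t60\t50M\t=\t700\t250\t*\t*",
    "pairB\t147\tchr1\t720\t60\t50M\t=\t500\t-270\t*\t*",
]
print(find_out_of_sync_pairs(sam))  # ['pairB']
```

A mismatch like pairB's can appear when a step after alignment moves one read without updating its mate, which is the situation this rule is meant to repair.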

brcopeland commented 3 years ago

I remember performing this step in the past, but is it actually still useful/required?

shishenyxx commented 3 years ago

Not sure whether the error from the blat step of MosaicHunter came from this ... In the old pipeline for WES and AmpliSeq we have this rule:

    rule fix_mate_info:
        input:
            bam=scratch_dir+"/{sample}/{sample}.sorted.bam",
            bai=scratch_dir+"/{sample}/{sample}.sorted.bai"
        output:
            bam=temp(scratch_dir+"/{sample}/{sample}.fixed.bam"),
            bai=temp(scratch_dir+"/{sample}/{sample}.fixed.bai")
        params:
            tmp=scratch_dir+"/{sample}/{sample}.fixmate",
            mem=30
        priority: 4
        benchmark:
            output_dir+"/benchmarks/fix_mate_info/{sample}.txt"
        log:
            output_dir+"/logs/fix_mate_info/{sample}.out",
            output_dir+"/logs/fix_mate_info/{sample}.err"
        shell:
            "{picard} -Xmx{params.mem}G FixMateInformation"
            " INPUT={input.bam}"
            " OUTPUT={output.bam}"
            " SORT_ORDER=coordinate"
            " VALIDATION_STRINGENCY=LENIENT"
            " CREATE_INDEX=true"
            " MAX_RECORDS_IN_RAM=2500000"
            " TMP_DIR={params.tmp};"
            "rm -r {params.tmp}"
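For anyone who wants to try the same step outside Snakemake, the sketch below assembles the same argument list in plain Python. It assumes a `picard` wrapper script on the PATH (in the rule, `{picard}` comes from a variable defined elsewhere), and the function name and the example file paths are hypothetical. The options themselves are copied from the rule above, in the same legacy `KEY=VALUE` syntax.

```python
def fix_mate_command(in_bam, out_bam, tmp_dir, mem_gb=30, picard="picard"):
    """Build the FixMateInformation argument list used by the old rule.

    `picard` is an assumed wrapper name; substitute whatever the pipeline's
    `{picard}` variable expands to.
    """
    return [
        picard,
        f"-Xmx{mem_gb}G",
        "FixMateInformation",
        f"INPUT={in_bam}",
        f"OUTPUT={out_bam}",
        "SORT_ORDER=coordinate",
        "VALIDATION_STRINGENCY=LENIENT",
        "CREATE_INDEX=true",
        "MAX_RECORDS_IN_RAM=2500000",
        f"TMP_DIR={tmp_dir}",
    ]


# Hypothetical paths, for illustration only.
cmd = fix_mate_command("S1.sorted.bam", "S1.fixed.bam", "S1.fixmate")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` so that a Picard failure raises an exception; removing the temporary directory afterwards is left to the caller.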

brcopeland commented 3 years ago

I don't believe this step is necessary (it's been deprecated from the GATK Best Practices for a long time), but we could potentially add it in if the error you mention doesn't turn out to come from some other issue.

shishenyxx commented 3 years ago

Yeah, the Best Practices don't even recommend indel realignment for GATK4 ... however, I personally still think it will make some difference, especially when the follow-up analysis is based on the fixed pipeline ...