Kurt-Hetrick / CIDR_WES

CIDR's production pipeline for WES and other targeted DNA sequencing projects.

mark duplicates pixel distance setting #13

Closed Kurt-Hetrick closed 5 years ago

Kurt-Hetrick commented 6 years ago

Handle the pixel distance dynamically, based on the given platform model, for better library/optical duplicate determination.

Kurt-Hetrick commented 6 years ago

Model names:
HiSeq-X (by gsp spec; I'm guessing that this is for both five and ten)
HiSeq-3000
HiSeq-4000
(I'm going to put this in here; I don't see any conventions, and the assumption is that 5000 and 6000 are practically the same when talking about instrument models.)
NovaSeq-5000
NovaSeq-6000

Kurt-Hetrick commented 5 years ago

Looks like we are going with NovaSeq. However, I have it set up so that as long as NovaSeq is contained in the description field, the flowcell is assumed to be patterned (so NovaSeq, NovaSeq-5000, and NovaSeq-6000 are all treated the same).

Anyway, this is done. The assumption is that a sample will have either all patterned flowcells or all non-patterned flowcells. If there is a mix, everything is treated as patterned. If a mix ever does happen, the workflow will have to be adjusted to mark duplicates on each platform unit and then merge/sort afterwards.
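For anyone reading later, here is a rough sketch of the rule described above. It is not the pipeline's actual code. The function and constant names are made up for illustration, and it assumes the model names listed earlier in this thread are the patterned ones. The two pixel distances are the values given in the Picard MarkDuplicates documentation for `OPTICAL_DUPLICATE_PIXEL_DISTANCE` (100 by default, 2500 suggested for patterned flowcells); whether the pipeline uses exactly these values is also an assumption.

```python
# Values from the Picard MarkDuplicates documentation (assumed, not confirmed
# to be what the pipeline passes): default 100, 2500 for patterned flowcells.
UNPATTERNED_PIXEL_DISTANCE = 100
PATTERNED_PIXEL_DISTANCE = 2500

# Assumption: the HiSeq model names listed in this thread are matched exactly.
PATTERNED_MODELS = {"HiSeq-X", "HiSeq-3000", "HiSeq-4000"}


def is_patterned(model: str) -> bool:
    """Return True if the platform model string indicates a patterned flowcell.

    Any string containing "NovaSeq" counts as patterned, so NovaSeq,
    NovaSeq-5000 and NovaSeq-6000 are all treated the same.
    """
    return "NovaSeq" in model or model in PATTERNED_MODELS


def pixel_distance(models) -> int:
    """Pick one pixel distance for a sample from its platform unit models.

    If any platform unit is patterned, the whole sample is treated as
    patterned, which mirrors the "mix means patterned" rule above.
    """
    if any(is_patterned(m) for m in models):
        return PATTERNED_PIXEL_DISTANCE
    return UNPATTERNED_PIXEL_DISTANCE


if __name__ == "__main__":
    # Hypothetical sample with a mix of flowcell types.
    print(pixel_distance(["HiSeq-2500", "NovaSeq-6000"]))
```

Handling a true mix properly would mean calling something like `pixel_distance` once per platform unit instead of once per sample, then merging and sorting afterwards.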