dnanexus-archive / viral-ngs

move gatk & novoalign inputs from scaffold stage to refine stage #44

Open dpark01 opened 7 years ago

dpark01 commented 7 years ago

Currently, the assembly workflows ask for the gatk_tarball parameter (required) and the novocraft_license parameter (optional) in the scaffolding stage. However, these tools aren't even used in the scaffolding stage. Let's move these inputs to the refine1 stage, which is the first place they are needed, and carry them forward to the downstream stages as is done currently.
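
A rough sketch of the proposed change, assuming a simplified model in which each stage lists the tools it runs. Only gatk_tarball, novocraft_license, scaffold, and refine1 come from this issue; the refine2 stage, the tool lists, and the helper function are hypothetical and are not taken from the actual workflow definition.

```python
from typing import Dict, List

# Which tool each workflow input belongs to (input names from the issue).
INPUT_TOOL: Dict[str, str] = {
    "gatk_tarball": "gatk",
    "novocraft_license": "novoalign",
}

# Hypothetical stage model, in execution order. Per the issue's premise,
# scaffold uses neither gatk nor novoalign; its other tools are omitted.
STAGES: Dict[str, List[str]] = {
    "scaffold": [],
    "refine1": ["novoalign", "gatk"],
    "refine2": ["novoalign", "gatk"],  # assumed downstream stage
}


def declare_at_first_use(
    stages: Dict[str, List[str]], input_tool: Dict[str, str]
) -> Dict[str, List[str]]:
    """Map each stage to the inputs it should declare.

    An input is declared at the first stage that runs its tool;
    later stages would receive it by linking to that stage.
    """
    declared: Dict[str, List[str]] = {name: [] for name in stages}
    for input_name, tool in input_tool.items():
        for stage_name, tools in stages.items():
            if tool in tools:
                declared[stage_name].append(input_name)
                break
    return declared


result = declare_at_first_use(STAGES, INPUT_TOOL)
print(result)
```

Under these assumptions, both inputs end up declared on refine1 and nothing is asked for at scaffold.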

mlin commented 7 years ago

As we've factored it, scaffold does call assembly.py impute_from_reference, which does use novoalign. GATK could be moved.

dpark01 commented 7 years ago

I'm pretty sure the only Novoalign step that runs in impute_from_reference is novoindex. I don't think novoindex benefits from having a license (or from being multithreaded), and I don't know that you're even using the index anyway (don't you regenerate it in the refine steps?). I think you can just use the conda-installed novoindex (the unlicensed/free/single-threaded version) for scaffolding.

@tomkinsc I'm slightly inclined not to even expose --novoalign_license_path for impute_from_reference in our own codebase. What do you think?