NCI-CGR / plco-analysis

Primary workflow for the PLCO "Atlas" project

cleanup of incomplete pipelines #6

Open lightning-auriga opened 3 years ago

lightning-auriga commented 3 years ago

I drafted a variety of additional pipelines before we fixed the scope of the project, and those pipelines are in various states of disrepair after being left out of the primary dev process. Need to flag the problematic ones and remove them.

lightning-auriga commented 3 years ago

primary target in mind is shared-makefiles/Makefile.plink, which was really for early testing and nothing else. there's a world in which they'd want plink support in the future, but if they do, they should probably add a snakemake pipeline that they can develop and maintain.

lightning-auriga commented 3 years ago

some other candidates are shared-makefiles/Makefile.fastgwa and shared-makefiles/Makefile.fastgwa.grm

these are less clear-cut candidates for removal: they were certainly in primary use for a while, but the project has since pivoted away from fastgwa to bolt-lmm. that decision was contested and somewhat arbitrary, and the pipelines were maintained longer and more completely than the plink one. will need to consider.

lightning-auriga commented 3 years ago

one thing to note is that there soon will be no maintainer for whom make is a primary language. so I feel like... I should probably be pretty aggressive in removing things that will never be properly maintained, unless they're already in an extremely strong state at the current time.

lightning-auriga commented 3 years ago

they can always pull them out of the project history if they know what they're doing.

lightning-auriga commented 3 years ago

So the stragglers are fastgwa and fastgwa.grm. I'm leaving them for now, as they were recently functional and have been maintained. They do need testing, however, specifically since the platform independence build v2.0. So I'd flag them as suspicious and leave this issue open. If you want to try to fix this, the first step is to add the appropriate software to environment.yaml. Then run the grm pipeline for one-shot GRM estimation; once that works, try out fastGWA itself. It's very, very speedy. You'll need to add - fastgwa to a continuous trait's algorithm block.
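
For illustration, the last step (listing fastgwa in a trait's algorithm block) might be sketched as below. This is a minimal sketch, not the project's actual tooling: it assumes the trait configuration has already been parsed into a Python mapping whose `algorithm` key holds a list. The helper `enable_algorithm`, the `phenotype` and `type` keys, and the `boltlmm` entry name are all hypothetical; only the `algorithm` block and the `fastgwa` entry come from the comment above.

```python
from copy import deepcopy


def enable_algorithm(trait_config: dict, name: str = "fastgwa") -> dict:
    """Return a copy of trait_config with `name` listed in its algorithm block.

    Hypothetical helper: assumes the algorithm block is a list of names.
    """
    updated = deepcopy(trait_config)  # leave the caller's config untouched
    algorithms = updated.setdefault("algorithm", [])
    if name not in algorithms:  # avoid duplicate entries on repeated calls
        algorithms.append(name)
    return updated


# Hypothetical continuous-trait config; every key except "algorithm" and the
# existing "boltlmm" entry name are illustrative assumptions.
example_trait = {
    "phenotype": "example_trait",
    "type": "continuous",
    "algorithm": ["boltlmm"],
}

updated_trait = enable_algorithm(example_trait)
print(updated_trait["algorithm"])
```

Returning a copy rather than editing in place keeps the original configuration available for comparison while the fastGWA pipelines are still being retested.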