broadinstitute / gnomad_local_ancestry

Hail batch pipeline and scripts for local ancestry inference
MIT License

Decide whether to keep intermediate files from the LAI pipeline #106

Closed: mike-w-wilson closed this issue 2 years ago

mike-w-wilson commented 3 years ago

The pipeline writes out files at every step, but not all of them are needed. We need to decide the fate of each set: move it to Nearline, Coldline, or Archive storage, or delete it?

Eagle:
- Phased reference per chr
- Phased sample per chr

RFMix:
- msp file per chr
- fb file per chr (large and not used)

Tractor:
- VCF per chr (unzipped; zip if keeping)
- Hap per chr (unzipped; zip if keeping)
- Dos per chr (unzipped; zip if keeping)

VCF generation:
- VCF with annotated call stats
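For the Tractor outputs marked "zip if keeping", compression could look like the minimal sketch below. It assumes the files are local plain-text files; `gzip_file` is a hypothetical helper, not part of this repo. Note that standard gzip output is not BGZF, so a VCF that needs to be indexed with tabix should be compressed with htslib's `bgzip` instead.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(path: Path, remove_original: bool = True) -> Path:
    """Compress `path` to `<path>.gz` with standard gzip and return the new path.

    Hypothetical helper for illustration. Plain gzip is not BGZF, so use
    htslib's `bgzip` for VCFs that must be tabix-indexed.
    """
    out_path = path.with_name(path.name + ".gz")
    # Stream the file in chunks so large per-chromosome files are not loaded into memory.
    with path.open("rb") as src, gzip.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if remove_original:
        path.unlink()  # drop the uncompressed copy once the .gz is written
    return out_path
```

Hap and Dos files are tab-delimited text, so they usually compress well. Deleting the original is optional in case the uncompressed copy is still needed downstream.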

mike-w-wilson commented 2 years ago

We decided to keep the MSP files, and I also kept some testing files. All other data was deleted, with the exception of the final VCF, which now resides in the requester-pays bucket.
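The retention decision above can be written down as a simple rule, as in the sketch below. The file-name patterns are assumptions: RFMix v2 names its MSP output `<prefix>.msp.tsv`, but the `final` marker for the annotated VCF is hypothetical, and the kept testing files cannot be identified by name, so they would still need to be set aside by hand.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    MOVE_TO_REQUESTER_PAYS = "move_to_requester_pays"
    DELETE = "delete"


# Assumed naming conventions; adjust to the pipeline's actual output paths.
MSP_SUFFIX = ".msp.tsv"       # RFMix v2 writes <prefix>.msp.tsv
FINAL_VCF_MARKER = "final"    # hypothetical marker in the final annotated VCF name
FINAL_VCF_SUFFIXES = (".vcf.bgz", ".vcf.gz")


def retention_action(path: str) -> Action:
    """Map an intermediate output path to the agreed retention action."""
    name = path.rsplit("/", 1)[-1]
    if name.endswith(MSP_SUFFIX):
        return Action.KEEP  # MSP files are kept
    if FINAL_VCF_MARKER in name and name.endswith(FINAL_VCF_SUFFIXES):
        return Action.MOVE_TO_REQUESTER_PAYS  # final VCF goes to the requester-pays bucket
    return Action.DELETE  # everything else (Eagle, fb, Tractor outputs) is deleted
```

Running this over a listing of the pipeline's output bucket would give a dry-run plan to review before anything is actually deleted.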