AndersenLab / alignment-nf

A Nextflow pipeline for genome sequence alignment
MIT License

Add validation step #17

Closed danrlu closed 4 years ago

danrlu commented 4 years ago
  1. wi-gatk choked on a corrupted bam, probably produced by an interrupted nextflow process that was not properly resumed. So I added a validatebam step at the strain level to check the integrity of the output bam. 66c7030b175f5648bd3585a5f47c7a8c6721ef0f and 47bfc3cfd6fe1ac0e52beccf2769e9ef3fcf7b11 We could probably make the pipeline smart enough to detect corrupted samples and rerun them automatically, but I'll save that for the future when we have too much free time.

  2. renamed input from "WI_sample_sheet.tsv" to "sample_sheet.tsv" to be consistent with test_data sample sheet, and with wi-gatk. fb694e1b16361e2b23c7f46d207063308245a434

  3. formatted "strain_summary.tsv" so it is almost ready to use as the wi-gatk sample sheet. The user just needs to add the parent folder location (which differs between gcp and quest) and a header row (strain\tbam\tbai) to "strain_summary.tsv" in order to create "sample_sheet.tsv" for wi-gatk. e6af171d8982afe13b2ab9b7f373bf606ffe05b9 line 144

  4. removed WS245 folder since it is not used. Current genome version is WS276, and to ensure the genome fasta is the same across pipelines, it's probably safer to give the same reference location to all pipelines. 88226444e59b12b6bbaf8a2564ac3317514d0d76
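
For reference, here is a rough Python sketch of items 1 and 3. It is not what the commits above do. `has_bgzf_eof` is a hypothetical helper that only checks for the standard 28-byte BGZF end-of-file block. That is a cheap way to spot a truncated bam, but it is not a full validation. `build_sample_sheet` is also hypothetical, and it assumes "strain_summary.tsv" has three headerless, tab-separated columns (strain, bam file name, bai file name). The parent folder is whatever applies on gcp or quest.

```python
import csv
import os
import posixpath

# Standard 28-byte BGZF end-of-file block, as defined in the SAM/BAM specification.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000"
)


def has_bgzf_eof(bam_path):
    """Return True if the file ends with the BGZF EOF block.

    A missing EOF block usually means the file was truncated, e.g. by an
    interrupted process. Passing this check does not prove the bam is valid.
    """
    if os.path.getsize(bam_path) < len(BGZF_EOF):
        return False
    with open(bam_path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


def build_sample_sheet(summary_path, sheet_path, parent_dir):
    """Write a wi-gatk style sample sheet from a strain summary.

    Assumes each input row is: strain, bam file name, bai file name
    (no header). Adds the strain/bam/bai header row and prefixes the
    bam and bai columns with parent_dir.
    """
    with open(summary_path, newline="") as src, \
            open(sheet_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerow(["strain", "bam", "bai"])
        for row in reader:
            if not row:
                continue  # skip blank lines
            strain, bam, bai = row
            writer.writerow([
                strain,
                posixpath.join(parent_dir, bam),
                posixpath.join(parent_dir, bai),
            ])
```

Usage would look something like `build_sample_sheet("strain_summary.tsv", "sample_sheet.tsv", parent_dir)`, with `parent_dir` set to the bam folder for the platform in use.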

The conda and container lines got commented in and out along the way. Very sorry about this! I want to put everything in a container once I get around to it, but for now please bear with me....

danrlu commented 4 years ago

I'm trying to rerun all alignments tonight, so I'll leave a couple of less critical changes for the future. Thanks very much for the comments!