genome / analysis-workflows

Open workflow definitions for genomic analysis from MGI at WUSM.
MIT License
102 stars 57 forks

Remove unnecessary CWL files. #119

Closed: jasonwalker80 closed this issue 3 years ago

jasonwalker80 commented 7 years ago

One example: varscan/samtools_mpileup.cwl is no longer used.

chrisamiller commented 4 years ago

A sweep for unneeded/outdated tools should be part of our 2.0 release.

tmooney commented 3 years ago

Here are the tools and subworkflows that are not referenced in any workflows under pipelines:

      1 vcf_eval_concordance.cwl
      1 vcf_eval_cle_gold.cwl
      1 sompy.cwl
      1 somatic_concordance_graph.cwl
      1 single_cell_rnaseq.cwl
      1 sequence_align_and_tag.updatedpicard.cwl
      1 samtools_mpileup.cwl
      1 rename.cwl
      1 pvacvector.cwl
      1 pvacfuse.cwl
      1 pvacbind.cwl
      1 position_sort.cwl
      1 pizzly.cwl
      1 molecular_qc.cwl
      1 merge_uncompressed_vcf.cwl
      1 kmer_size_from_index.cwl
      1 joint_genotype.cwl
      1 grolar.cwl
      1 gatk_genotypegvcfs.cwl
      1 filter_vcf_exac.cwl
      1 fastq_to_bqsr.cwl
      1 fastq_align_and_tag.cwl
      1 eval_vaf_report.cwl
      1 eval_cle_gold.cwl
      1 downsampled_alignment.cwl
      1 deeptools_bamcoverage.cwl
      1 cram_to_cnvkit.cwl
      1 cram_to_bam_and_index.cwl
      1 cram_to_bam.cwl
      1 combine_variants_concordance.cwl
      1 combine_gvcfs.cwl
      1 cellranger_vdj.cwl
      1 cellranger_mkfastq_and_count.cwl
      1 cellranger_mkfastq.cwl
      1 cellranger_feature_barcoding.cwl
      1 cellranger_count.cwl
      1 cellranger_atac_count.cwl
      1 cellmatch_lineage.cwl
      1 bedtools_intersect.cwl
      1 bam_to_bqsr_no_dup_marking.cwl
      1 bam_to_bqsr.cwl

Generated with this admittedly ugly one-liner (subsequently restricted to only those lines with count 1):

for single_iteration in 1; do echo ~/git/analysis-workflows/definitions/pipelines/*.cwl | xargs -n 1 /usr/local/bin/cwltool --pack | grep "id" | cut -f 4 -d '"' | cut -f 1 -d '/' | cut -f 2 -d '#' | sort | uniq | grep '.cwl' | grep -v '_2'; \ls ~/git/analysis-workflows/definitions/tools/ ~/git/analysis-workflows/definitions/subworkflows/ ~/git/analysis-workflows/definitions/pipelines/ ~/git/analysis-workflows/definitions/pipelines/ | grep '.cwl'; done | sort | uniq -c | sort -r -n
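
The comparison step of the one-liner above can be written more readably as a set difference. The following is only a minimal sketch: `unreferenced_cwl` is a hypothetical helper name, not part of the repository, and it assumes the two name lists have already been written to files, one name per line. It swaps the `uniq -c` counting trick for a direct `grep` filter.

```shell
# Hypothetical helper (not part of analysis-workflows).
# $1: file listing referenced .cwl names, one per line
# $2: file listing .cwl names present on disk, one per line
# Prints every name that is on disk but never referenced.
unreferenced_cwl() {
    # -F: fixed strings, -x: match whole lines, -v: keep non-matching lines
    grep -vxF -f "$1" "$2" | sort -u
}
```

The referenced list could come from the `cwltool --pack` half of the one-liner and the on-disk list from the `ls` half. Because this prints only the names missing from the referenced list, no filtering on count 1 is needed afterwards.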

chrisamiller commented 3 years ago

To remove:

      1 samtools_mpileup.cwl
      1 position_sort.cwl
      1 filter_vcf_exac.cwl
      1 fastq_to_bqsr.cwl
      1 fastq_align_and_tag.cwl
      1 deeptools_bamcoverage.cwl
      1 bam_to_bqsr_no_dup_marking.cwl
      1 bam_to_bqsr.cwl

Notes:

cram_to_cnvkit.cwl - could be removed after the cnvkit update
make a no-dedup workflow for alignment - make duplication optional?

gschang commented 3 years ago

For information on pvacvector.cwl and pvacbind.cwl (listed above): when Mike and I manually review neo-epitopes for cancer patients in clinical trials, we set up the pvacbind and pvacvector analyses by hand to finalize the vector insert design. These CWL workflows (pvacvector.cwl, pvacbind.cwl, and pvacfuse.cwl) are potentially what we will need during our future clinical study of cancer vaccine design.