Public-Health-Bioinformatics / cpo-pipeline

An analysis pipeline for the purpose of investigating Carbapenemase-Producing Organisms.
MIT License
1 stars 2 forks source link

Code Cleanup #43

Open dfornika opened 5 years ago

dfornika commented 5 years ago
DiDigsDNA commented 5 years ago

Let's check that variables listed in USAGE statement match those actually parsed in job scripts.

dfornika commented 5 years ago

Good idea. I've added it to the list above.

DiDigsDNA commented 5 years ago

USAGE statement variables in bwa_mem.sh script don't match those referenced later in script. https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/bwa_mem.sh#L11

 case $key in
    -1|--R1)
    # input_R1.fastq.gz file
    input_r1_fastq="$2"
    shift # past argument
    shift # past value
    ;;
    -2|--R2)
    # input_R2.fastq.gz file
    input_r2_fastq="$2"
    shift # past argument
    shift # past value
    ;;
    -r|--reference)
    # reference fasta file
    reference="$2"
    shift # past argument
    shift # past value
    ;;
    -o|--output)
    # output sam file
    output="$2"
    shift # past argument
    shift # past value
    ;;
  esac
done

I can change the USAGE string variables.

DiDigsDNA commented 5 years ago

In this case, the USAGE statement contains the correct variables used in the script but these variables don't get initialized to "" (whereas an unused variable does). https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/fastqc.sh#L22

input_fastq="" This variable not used in script. These variables are used but aren't initialized: input_r1_fastq="$2" input_r2_fastq="$2"

I can fix this.

DiDigsDNA commented 5 years ago

USAGE statement includes all variables used in script. However, the alignment variable is initialized to "", while the model variable isn't. https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/iqtree.sh#L20

Do you want me to add model="" to line 21?

DiDigsDNA commented 5 years ago

USAGE stmt has all the variables used within the script and 3 of 4 variables used in the script are initialized. I suspect treefile="" at line 21 should be replaced with output_file=""... do you agree? https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/maskrc-svg.sh#L20

DiDigsDNA commented 5 years ago
  1. 2 of 3 variables initialized to "" except for label --> would you like me to change this? https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/mlst.sh#L22
  2. Comment should read # input contigs.fasta file https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/mlst.sh#L31
  3. It looks like input variable was changed to assembly: https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/mlst.sh#L32 https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/mlst.sh#L55 ...so I'll replace input="" with assembly="".
DiDigsDNA commented 5 years ago

I noticed 2 issues with samtools_filter_fixmate_sort.sh:

  1. Confusing comment. I don't understand the use of the term integer here. https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/samtools_filter_fixmate_sort.sh#L36
  2. Shouldn't this comment read # output sam (or bam) file ? https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/samtools_filter_fixmate_sort.sh#L42
DiDigsDNA commented 5 years ago

Inconsistency in variables initialized and those used:https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/samtools_fixmate.sh#L23 And the output variable not initialized:https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/samtools_fixmate.sh#L38 I suggest changing initial variable declarations to:

input=""
output=""

...and deleting flags=0 (because the flags variable isn't used) ...and changing https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/samtools_fixmate.sh#L37 to read `# output sam (or bam) file

DiDigsDNA commented 5 years ago

Two issues with this script:

  1. Need to add -n|--name-order to USAGE stmt.https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/samtools_sort.sh#L11
  2. Should this comment read # output sam (or bam) file ? https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/samtools_sort.sh#L43
DiDigsDNA commented 5 years ago

queries is initialized and declared but not reference otherwise in script:https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/seqtk_totalbp.sh#L24 I can delete it.

DiDigsDNA commented 5 years ago

I noticed 4 issues with shovill.sh:

  1. -q|--queries declared in USAGE stmt but not used in script --> suggest removing it https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/shovill.sh#L11
  2. -o|--output_file in in USAGE stmt doesn't match below, where -o|--output_dir is parsed https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/shovill.sh#L55
  3. -c|--mincov used in script but not declared or initialized and not mentioned in USAGE stmt. --> Should this be initialized to "" or some default value? https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/shovill.sh#L45
  4. -l|--minlen used in script but not declared or initialized and not mentioned in USAGE stmt. --> Should this be initialized to "" or some default value? https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/shovill.sh#L49
DiDigsDNA commented 5 years ago

2 issues with snippy_ctgs.sh:

  1. 2 unused variables --> suggest eliminating them https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/snippy_ctgs.sh#L21 https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/snippy_ctgs.sh#L22
  2. Suggest adding ctgs="" declaration to line 21 as this variable assigned later https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/snippy_ctgs.sh#L38 Note that snippy perl code sets default for ctgs to '' so this should work.
  3. USAGE stmt needs to instruct the user to specify a contigs file on the command line i.e. instead of USAGE="qsub $( basename "$BASH_SOURCE" ) [-h] -r|--ref REFERENCE_FASTA -c|--ctgs -o|--outdir OUTPUT_FILE use this: USAGE="qsub $( basename "$BASH_SOURCE" ) [-h] -r|--ref REFERENCE_FASTA -c|--ctgs CONTIGS_FASTA_FILE -o|--outdir OUTPUT_FILE
DiDigsDNA commented 5 years ago

Only thing I noticed for snp-dists.sh is that comment states that the alignment file is a reference fasta https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/snp-dists.sh#L29 I suggest changing this to read # alignment fasta

DiDigsDNA commented 5 years ago

Suggestion for snp-sites.sh to change comment from https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/62778005710b97c98d413331fa425f74bdc9e390/data/job_scripts/snp-sites.sh#L29 to read # alignment fasta