czbiohub-sf / orpheum

Orpheum (previously called, and published as, sencha) is a Python package for translating RNA-seq reads directly into coding protein sequences.
MIT License

boolean requirement leads to invalid value for '--csv' and '--parquet' #100

Closed: taylorreiter closed this issue 3 years ago

taylorreiter commented 3 years ago

Snakemake workflow here: https://github.com/taylorreiter/2021-orpheum-nbhds

rule orpheum_translate_sgc_nbhds:
    input:
        ref="outputs/orpheum_index/rgnv_original_sgc_nbhds_plass_assembly_protein_ksize7.bloomfilter.nodegraph",
        fastq="outputs/rgnv_sgc_original_results/{library}_GCF_900036035.1_RGNV35913_genomic.fna.gz.cdbg_ids.reads.fa.gz"
    output:
        pep="outputs/orpheum/{library}_GCF_900036035.1_RGNV35913_genomic.fna.gz.cdbg_ids.reads.faa",
        nuc="outputs/orpheum/{library}_GCF_900036035.1_RGNV35913_genomic.fna.gz.cdbg_ids.reads.nuc_coding.fna",
        nuc_noncoding="outputs/orpheum/{library}_GCF_900036035.1_RGNV35913_genomic.fna.gz.cdbg_ids.reads.nuc_noncoding.fna",
        csv="outputs/orpheum/{library}_GCF_900036035.1_RGNV35913_genomic.fna.gz.cdbg_ids.reads.coding_scores.csv"
    conda: "envs/orpheum.yml"
    benchmark: "benchmarks/orpheum_translate_{library}_plass_assembly.txt"
    resources: mem_mb = 16000
    threads: 1
    shell:'''
    orpheum translate --peptides-are-bloom-filter --noncoding-nucleotide-fasta {output.nuc_noncoding} --coding-nucleotide-fasta {output.nuc} --csv {output.csv} {input.ref} {input.fastq} > {output.pep}
    '''

For example, the command as expanded for library 4001:

orpheum translate --peptides-are-bloom-filter --noncoding-nucleotide-fasta outputs/orpheum/4001_GCF_900036035.1_RGNV35913_genomic.fna.gz.cdbg_ids.reads.nuc_noncoding.fna --coding-nucleotide-fasta outputs/orpheum/4001_GCF_900036035.1_RGNV35913_genomic.fna.gz.cdbg_ids.reads.nuc_coding.fna --csv outputs/orpheum/4001_GCF_900036035.1_RGNV35913_genomic.fna.gz.cdbg_ids.reads.coding_scores.csv outputs/orpheum_index/rgnv_original_sgc_nbhds_plass_assembly_protein_ksize7.bloomfilter.nodegraph outputs/rgnv_sgc_original_results/4001_GCF_900036035.1_RGNV35913_genomic.fna.gz.cdbg_ids.reads.fa.gz > outputs/orpheum/4001_GCF_900036035.1_RGNV35913_genomic.fna.gz.cdbg_ids.reads.faa

leads to the error:

Error: Invalid value for '--csv': 'outputs/orpheum/4001_GCF_900036035.1_RGNV35913_genomic.fna.gz.cdbg_ids.reads.coding_scores.csv' is not a valid boolean.

The help message states:

  --csv BOOLEAN                   Name of csv file to write with all sequence
                                  reads and their coding scores
  --parquet BOOLEAN               Name of parquet file to write with all
                                  sequence reads and their coding scores
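The "BOOLEAN" metavar and the "is not a valid boolean" wording appear to come from the click library, which renders an option declared with type=bool this way. A minimal sketch reproducing the failure, assuming (not confirmed from the orpheum source) that --csv was declared as a bool-typed option; the translate function below is a hypothetical stand-in, not orpheum's code:

```python
import click
from click.testing import CliRunner


# Hypothetical stand-in for `orpheum translate`: --csv is (wrongly) declared
# with type=bool, so click lists it as "--csv BOOLEAN" in --help and tries to
# parse its argument as true/false instead of treating it as a file name.
@click.command()
@click.option("--csv", type=bool,
              help="Name of csv file to write with all sequence reads and their coding scores")
def translate(csv):
    click.echo(f"csv={csv!r}")


runner = CliRunner()

# A file path is not a boolean-like string, so click rejects it with a usage
# error (exit code 2) before the command body ever runs.
bad = runner.invoke(translate, ["--csv", "coding_scores.csv"])
print(bad.exit_code)
print(bad.output)

# Only boolean-like strings such as "true", "false", "1" or "0" are accepted.
ok = runner.invoke(translate, ["--csv", "true"])
print(ok.output)
```

Under this assumption the help text and the runtime check agree with each other; the declared type, not the Snakemake rule, is what rejects the path.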

@bluegenes has provided a patch on the branch https://github.com/czbiohub/orpheum/tree/bluegenes/patch-args
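The contents of that branch are not reproduced here. As a hedged sketch of one way such a fix could look with click, the options can be declared as optional file paths (click.Path) instead of booleans, so a table is written only when a path is given. The translate function, the placeholder rows and the column names are all hypothetical:

```python
import csv as csv_module  # aliased because the option below is also named "csv"
import os

import click
from click.testing import CliRunner


# Hypothetical sketch, not the actual patch: --csv and --parquet take an
# optional file path and default to None when no table is wanted.
@click.command()
@click.option("--csv", type=click.Path(dir_okay=False, writable=True), default=None,
              help="Name of csv file to write with all sequence reads and their coding scores")
@click.option("--parquet", type=click.Path(dir_okay=False, writable=True), default=None,
              help="Name of parquet file to write with all sequence reads and their coding scores")
def translate(csv, parquet):
    # Hypothetical placeholder rows standing in for real per-read coding scores.
    rows = [("read1", "Coding"), ("read2", "Non-coding")]
    if csv is not None:
        with open(csv, "w", newline="") as handle:
            writer = csv_module.writer(handle)
            writer.writerow(["read_id", "category"])  # hypothetical column names
            writer.writerows(rows)
    # Writing parquet would need an extra dependency, so it is skipped in this sketch.
    click.echo(f"csv={csv!r} parquet={parquet!r}")


runner = CliRunner()
# Run inside a temporary directory so the CSV does not land in the working tree.
with runner.isolated_filesystem():
    result = runner.invoke(translate, ["--csv", "coding_scores.csv"])
    csv_written = os.path.exists("coding_scores.csv")
help_text = runner.invoke(translate, ["--help"]).output
print(result.exit_code, csv_written)
```

Using None as the default keeps "not requested" distinct from a real path, which a bool-typed option cannot express.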