NBISweden / pipelines-nextflow

A set of workflows written in Nextflow for Genome Annotation.
GNU General Public License v3.0

Organelle Finder workflow #69

Closed: ViktorSade closed this 1 year ago

ViktorSade commented 2 years ago

Prototype organelle finder workflow

mahesh-panchal commented 2 years ago

A toy example showing that a process output only has to match its pattern: Nextflow won't complain if one of the files is missing from a given task. main.nf:

nextflow.enable.dsl = 2

workflow {
    FOO(['mito','chlo'])
}

process FOO {
    input:
    each org

    output:
    path "*.txt"

    script:
    """
    touch file_${org}.txt
    """
}

nextflow.config:

process {
    withName: 'FOO' {
        publishDir = [
            path: './results',
            pattern: "*_{mito,chlo}.txt"
        ]
    }
}

Each task makes a different file, but the pattern matches them both.

mahesh-panchal commented 2 years ago

Also, {} in pattern: doesn't work, but it doesn't raise an error either. It just silently matches nothing.
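A possible workaround, which is not from this thread and is untested here: drop the brace pattern and give publishDir one entry per pattern, as a list of maps. This is a sketch assuming the same toy process FOO and file names file_mito.txt / file_chlo.txt.

```nextflow
// Sketch: one publishDir entry per glob instead of "*_{mito,chlo}.txt",
// since the brace form was observed to silently match nothing.
process {
    withName: 'FOO' {
        publishDir = [
            [ path: './results', pattern: '*_mito.txt' ],
            [ path: './results', pattern: '*_chlo.txt' ]
        ]
    }
}
```

Another option is a saveAs closure that returns the file name for files to keep and null for everything else, which lets you filter with a regular expression instead of a glob.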

mahesh-panchal commented 2 years ago

Also, when your final version is ready, change the pull request from Draft to Ready for review.

LucileSol commented 2 years ago

I ran:

nextflow run -profile singularity ../git/applied/pipelines-nextflow/subworkflows/OrganelleFinder/OrganelleFinder.nf -params-file 'plant_params.yml'

with this parameter file:

# General parameters
genome_assembly : 'linum_all.fna'
reference_mitochondria : 'mitochondrion.1.1.genomic.fna'
reads_file : ''
input_type : 'plant'
outdir: './results'

# Mitochondrial parameters
mit_blast_evalue : '1e-6'
mit_bitscore : 100
mit_significant_gene_matches : 2
mit_suspicious_gene_matches : 1
mit_max_contig_length : 100000
mit_min_span_fraction : 0.8

reference_chloroplast: 'plastid.1.1.genomic.fna'
chl_blast_evalue : '1e-6'
chl_bitscore : 100
chl_significant_gene_matches : 2
chl_suspicious_gene_matches : 1
chl_max_contig_length : 100000
chl_min_span_fraction : 0.8

The pipeline ran, but I got no results, meaning it did not find any chloroplast or mitochondrial contigs (and I am sure the assembly contains both). The statistics files are also empty.

My genome assembly contains these sequences:

>scaffold_1--33206302_34585040+
>scaffold_2--33548729_34811913+
>scaffold_3--34545344+
>scaffold_4--34554814_34515665_34992229_35017581+
>scaffold_5--34585812+
>scaffold_6--34762752_34747320+
>scaffold_7--34823016_34429721+
>scaffold_8--34898971_34893665_35018255+
>scaffold_9--35017247+
>scaffold_10--35017267_35024205_34220946_34541214_35017265+
>scaffold_11--35027699+
>scaffold_12--35030837_34880665_35017409_35017471_34531555_34470340_34842134_35018041_35017541+
>LG10
>LG1
>LG2
>LG3
>LG4
>LG5
>LG6
>LG7
>LG8
>LG9
>3565841_3545355_3565803_3565631_3565707_3565589_3565747+,3565515_3565799_3565835_3549281_3565559_3565827_3565655_3565045_3563565_3565839+,3565815+,3565515_3565799_3565835_3549281_3565559_3565827_3565655_3565045_3563565_3565839-(circular)
>3565841_3545355_3565803_3565631_3565707_3565589_3565747+,3565515_3565799_3565835_3549281_3565559_3565827_3565655_3565045_3563565_3565839+,3565815-,3565515_3565799_3565835_3549281_3565559_3565827_3565655_3565045_3563565_3565839-(circular)

The first 12 contigs are mitochondrial and the last two are chloroplast.

Is there a specific format my contigs should follow (uppercase/lowercase, number of nucleotides per line, naming, ...)?
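One way to check the naming part of this question, offered as a hypothetical diagnostic that is not part of OrganelleFinder: several headers above contain +, commas and parentheses, so the script below lists every sequence ID with characters outside an assumed "safe" set of letters, digits, underscore, dot and hyphen. Whether such characters actually affect the workflow is not established in this thread, and the safe set is only an assumption.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical diagnostic (not part of OrganelleFinder): print FASTA IDs
// containing characters outside the assumed safe set [A-Za-z0-9_.-].
workflow {
    Channel.fromPath(params.genome_assembly)
        // with record: [id: true], each entry exposes 'id',
        // the header text up to the first whitespace
        .splitFasta(record: [id: true])
        .filter { rec -> !(rec.id ==~ /[A-Za-z0-9_.-]+/) }
        .view { rec -> "Unusual header: ${rec.id}" }
}
```

Saved under a hypothetical name such as check_headers.nf, it can reuse the same parameter file: nextflow run check_headers.nf -params-file plant_params.yml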

nylander commented 2 years ago

Short comment on the status: the current code did not run (@LucileSol), and further testing is needed.