NBISweden / pipelines-nextflow

A set of workflows written in Nextflow for Genome Annotation.
GNU General Public License v3.0

The new annotation_preprocessing pipeline does not filter out contigs shorter than 1000 nucleotides #90

Status: Open. LucileSol opened this issue 1 year ago.

LucileSol commented 1 year ago

The new annotation_preprocessing pipeline does not filter out contigs shorter than 1000 nucleotides. The old pipeline did this, so for now any contigs shorter than 1000 nucleotides have to be removed manually. To be fixed eventually.

As a workaround, https://github.com/NBISweden/GAAS/blob/master/bin/gaas_fasta_purify.pl can be used for now (I think I still need to test it).
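Until the workaround is tested, the manual step itself is just a length filter. Below is a minimal sketch in bash/awk that keeps only records of at least a given length. It is not a reimplementation of gaas_fasta_purify.pl (which may do more than drop short sequences), and whether GAAS keeps a sequence exactly equal to `--size` is an assumption here (this sketch keeps it via `>=`). The function name `filter_short_contigs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only FASTA records whose sequence length is >= MIN_LEN.
# Reads FASTA on stdin, writes the filtered FASTA on stdout, and preserves
# each kept record's original line wrapping.
filter_short_contigs() {
  local min="$1"
  awk -v min="$min" '
    # Emit the buffered record if it is long enough.
    function flush() {
      if (header != "" && length(seq) >= min + 0) {
        print header
        printf "%s", lines
      }
    }
    /^>/ { flush(); header = $0; seq = ""; lines = ""; next }
    { seq = seq $0; lines = lines $0 "\n" }   # concatenate sequence, keep raw lines
    END { flush() }
  '
}
```

Usage (file names are placeholders): `filter_short_contigs 1000 < genome.fa > genome.min1000.fa`.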

mahesh-panchal commented 1 year ago

Can you check the script that Nextflow wrote for the task (.command.sh) to see whether it contains --size 1000?
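A sketch for running that check over every task at once, assuming Nextflow's default work directory layout (`work/<xx>/<hash>/.command.sh`). The helper name `find_purify_size` is hypothetical and not part of the pipeline; it simply greps the task scripts that call gaas_fasta_purify.pl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every .command.sh under a Nextflow work directory
# that calls gaas_fasta_purify.pl, print the line(s) carrying --size.
# Assumes the default layout <workdir>/<xx>/<hash>/.command.sh.
find_purify_size() {
  local workdir="${1:-work}"
  find "$workdir" -name '.command.sh' -type f -print0 |
    while IFS= read -r -d '' script; do
      if grep -q 'gaas_fasta_purify.pl' "$script"; then
        # -H prefixes each match with the script path.
        grep -H -e '--size' "$script" || echo "$script: no --size argument"
      fi
    done
}
```

Usage: `find_purify_size work` from the directory where the pipeline was launched.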

LucileSol commented 1 year ago

Yes:

```bash
#!/bin/bash -ue
gaas_fasta_purify.pl \
    --size 1000 \
    --infile genome_uppercase.fa \
    --output genome_uppercase_purified

cat <<-END_VERSIONS > versions.yml
"ANNOTATION_PREPROCESSING:ASSEMBLY_PURIFY":
    gaas: 1.2.0
END_VERSIONS
```

LucileSol commented 1 year ago

And gaas_fasta_purify.pl does not remove the contigs (or no longer does). I ran it separately and the short contigs were still there.
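One way to make "the contigs were still there" concrete is to list every sequence in the purified output that is still below the threshold. A minimal sketch, assuming a (possibly multi-line) FASTA file; the helper name `list_short_seqs` and the file name in the usage line are hypothetical. An empty result means no short contigs remain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<id><TAB><length>" for every sequence in a
# FASTA file whose length is strictly below MIN_LEN.
list_short_seqs() {
  local fasta="$1" min="$2"
  awk -v min="$min" '
    function report() {
      if (id != "" && len < min + 0) printf "%s\t%d\n", id, len
    }
    /^>/ { report(); id = substr($1, 2); len = 0; next }   # id = header up to first space
    { len += length($0) }                                  # sum over wrapped lines
    END { report() }
  ' "$fasta"
}
```

Usage: `list_short_seqs purified.fa 1000`, pointing it at the purified FASTA produced by the task.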

mahesh-panchal commented 1 year ago

Then check whether the --size option was renamed in a version update.

Juke34 commented 1 year ago

Interesting. GAAS has had the same release (v1.2) since 2020, so the script should still behave the same way.

mahesh-panchal commented 9 months ago

Is this still an issue? Can you provide some data I can use to replicate it?

mahesh-panchal commented 9 months ago

The GAAS script works, and the module works on its own, independently of the workflow. Testing the whole workflow with a sample file:

```text
>seq1
ACGTACGTACGT
>seq2
ACGTACGT
>seq3
ACGTACGTACGT
```

custom.config:

```groovy
process {
    withName: 'ASSEMBLY_PURIFY' {
        ext.args = '--size 10'
    }
}
```

Command:

```bash
nextflow run main.nf -profile test,docker,gitpod --subworkflow 'annotation_preprocessing' -c custom.config --genome sample.fasta
```

The workflow also runs successfully.

Purified file:

```text
>seq1
ACGTACGTACGT
>seq3
ACGTACGTACGT
```

I'm not able to replicate.