NBISweden / Earth-Biogenome-Project-pilot

Assembly and Annotation workflows for analysing data in the Earth Biogenome Project pilot project.
https://www.earthbiogenome.org/
GNU General Public License v3.0

Refactor workflow #96

Closed mahesh-panchal closed 5 months ago

mahesh-panchal commented 5 months ago

What's your opinion on this refactor? Is it more readable? Can you tell what's going on in the main.nf much better?

MartinPippel commented 5 months ago

I really like all changes!

Meaning, if I omit the decontamination step, e.g. via "steps": "inspect,preprocess,assemble,purge,polish", then the following code snippet just assigns the ch_raw_assemblies channel to the ch_cleaned_assemblies channel, right? That's cool!

    // Contamination screen
    ch_to_screen = setAssemblyStage (
        ch_raw_assemblies,
        'decontaminated' // Set assembly stage now for filenaming
    ).dump(tag: 'Assemblies: to screen')
    if ( 'screen' in workflow_steps ) {
        DECONTAMINATE( ch_to_screen )
        ch_cleaned_assemblies = DECONTAMINATE.out.assemblies
    } else {
        ch_cleaned_assemblies = ch_to_screen
    }
    ch_cleaned_assemblies = ch_cleaned_assemblies.mix (
        preassembledInput( PREPARE_INPUT.out.assemblies, 'decontaminated' )
    ).dump(tag: 'Assemblies: Cleaned')

Have you tested it on a real example already? If not I can give it a try on a 50Mb genome.

mahesh-panchal commented 5 months ago

I really like all changes!

* the additional assembly auxiliary functions are pretty helpful (now that I got what they are supposed to do).

👍🏽

* If I get this if-else logic right, then this keeps the automatic flow intact, e.g. if a step is removed from the pipeline?

Exactly. I was trying to address the issue Guilherme noticed.

Meaning, if I omit the decontamination step, e.g. via "steps": "inspect,preprocess,assemble,purge,polish", then the following code snippet just assigns the ch_raw_assemblies channel to the ch_cleaned_assemblies channel, right? That's cool!

    // Contamination screen
    ch_to_screen = setAssemblyStage (
        ch_raw_assemblies,
        'decontaminated' // Set assembly stage now for filenaming
    ).dump(tag: 'Assemblies: to screen')
    if ( 'screen' in workflow_steps ) {
        DECONTAMINATE( ch_to_screen )
        ch_cleaned_assemblies = DECONTAMINATE.out.assemblies
    } else {
        ch_cleaned_assemblies = ch_to_screen
    }
    ch_cleaned_assemblies = ch_cleaned_assemblies.mix (
        preassembledInput( PREPARE_INPUT.out.assemblies, 'decontaminated' )
    ).dump(tag: 'Assemblies: Cleaned')

Yes, and the stage information is updated automatically too.
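
The pass-through behaviour discussed above can be tried in isolation with a small, self-contained script. This is only a sketch under assumptions: `params.steps`, the `SCREEN` subworkflow, and the assembly names are hypothetical stand-ins (the real pipeline uses `DECONTAMINATE` and reads its step list from its own configuration), and the stage bookkeeping done by `setAssemblyStage` is left out.

```nextflow
// Hypothetical parameter name, used here only to make the sketch runnable.
params.steps = 'inspect,preprocess,assemble,purge,polish'

// Hypothetical stand-in for DECONTAMINATE: appends a suffix instead of screening.
workflow SCREEN {
    take:
    ch_assemblies

    main:
    ch_screened = ch_assemblies.map { asm -> "${asm}.screened" }

    emit:
    assemblies = ch_screened
}

workflow {
    def workflow_steps = params.steps.tokenize(',')

    // Illustrative assembly names, not real pipeline output.
    ch_raw_assemblies = Channel.of( 'asm1', 'asm2' )

    if ( 'screen' in workflow_steps ) {
        // Step requested: downstream code sees the screened assemblies.
        SCREEN( ch_raw_assemblies )
        ch_cleaned_assemblies = SCREEN.out.assemblies
    } else {
        // Step omitted: the input channel is passed through unchanged.
        ch_cleaned_assemblies = ch_raw_assemblies
    }

    ch_cleaned_assemblies.view { asm -> "cleaned: ${asm}" }
}
```

With the default step list above, 'screen' is absent, so the names should pass through untouched; adding `screen` to `--steps` routes them through the stand-in subworkflow instead. Either way, downstream code only ever refers to `ch_cleaned_assemblies`.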

Have you tested it on a real example already? If not I can give it a try on a 50Mb genome.

Not yet. Please do try, but you may encounter syntax errors.

MartinPippel commented 5 months ago

I got the following error:

ERROR ~ No such variable: meta

 -- Check script '/home/pippel/tests/Earth-Biogenome-Project-pilot/subworkflows/local/assemble_hifi/main.nf' at line: 42 or see '.nextflow.log' file for more details

But the file subworkflows/local/assemble_hifi/main.nf was not part of this PR. Would you mind having a look?

I think I solved this by changing the following:

        fasta_ch = GFATOOLS_GFA2FA.out.fasta.multiMap{
            fasta: [ meta, fasta ]
            genome_size: meta.sample.genome_size

to:

        fasta_ch = GFATOOLS_GFA2FA.out.fasta.multiMap{ meta, fasta ->
            fasta: [ meta, fasta ]
            genome_size: meta.sample.genome_size
        }

MartinPippel commented 5 months ago

After fixing the previous error, the pipeline finished successfully (FCS was excluded for time reasons).
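
For anyone hitting the same "No such variable: meta" message: a closure with no declared parameters only receives the implicit `it`, so `meta` and `fasta` are unresolved until they are named before the `->`. The fix can be reproduced outside the pipeline with a minimal sketch. The input tuple below is made up for illustration and only mimics the `[ meta, fasta ]` shape that `GFATOOLS_GFA2FA.out.fasta` is assumed to emit.

```nextflow
workflow {
    // Illustrative tuple shaped like [ meta, fasta ]; all values are invented.
    ch_fasta = Channel.of(
        [ [ id: 'sample1', sample: [ genome_size: 50000000 ] ], 'sample1.fasta' ]
    )

    ch_fasta
        // Declaring the closure parameters destructures each emitted tuple.
        .multiMap { meta, fasta ->
            fasta: [ meta, fasta ]
            genome_size: meta.sample.genome_size
        }
        .set { fasta_ch }

    // Each multiMap label becomes its own output channel.
    fasta_ch.fasta.view { item -> "fasta: ${item}" }
    fasta_ch.genome_size.view { size -> "genome_size: ${size}" }
}
```

Removing `meta, fasta ->` from the closure in this sketch should bring back the same error as in the pipeline.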

mahesh-panchal commented 5 months ago

But the file subworkflows/local/assemble_hifi/main.nf was not part of this PR.

You should also be able to push commits to this branch.

mahesh-panchal commented 5 months ago

There's still more to update but we can clean up later.