alesssia / YAMP

YAMP: Yet Another Metagenomic Pipeline
GNU General Public License v3.0

alphaDiversity process does not find null file and fails in AWS batch #14

Closed gilfreund closed 5 years ago

gilfreund commented 5 years ago

Hi, running in an AWS Batch environment, the pipeline fails with the message:

```
Caused by:
  Can't stage file file:///efs/yamp/tests/batch/null -- file does not exist

Tip: you can replicate the issue by changing to the process work dir and entering the command 'bash .command.run'
```

I think that `treepath = "null"` is interpreted literally and the alphaDiversity process is executed and fails.
```
Sep-12 14:39:12.546 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 16; name: logQC; status: COMPLETED; exit: 0; error: -; workDir: s3://bio/yamp/45/c6fc441f987e2884957fc3fd0f36a5]
Sep-12 14:40:02.535 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 14; name: profileTaxa (1); status: COMPLETED; exit: 0; error: -; workDir: s3://biotests/yamp/ea/00265837a7199608a1a28f87fa5311]
Sep-12 14:40:02.721 [Task monitor] DEBUG nextflow.file.FileHelper - Path matcher not defined by 'S3FileSystem' file system -- using default default strategy
Sep-12 14:40:02.721 [Actor Thread 45] DEBUG nextflow.util.CacheHelper - Unable to get file attributes file: /efs//yamp/tests/batch/null -- Cause: java.nio.file.NoSuchFileException: /efs//yamp/tests/batch/null
Sep-12 14:40:02.836 [Actor Thread 45] DEBUG nextflow.util.CacheHelper - Unable to get file attributes file: /efs//yamp/tests/batch/null -- Cause: java.nio.file.NoSuchFileException: /efs//yamp/tests/batch/null
Sep-12 14:40:02.896 [FileTransfer-thread-22] DEBUG nextflow.file.FilePorter - Copying foreign file /efs//yamp/tests/batch/null to work dir: s3://biotests/yamp/stage/95/15bf8822c48b6f717360186357c66d/null
Sep-12 14:40:02.927 [Actor Thread 45] ERROR nextflow.processor.TaskProcessor - Error executing process > 'alphaDiversity (1)'

Caused by:
  Can't stage file file:///efs//yamp/tests/batch/null -- file does not exist

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
Sep-12 14:40:02.948 [FileTransfer-thread-24] DEBUG nextflow.file.FilePorter - Copying foreign file /efs//yamp/resources/uniref90 to work dir: s3://biotests/yamp/stage/47/854cd69f4e5eca228eaedf73745945/uniref90
Sep-12 14:40:02.953 [Actor Thread 45] DEBUG nextflow.Session - Session aborted -- Cause: Can't stage file file:///efs//yamp/tests/batch/null -- file does not exist
Sep-12 14:40:02.957 [Actor Thread 45] DEBUG nextflow.Session - The following nodes are still active:
[process] alphaDiversity
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: infile
  port 1: (queue) closed; channel: treepath
  port 2: (cntrl) -     ; channel: $

[process] profileFunction
  status=ACTIVE
  port 0: (queue) closed; channel: cleanreads
  port 1: (queue) OPEN  ; channel: metaphlanbuglist
  port 2: (queue) OPEN  ; channel: chocophlan
  port 3: (queue) OPEN  ; channel: uniref
  port 4: (cntrl) -     ; channel: $

[process] logCC
  status=ACTIVE
  port 0: (value) OPEN  ; channel: tolog
  port 1: (cntrl) -     ; channel: $

[process] saveCCtmpfile
```

Could this be the case, or am I missing another configuration item?

```
manifest
{
  homePage = 'https://github.com/alesssia/YAMP'
  description = 'YAMP : Yet Another Metagenomic Pipeline'
  mainScript = 'YAMP.nf'
}

trace
{
    enabled = true
    fields = 'task_id, name, status, exit, module, submit, start, complete, duration, realtime, %cpu, rss, vmem, peak_rss, peak_vmem'
}

timeline
{
    enabled = true
}

params
{
        //DO NOT CHANGE

        //These are used when the analysis is in characterisation mode
        reads1 = "null"
        reads2 = "null"

        //These are used to print version and help
        help = null
        version = null

        /*--------------------------------*
         *      EXECUTION FLOW PARAMETERS
         *--------------------------------*/

        //Whether the input reads are paired-end (two files, librarylayout="paired")
        //or single-end (one file, librarylayout="single")
        librarylayout = "paired"

        //Whether the de-duplication step should be performed
        dedup = true

        //Whether the temporary files resulting from QC steps should be kept
        keepQCtmpfile = false
        //Whether the temporary files resulting from MetaPhlAn2 and HUMAnN2 should be kept
        keepCCtmpfile = false

        /*--------------------------------*
         *      PATHS TO EXTERNAL RESOURCES
         *--------------------------------*/

        //Adapter sequences and synthetic contaminants to be removed in the trimming step
        adapters="/efs//yamp/resources/adapters.fa"
        artifacts="/efs//yamp/resources/sequencing_artifacts.fa.gz"
        phix174ill="/efs//yamp/resources/phix174_ill.ref.fa.gz"
        outdir="."

        //Reference pan-genome for contamination. It should have been indexed beforehand.
        refForeingGenome="/efs//yamp/resources/"

        //BowTie2 database for MetaPhlAn2
        mpa_pkl="/efs//yamp/resources/bowtie2db/db_v20/mpa_v20_m200.pkl"
        bowtie2db="/efs//yamp/resources/bowtie2db/db_v20/"
        bowtie2dbfiles="mpa_v20_m200"

        // ChocoPhlAn and UniRef databases
        chocophlan="/efs//yamp/resources/chocophlan"
        uniref="/efs//yamp/resources/uniref90"

        //[OPTIONAL]
        //Newick tree filepath, required for phylogenetic alpha diversity (PD_whole_tree, QIIME)
        treepath="null"
```
Thanks
alesssia commented 5 years ago

Hi @gilfreund,

I don't think that treepath = "null" is interpreted literally. Indeed, in the code there is an `if` statement that, when treepath = "null", runs `alpha_diversity.py` without the Newick tree.

Could you please share with me your command line instruction, and the .command.run file?

Many thanks, Alessia

gilfreund commented 5 years ago

From looking at the Nextflow log (attached nextflow.log), I now think that alphaDiversity isn't even starting. At line 382 of the log, I see that Nextflow is trying to stage a file:

```
DEBUG nextflow.file.FilePorter - Copying foreign file /efs/emendo/yamp/tests/batch/null to work dir: s3://emendobio/yamp/stage/95/15bf8822c48b6f717360186357c66d/null
```

and then alphaDiversity fails:

```
Sep-13 09:54:35.287 [Actor Thread 41] ERROR nextflow.processor.TaskProcessor - Error executing process > 'alphaDiversity (1)'

Caused by: Can't stage file file:///efs/emendo/yamp/tests/batch/null -- file does not exist

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line
```

So I probably have a configuration error that is preventing staging, or something is missing from a previous step. In any case, a .command.run file is not even created.

alesssia commented 5 years ago

You may have hit a bug. Could you please share with me your command line instruction? Can you also send me some more information/data to replicate your issue?

gilfreund commented 5 years ago

The Nextflow version is 19.07.0.5106.

The command line is:

```
nextflow run YAMP.nf --reads1 /efs/emendo/yamp/data/ERR011089_1.fastq.gz --reads2 /efs/emendo/yamp/data/ERR011089_2.fastq.gz --prefix META_ERR011089 --mode complete -bucket-dir s3://emendobio/yamp
```

I am including YAMP.nf and the nextflow.config file (in case I have made a mistake in the configuration, such as paths), as well as the Dockerfile I use (mainly to set the user ID), in the attached yamp.zip. I configured an AWS Batch job definition to handle the volumes I thought I might need (such as scratch space, if needed). I used the files you pointed to in your documentation, just in case we have some corruption on our side. The work bucket is an empty folder in one of our buckets.

Let me know if any additional information is required.

Thanks Gil

gilfreund commented 5 years ago

I think I have found the root cause. When awsbatch is the executor, Nextflow tries to stage files from S3. The S3 cp command then returns an error, as there is no null file.

I used the hello world example from the Nextflow site, with some changes:

```
params.treepath="null"
hello_txt = Channel.fromPath(params.treepath)

process splitLetters {

    input:
    file(hello_file) from hello_txt

    output:
    file 'chunk_*' into letters mode flatten

    """
    cat ${hello_file} | split -b 6 - chunk_
    """
}

process convertToUpper {

    input:
    file x from letters

    output:
    stdout result

    """
    if [ $params.treepath == null ]
    then
        echo params.treepath == null
    else
        echo "using param.treepath $params.treepath"
        cat $x | tr '[a-z]' '[A-Z]'
    fi
    """
}

result.println { it.trim() }
```

If I run it with the local executor and don't provide a treepath on the command line, it fails in the first process (splitLetters), in which I did not provide any handling for the null value:

```
N E X T F L O W  ~  version 19.07.0
Launching `./hello1.nf` [stoic_austin] - revision: 6f4900cde8
executor >  local (1)
[56/83a7bd] process > splitLetters (1) [100%] 1 of 1, failed: 1 ✘
[-        ] process > convertToUpper   -
Error executing process > 'splitLetters (1)'

Caused by:
  Missing output file(s) `chunk_*` expected by process `splitLetters (1)`

Command executed:

  cat null | split -b 6 - chunk_

Command exit status:
  0

Command output:
  (empty)

Command error:
  cat: null: No such file or directory

Work dir:
  /home/ec2-user/nextflow/work/56/83a7bd87813f3cb35d8f1ad7db10bc

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
```

When I do the same with awsbatch as the executor, it fails at the staging step, the same as in the issue I encountered.

```
N E X T F L O W  ~  version 19.07.0
Launching './hello1.nf' [awesome_turing] - revision: 6f4900cde8
[-        ] process > splitLetters   -
[-        ] process > convertToUpper -
Error executing process > 'splitLetters (1)'

Caused by:
  Can't stage file file:///home/ec2-user/nextflow/null -- file does not exist

Tip: view the complete command output by changing to the process work dir and entering the command 'cat .command.out'
```

I am new to Nextflow, so I am not sure how to handle this scenario, but I will research it and post an update.
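For anyone reproducing this comparison: switching the same script from the local executor to AWS Batch is a configuration change only. Below is a minimal nextflow.config sketch; the queue, container image, bucket, region and AWS CLI path are all hypothetical placeholders, and the setting names should be checked against the Nextflow documentation for your version. With this executor, local input files are copied to S3 before a job is submitted (the `FilePorter` "Copying foreign file" lines in the logs above), which is why a non-existent path such as `null` fails at staging, before any `if` in the script body can run.

```groovy
// nextflow.config sketch for the awsbatch executor (all values are placeholders)
process {
    executor  = 'awsbatch'
    queue     = 'my-batch-queue'           // hypothetical AWS Batch job queue
    container = 'my-registry/yamp:latest'  // hypothetical Docker image
}

// Remote work directory; the command line in this thread uses -bucket-dir instead
workDir = 's3://my-bucket/work'            // hypothetical bucket

aws {
    region = 'eu-west-1'                   // hypothetical region
    batch {
        // Location of the AWS CLI on the compute environment AMI (hypothetical path)
        cliPath = '/home/ec2-user/miniconda/bin/aws'
    }
}
```

Running the hello world script once with this file and once with `process.executor = 'local'` should show the difference in staging behaviour described above.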

alesssia commented 5 years ago

I am very new to AWS Batch (honestly, I have used it only once, to test YAMP), so I am afraid I cannot be of much help. You could try asking in the Nextflow Gitter; you will surely find some help there.

Thanks for keeping me posted!

gilfreund commented 5 years ago

There is a gap between how the local and awsbatch executors handle optional files (see https://github.com/nextflow-io/nextflow/issues/1233). I followed a workaround derived from the discussion there (see https://gitmemory.com/issue/nextflow-io/nextflow/1233/515925532) and made the following change to the alphaDiversity step in YAMP.nf:

```
process alphaDiversity {

        publishDir  workingdir, mode: 'move', pattern: "*.{tsv}"

        input:
                file(infile) from toalphadiversity
                opt_file = params.treepath
                file opt from opt_file

        output:
        file ".log.8" into log8
        file "${params.prefix}_alpha_diversity.tsv"

        when:
        params.mode == "characterisation" || params.mode == "complete"

        script:
        def treepath = opt.name != 'null' ? "--treepath $opt" : ''
```

I pass params.treepath as a variable, which stops the AWS Batch executor from trying to stage it. In a local run there is no staging, so there is no adverse effect. I then pass the variable to the script. Another suggestion I saw was to create an empty file called something like no_file and point treepath at it (see https://github.com/nextflow-io/nextflow/issues/1233#issuecomment-513121438).

Note that AWS Batch did report an issue about a missing file, but it was masked from Nextflow, and the run completed successfully, as far as I can see.

I think a note in the wiki may be in order, but not a code change, as I did not have a chance to check this with other executors, and Nextflow will most likely address this in a later version.
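The empty-placeholder alternative mentioned above (nextflow-io/nextflow#1233) can be sketched in the same DSL1 syntax used in this thread. This is a minimal sketch, not YAMP's code: it assumes an empty file named `no_file` has been created next to the script (for example with `touch no_file`), and the process name `treeDemo` and its `echo` are hypothetical stand-ins for the real alphaDiversity command. Because `params.treepath` now defaults to a file that exists, any executor can stage it, and the script decides from the staged file's name whether to pass the tree on.

```groovy
#!/usr/bin/env nextflow

// Assumption: an empty placeholder file 'no_file' exists next to this script,
// so executors that copy inputs to S3 (such as awsbatch) always find a real file.
params.treepath = "$baseDir/no_file"

// Value channel holding either the real Newick tree or the placeholder
tree_ch = Channel.value(file(params.treepath))

process treeDemo {   // hypothetical process name

    input:
    file tree from tree_ch

    output:
    stdout result

    script:
    // Build the optional argument only when a real tree was supplied
    def treeopt = tree.name != 'no_file' ? "--treepath ${tree}" : ''
    """
    echo "tree option: '${treeopt}'"
    """
}

result.println { it.trim() }
```

Run without `--treepath`, the option string should be empty; run with `--treepath /path/to/tree.nwk`, it should carry the staged tree file name. The trade-off compared with passing the path as a plain value is that the placeholder file must ship with the pipeline, but the real tree, when given, is staged like any other input.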

alesssia commented 5 years ago

Hi @gilfreund, I have added it to the Troubleshooting.

Thanks a lot!